Client-server Action Synchronisation
THIS IS ALL OUT OF DATE NOW. SELECT "DETERMINISTIC LOCKSTEP" IN THE OPTIONS.
"Desync" is a very hot topic. At best it's a minor annoyance when it occurs and at worst it can cause characters to get killed in situations where they thought there were no monsters around. We have many changes coming that will substantially improve the situation, but would like to also explain how our synchronisation systems work in case you're interested, and to make it clear that game state synchronisation is a problem that all online games need to deal with. In this article I'm going to try to clearly explain:
How different types of online games handle latency

Any game has calculations that determine the result of actions. In RPGs, these range from combat calculations (who did what damage) to important economic transactions involving game items. To prevent cheating, it's important that these calculations are not done on the player's computer, because players can easily modify the results there. Because of this, all calculations that affect someone's progress must be done on servers that we control.

These servers exist all over the world (Texas, Amsterdam, Singapore and Australia), but due to the speed of light and other physical limitations, sending data to them and receiving it back is not instant. We typically see response times between our players and the servers of around 50-250ms.

All online games have this situation. The server has to dictate whether things happen or not, but there's a 50-250ms delay before data gets to the server and back. There are three ways that games can solve this:
Action RPGs have to use the third system (action prediction) to feel responsive. The problem is that the second you start moving, you're out-of-sync by definition. Your client has drawn the first few frames of movement (to be nice and responsive), but the server has no idea you clicked a button until the data arrives. Action prediction is mandatory for this type of game but results in you being slightly out-of-sync almost all of the time. This is generally no problem, but once too many predictions get made based on incorrect data, very bad things happen. The challenge is detecting and correcting the situation before this occurs.

How our system of action prediction works

Let's say you're playing with 200ms round-trip latency and you click a monster that is 2 seconds of travel distance away from you. Assume your attack animation has its contact point after 300ms (which is where damage is dealt).

0ms: You click the monster. Your character starts running towards it on the client.
100ms: Your click arrives at the server. The character there starts running towards the monster also. At this stage your local character is already 5% of the way there.
2000ms: Your character arrives at the monster on the client. It's not there yet on the server. You don't even know for sure that it will ever arrive (it might still get interrupted by an attack). Your client starts to animate the sword swing.
2100ms: Your character arrives at the monster on the server. The server immediately performs the combat calculation in advance of the contact point and sends the tentative result back to the client.
2200ms: You receive the notification from the server about what type of damage you will deal and roughly how much. Thankfully it arrived before the contact point of the animation! This is not always the case.
2300ms: You hit the contact point on the client. Because you have the damage information in advance, you can draw a pleasing blood splatter, fire effect and so on. This hit has not even occurred yet on the server.
2400ms: You hit the contact point on the server. The damage is locked in and actually applied to the monster. It dies. Experience and item drops are calculated and sent to the client.
2500ms: Your client receives an experience update and the information of what items to show falling to the ground.

Despite the fact that your information is delayed by 100ms, it arrived before the contact point and the only indication of playing under latency that the client noticed was the fact that it took a tenth of a second for the item drops to arrive. At no point in that process was any gameplay calculation compromised in a way that would enable players to cheat the system.

Why sync problems occur with this system and how they manifest

The above example assumes that everything went smoothly. It's entirely possible for the 2 second travel time to be completely different on both ends, or for a lag spike to occur, causing the timing to get completely out of sync. If the attack is interrupted on the server before it starts (during movement) but not on the client, then you have a long animation playing that can't be cancelled, because the communication time is a decent fraction of the animation's length.

Even if no strange lag occurs, the monsters that are nearby are pathfinding on the client to where they think you are, which by definition is different from where you are on the server because of latency. These entities have to find paths that go around the other monsters, which of course are in subtly different positions on both ends. The differing paths further contribute to the monsters being in the wrong place.

It's worth stressing that in 99% of combat events, everything feels fine. Although the simulation is out-of-sync due to the speed of data transmission, the timing generally works out and monsters who are following weird paths get to you at roughly the right times and in roughly the right places. It's hard to really know that anything's wrong... except when it's horribly wrong.

Unfortunately, when things are very out of sync, players have a pretty bad time. They take damage out of nowhere or find that they're actually trapped between monsters that didn't appear in the right places on their client. We have code to detect these situations and hopefully resync (rubber-band) the entities back into place quickly, but it's often not good enough.

Why desync has to exist and why rubber-banding is good

The key thing to understand is that Action RPGs have to use an action prediction system like this. If they wait for confirmation of every action from the server then the game feels terrible to control. Even if our resyncing code were perfect, there would be situations where the game gets out of sync just because of tiny timing differences.

Imagine you're running near a large rock, and you arbitrarily click on the other side of it. Both the client and the server attempt to find the shortest path around the rock. Because your client is ahead of the server by definition (as the movement was processed there approximately 50-250ms earlier, so that it was responsive), there are cases where the client may choose to go a different way around the rock than the server. If you are hit by a monster en route, then your movement will be interrupted in a different place on each simulation. You are now out of sync. Intelligent resync code would detect this and rubber-band you across the rock back to where you're meant to be.

The key observation here is that improved resync code involves more rubber-banding than we have at the moment. If we do it properly, monsters and players will be corrected to better positions more frequently, to prevent anything getting drastically out of place. Many players interpret the rubber-banding itself as "desync", when in reality it's what is fixing the problem as it is detected. It's not going to be easy explaining that the increased rate of rubber-banding is not only good, but also the ideal solution.
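The attack timeline worked through earlier (200ms round trip, 2 seconds of travel, contact point at 300ms) can be reproduced with a few lines of arithmetic. This is only a sketch for experimenting with other latencies, not code from the game: the names `Event`, `predicted_attack_timeline` and `result_arrives_before_contact` are made up for illustration, and it assumes symmetric latency (one-way delay is half the round trip) and no interruptions on either side.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time_ms: int
    side: str  # "client" or "server"
    what: str


def predicted_attack_timeline(rtt_ms: int, travel_ms: int, contact_ms: int) -> list[Event]:
    """Event times for one click-to-kill attack under action prediction.

    Assumption: one-way delay is rtt_ms // 2 in both directions and nothing
    interrupts the movement or the swing on either side.
    """
    one_way = rtt_ms // 2
    client_arrive = travel_ms            # the client starts running at 0ms
    server_arrive = one_way + travel_ms  # the server starts one trip later
    events = [
        Event(0, "client", "click; start running"),
        Event(one_way, "server", "click received; start running"),
        Event(client_arrive, "client", "arrive; start swing animation"),
        Event(server_arrive, "server", "arrive; compute tentative damage"),
        Event(server_arrive + one_way, "client", "tentative damage received"),
        Event(client_arrive + contact_ms, "client", "contact point; draw hit effects"),
        Event(server_arrive + contact_ms, "server", "contact point; damage applied"),
        Event(server_arrive + contact_ms + one_way, "client", "experience and drops received"),
    ]
    return sorted(events, key=lambda e: e.time_ms)


def result_arrives_before_contact(rtt_ms: int, contact_ms: int) -> bool:
    """True when the tentative damage reaches the client before its contact point.

    Travel time cancels out: the result arrives at travel + 2 * one_way and
    the client's contact point is at travel + contact_ms.
    """
    return 2 * (rtt_ms // 2) < contact_ms


if __name__ == "__main__":
    for e in predicted_attack_timeline(rtt_ms=200, travel_ms=2000, contact_ms=300):
        print(f"{e.time_ms:>5}ms  {e.side:<6}  {e.what}")
```

With the numbers from the example, the event times come out as 0, 100, 2000, 2100, 2200, 2300, 2400 and 2500ms. Under these assumptions the tentative result only beats the client's contact point while the round trip is shorter than the contact delay, so a 400ms round trip with the same 300ms swing would already draw the hit before the server's answer arrives.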
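The trade-off between rubber-banding and drift can be illustrated with a very small sketch. It is not our actual resync code: `Entity`, `resync` and `max_drift` are hypothetical names, and the only rule shown is "snap the client copy to the server position once it has drifted further than a threshold". It also ignores whether an entity is in the middle of an action, which matters in practice.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    client_pos: tuple[float, float]  # where the client is drawing the entity
    server_pos: tuple[float, float]  # authoritative position from the server


def resync(entities: list[Entity], max_drift: float) -> list[str]:
    """Snap every entity whose client position has drifted too far.

    Returns the names of the entities that were rubber-banded. The server
    position always wins; the client never corrects the server.
    """
    corrected = []
    for e in entities:
        drift = math.dist(e.client_pos, e.server_pos)
        if drift > max_drift:
            e.client_pos = e.server_pos  # this jump is the visible rubber-band
            corrected.append(e.name)
    return corrected


if __name__ == "__main__":
    player = Entity("player", (10.0, 0.0), (10.0, 3.0))
    monster = Entity("monster", (0.0, 0.0), (0.5, 0.0))
    # A looser threshold leaves the monster alone; a tighter one moves it too.
    print(resync([player, monster], max_drift=2.0))
    print(resync([player, monster], max_drift=0.25))
```

Lowering `max_drift` makes corrections more frequent but keeps the worst-case error small, which is the sense in which more rubber-banding is the better outcome.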
Why some other games appear to not have similar problems

Games using the "wait until server responds" method (RTS and MOBA games) have much higher input latency but don't have the same sync issues that we do. They have their own class of game state synchronisation problems that we thankfully don't have to deal with.

Games using client action prediction like ours run into exactly the same sync issues that we do unless they cheat on certain aspects of the simulation. For example, it's common for Action RPGs to do some combination of the following:
Unfortunately, we don't want to do any of those things! Each of them ruins part of the hardcore experience: by allowing combat/movement cheats, preventing accuracy from existing as a mechanic, preventing stunlock, preventing people getting blocked in, etc. Because we want hardcore game mechanics (i.e. ones where position matters and it's difficult to cheat in PvP), the only option for us has been to put a lot of work into improving our combat simulation and resync code.

What we're planning to do to improve synchronisation

There are a lot of changes that we're experimenting with that may individually improve the synchronisation of the combat simulation (along with their potential drawbacks):
At this stage it looks like the biggest gains will come from improving the resync code so that it rapidly and reliably resyncs the combat situation if things get too desynchronised. This will mean more rubber-banding (as explained earlier), but will massively reduce deaths that occur from the player not being able to see the true locations of entities.

I explained the above changes with their drawbacks because I want to make it clear that this problem is intrinsically difficult to solve. We're fighting against both the laws of physics (travel speed of data) and the desire to not compromise gameplay mechanics. I have full confidence that we will incrementally deploy changes that substantially improve this situation.

Update (April 20, 2014): As I post this article to the new Development Manifesto, I couldn't find any way to improve the above explanation - it's still very accurate, and other than a few small edits, I'd like to leave it mostly as-is. There are some points that I'd like to clarify that have come up since it was initially posted:
To elaborate on the last point and clarify the problem in a nutshell: in order to keep hardcore game mechanics like body-blocking, stunning and missing while also preventing players from manipulating combat results, small amounts of desync will occur naturally. There is no way around this, due to the speed of light.

An ideal solution from Grinding Gear Games would be to very rapidly detect and correct those sync problems, putting things back where they should be. We have not yet delivered this solution to our satisfaction. Once we have, though, you may notice periodic resyncs, which may initially feel like you're out of sync all the time. That's because the system will be acknowledging the desync and correcting it, rather than assuming that it's all going to be fine and letting you end up two rooms away, pinned against a wall.

I'll update this article as we continue to make progress in this area. Thanks for reading this far - you now know more than you ever wanted to about the pains of networked game state synchronisation.

Last edited by Chris on May 19, 2015, 18:30:43
"I'm not sure what you mean by this. The Vaal Oversoul both can and does walk around."
"No, I'm afraid it would not, and in several cases it could make desync worse. /oos is not a magic fixing button, and in particular if used during movements it can worsen sync, by disrupting actions on the client that are correct, pushing it further out of sync. The reason why resync-on-stun is a viable and useful feature that improved sync is that during a stun you're definitely not moving, sidestepping this issue."
"That is caused by the client(s) and server getting out of sync, albeit with some sort of pathological case that repeats. He clicked to initiate a move action. This info was passed to the server, which passed it to your client. On his client, and on your client, the movement action was able to proceed, whereas on the server something stopped it as soon as (or shortly after) he started moving. The fact that it repeats shows there was a specific problem with the terrain and/or entities like chests: the route was not actually pathable on the server, yet both of your clients managed to find a path through, before being resynced back because the server noticed he'd gotten too far from his server position. I've seen that twice before, once with chests and once with a unique terrain piece. I believe both were fixed up. If you can find some terrain which causes this sort of repeated pathing mismatch then a screenshot with /debug info can help us replicate it locally and work out what needs fixing with that bit of terrain."
This thread is locked because we've added Deterministic Lockstep.