Another topic I like to discuss is longtail latencies. Consider a system where each service typically responds in 10 ms but has a 99th-percentile latency of one second.
The distribution could look like the picture below. The latencies we expect are green; the longtails are orange. A 99th-percentile response could be roughly 30 times slower, and a 99.9th-percentile response roughly 50 times slower. The graph also shows how important it is to study the whole latency distribution, not just the mean: looking at means alone, the problem is easily overlooked.
A 99th percentile of 30 ms means that 1 request in a hundred experiences 30 ms of latency instead of the expected, let’s say, 1 millisecond. Moreover, most of our systems are distributed microservices, so one request can fan out into a bunch of other requests. For example, let’s say one client request generates 10 sub-requests and each has a 1% probability of hitting a slow service. Then the chance that the client request is affected by a slow response is about 9.6% ( 1 - 0.99^10 ≈ 0.096 -> 9.6% ). Longtails can be caused by various things: hardware or network issues, misconfiguration, architecture or implementation problems, and so on. But it’s important to realize it’s not just noise or a few peaks caused by power users. It’s a real problem.
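The fan-out arithmetic above can be sketched in a few lines. This is just the independence assumption made explicit: each of the n sub-requests is slow with probability p, so at least one is slow with probability 1 - (1 - p)^n.

```python
def p_affected(n: int, p: float) -> float:
    """Probability that at least one of n independent sub-requests is slow,
    where each sub-request is slow with probability p."""
    return 1 - (1 - p) ** n

print(round(p_affected(10, 0.01), 3))   # 10 sub-requests -> 0.096, i.e. ~9.6%
print(round(p_affected(100, 0.01), 3))  # 100 sub-requests -> ~63%!
```

Note how quickly this grows with fan-out: with 100 sub-requests, the "rare" 1-in-100 slow response affects the majority of client requests.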
Tolerating Longtail Latencies
Usually, when we face a problem, we try to sort it out, don’t we? So a general approach to tackling longtails could be the following. First, narrow the problem down as much as possible; distributed systems are usually quite complex, so every bit counts. Then isolate it in a test environment and start measuring and monitoring everything. If all goes well, we should be able to identify and fix the problem. But it’s usually not that easy; in fact, it’s usually very hard. It can take weeks of investigation with an uncertain result. So if you’re struggling to tackle the problem head-on, there might be a better way.
Let's try to tolerate longtails in a similar way to how we tolerate failures. There are various approaches to this problem; let's go through the most common ones:
- Hedging your bet - send the same request to multiple servers and use whichever response comes back first. One very important note: to avoid doubling or tripling your load, don’t send the hedging request immediately; wait until the original request has been outstanding for more than the 95th-percentile latency. The additional load is then only around 5%, while the longtails are shortened significantly.
- Tied requests - instead of delaying the hedge request, we enqueue the request on multiple servers simultaneously and tie the copies together with information about each other. When the first server processes the request, it tells the others to cancel it from their queues. In a real Google system, this reduced median latency by 16% and 99.9th-percentile latency by 40%.
- Selectively increase replication factors - this one is pretty obvious: keep more copies of the data you find most important.
- Put slow machines on probation - we can also temporarily exclude a slow machine from operations. Since the source of the problem might be temporary, we keep sending shadow requests to the machine and monitoring it; if the problem disappears, we put the machine back.
- Consider ‘good enough’ responses - respond with incomplete results in exchange for better end-to-end latency.
- Hardware upgrade - upgrading hardware can be the fastest and a reasonably cheap way to tackle the problem.
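As a sketch of the first technique above, here is what hedging a request could look like with asyncio. Everything here is illustrative: the `fetch` coroutine stands in for a real RPC to one replica, the replica names are made up, and `HEDGE_DELAY` plays the role of the measured 95th-percentile latency (assumed to be 50 ms).

```python
import asyncio

# Hypothetical value: the service's measured 95th-percentile latency.
HEDGE_DELAY = 0.05  # 50 ms

async def fetch(replica: str, request: str) -> str:
    """Placeholder for a real RPC call to one replica.
    Simulated: the "slow" replica takes 200 ms, others 10 ms."""
    await asyncio.sleep(0.2 if replica == "slow" else 0.01)
    return f"{replica}:{request}"

async def hedged_fetch(replicas: list[str], request: str) -> str:
    """Send to the first replica; if it hasn't answered within HEDGE_DELAY,
    send the same request to a second replica and use whichever finishes first."""
    primary = asyncio.create_task(fetch(replicas[0], request))
    try:
        # shield() so the timeout doesn't cancel the still-running primary
        return await asyncio.wait_for(asyncio.shield(primary), timeout=HEDGE_DELAY)
    except asyncio.TimeoutError:
        backup = asyncio.create_task(fetch(replicas[1], request))
        done, pending = await asyncio.wait(
            {primary, backup}, return_when=asyncio.FIRST_COMPLETED
        )
        for task in pending:
            task.cancel()  # cancel the loser to avoid wasted work
        return done.pop().result()

print(asyncio.run(hedged_fetch(["slow", "fast"], "get:key")))  # fast:get:key
```

When the primary replies within the hedge delay, no second request is sent at all, which is why the extra load stays around 5% in the common case.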
That was a quick introduction to the very interesting topic of tolerating longtail latencies. If you found it interesting, you should definitely take a look at the article in the further reading section. And as always, if you have any questions or comments, don't hesitate to let us know.