318

What does P99 latency represent? I keep hearing about it in discussions about application performance, but I couldn't find a resource online that explains it.

yoozer8
maverik
  • Organize all your data points from lowest to highest, from left to right. Gather the lowest (leftmost) 99% of the data points, and discard the remaining 1% to the right. The highest value in this gathered group (the right-most value in this left group) is the P99 value. – John Red May 22 '23 at 19:00

7 Answers

425

It's the 99th percentile. It means that 99% of the requests should be faster than the given latency. In other words, only 1% of the requests are allowed to be slower.

Chris
Tomasz Nurkiewicz
  • Only 1% of requests are **expected** to be slower. – conmak Oct 08 '21 at 20:34
  • Although this is a straightforward answer with links to the definition, I prefer the @kanagavelu-sugumar answer, which gives an example and also explains why P99 can matter more than p95 in a given context. Just remember to consider the context. – voiski Jul 13 '22 at 14:04
111

Imagine that you are collecting performance data for your service, and the table below shows the results you collected (the latency values are fictional, to illustrate the idea).

Latency    Number of requests
1s         5
2s         5
3s         10
4s         40
5s         20
6s         15
7s         4
8s         1

These 100 requests have a P99 latency of 7s: only 1% of the requests (here, the single 8s request) take longer than that. So, if you can decrease the P99 latency of your service, you increase its performance.
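To make the arithmetic concrete, here is a minimal Python sketch (my own illustration, not part of the original answer) that rebuilds the 100 samples from the table and computes P99 with the nearest-rank method; monitoring tools often interpolate instead, so they may report slightly different values:

```python
import math

# Rebuild the raw samples from the table above: {latency_s: request_count}
table = {1: 5, 2: 5, 3: 10, 4: 40, 5: 20, 6: 15, 7: 4, 8: 1}
samples = sorted(latency for latency, count in table.items() for _ in range(count))

def percentile(sorted_samples, p):
    # Nearest-rank method: the smallest value such that at least p% of the
    # samples are less than or equal to it.
    rank = math.ceil(p / 100 * len(sorted_samples))  # 1-based rank
    return sorted_samples[rank - 1]

print(percentile(samples, 99))  # -> 7: only the single 8s request is slower
```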

tranmq
  • Found this to be a more practical example :) – chaosguru Feb 21 '22 at 13:10
  • I like this example! It's easier to understand. – nayiaw Mar 11 '22 at 12:14
  • How/why did we select 7 here? – Shahbaz Zaidi Mar 13 '22 at 18:30
  • @ShahbazZaidi You take all your requests and discard 99% of the bottom ones. In this example above, we discard all requests with latency from 1s to 7s. – ThePavolC Mar 28 '22 at 15:56
  • @ShahbazZaidi If I understand ThePavolC's explanation correctly, it's _close_ but not quite right: If we discard the bottom 99% (_including_ `7s`), then we'd be left with `8s` but that is NOT the 99th percentile! Instead, I would explain it the other way around: Sort the requests in ascending order and discard the top/largest 1%. The largest remaining value is the 99th percentile. Here, there are 100 requests so the "top 1%" corresponds to the 1 largest request (the one that took `8s`). When we get rid of that, the max remaining value is `7s`, which is the correct 99th percentile. – Seth Aug 15 '23 at 22:20
  • Source: My explanation above is based off of [this](https://stackoverflow.com/questions/38781499/what-does-90th-95th-99th-pct-matrices-means-in-dashboard-report-of-jmeter/45467589#45467589) answer – Seth Aug 15 '23 at 22:21
93

We can explain it through an analogy: if 100 students are running a race, then 99 students should complete the race within the "latency" time.

hem
rajat1293
  • `Should` not `will`. – Aaron S Mar 08 '18 at 23:37
  • Also, <= 'latency time' – Core_Dumped Apr 27 '18 at 00:31
  • It's the time that the student who came in 99th crossed the line. – jarmod Aug 28 '18 at 14:44
  • I love this analogy. – luii Oct 21 '19 at 16:43
  • What if there are only 50 students? – Tyler Liu Nov 05 '21 at 18:41
  • @Tyler Liu Then half a student should complete the race in "latency" time. – user13758558 Mar 15 '22 at 04:43
  • The modal verb is probably what makes this answer confusing. A percentile ranking of latency is a measurement over a set of requests that have all already occurred in the past. If your p99 value is 1 ms, that means 99% of the sample used to compute the ranking had a latency of 1 ms or less. There is no could, would, or will; that vocabulary is only used to argue how likely past performance is to predict future performance. @AaronS said it best with should. – Simon.Ponder Jul 27 '23 at 13:02
31

Let's take an example from here:

Request latency:
    min: 0.1
    max: 7.2
    median: 0.2
    p95: 0.5
    p99: 1.3

So we can say that 99 percent of web requests were served in 1.3ms or less (milliseconds or microseconds, depending on how your system's latency measurement is configured). Like @tranmq said, if we decrease the P99 latency of the service, we increase its performance.

It is also worth noting p95, since a few requests may make p99 much costlier than p95: for example, initial requests that build caches, warm up class objects, initialize threads, and so on. So p95 may cut out those 5% worst-case scenarios. Still, within that 5%, we don't know how much is real noise versus genuinely worst-case inputs.

Finally, we can have roughly 1% noise in our measurements (network congestion, outages, service degradation, and so on), so the p99 latency is a good representative of the practical worst case. And, almost always, our goal is to reduce the p99 latency.
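As an illustration of where a summary like the one above comes from (the distribution below is made up, not taken from the linked example), min/max/median/p95/p99 could be computed like this; note that NumPy's `percentile` interpolates by default, so its result can differ slightly from a nearest-rank value:

```python
import numpy as np

rng = np.random.default_rng(seed=42)
# Hypothetical long-tailed latencies in milliseconds; real request latencies
# often look roughly lognormal, with a short head and a long tail.
latencies = rng.lognormal(mean=-1.5, sigma=0.8, size=10_000)

p50, p95, p99 = np.percentile(latencies, [50, 95, 99])
print(f"min: {latencies.min():.1f}  max: {latencies.max():.1f}  "
      f"median: {p50:.1f}  p95: {p95:.1f}  p99: {p99:.1f}")
```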

Scott Stensland
Kanagavelu Sugumar
12

Explaining P99 through an analogy: if 100 horses are running a race, 99 horses should complete the race in less than or equal to the "latency" time. Only 1 horse is allowed to finish the race in more than the "latency" time.

That means if P99 is 10ms, 99 percent of requests should have a latency less than or equal to 10ms.

Prakul
1

If the p99 value is 1ms, it means 99 out of 100 requests take 1ms or less, and 1 request takes 1ms or more.

0

To put it simply, imagine you have an API with a contract stating that it must respond to callers within 10 milliseconds (ms). Over the course of an hour, you received the following requests from different consumers:

- Consumer A made 10 requests at 10:00 am, with responses taking 5ms each.
- Consumer B sent 2 requests at 10:05 am, each with a 5ms response.
- At 10:07 am, Consumer B submitted 20 requests, each taking 7ms to respond.
- Again at 10:07 am, Consumer B had 20 more requests with 7ms responses.
- Consumer B made 30 requests at 10:15 am, with responses at 12ms.
- At 10:20 am, Consumer B requested 20 times, with responses taking 11ms.
- At 10:30 am, Consumer B submitted 20 requests, and each took 10ms.
- Finally, at 10:43 am, Consumer B had 40 requests, with 9ms responses.

That is 162 responses in total. If we sort these response times in ascending order and discard the slowest 1% (here, the one or two slowest requests), the largest remaining value is 12ms. That value is the P99: 99% of responses took 12ms or less. Since P99 is above the agreed 10ms, we should also check P95 to see whether the 95th percentile breaches the agreed response time as well; if it does, we must also look into P90. By continuously monitoring these metrics (P90, P95, and P99), the Operations team can swiftly identify issues in the service or infrastructure and take corrective action.
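As a quick sanity check (my own sketch, not part of the original answer), rebuilding the 162 responses above and applying the same nearest-rank rule gives that 12ms:

```python
import math

# (request_count, latency_ms) for each batch of requests listed above
batches = [(10, 5), (2, 5), (20, 7), (20, 7), (30, 12), (20, 11), (20, 10), (40, 9)]
samples = sorted(ms for count, ms in batches for _ in range(count))

rank = math.ceil(0.99 * len(samples))  # nearest-rank P99 over 162 samples
print(samples[rank - 1])               # -> 12 (ms), above the agreed 10ms
```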

Ami Jha