PromQL to find the mean response time of a downstream service

Question

What would be the correct PromQL to find the mean latency of the downstream calls?

We are using the following PromQL to find the p99 latency of the downstream service: histogram_quantile(0.99, sum(rate(resilience4j_circuitbreaker_calls_seconds_bucket{name="circuitBreakerName"}[1m])) by (le)).

The Prometheus metrics exposed by the application are: resilience4j_circuitbreaker_calls_seconds_count, resilience4j_circuitbreaker_calls_seconds_sum and resilience4j_circuitbreaker_calls_seconds_bucket.

How can I write the PromQL to find the mean response time using the above metrics?

score 0 · Answer 1 · answered Jun 15 '23 at 12:44

To compute the mean, divide the total response time by the number of responses. Since the _sum and _count series behave like counters, you need to apply a rate function to each of them first.

I believe something as easy as this might work for you:

irate(resilience4j_circuitbreaker_calls_seconds_sum[1m])
  / irate(resilience4j_circuitbreaker_calls_seconds_count[1m])

Remember to adjust the range selectors to suit your situation (they should be at least twice the scrape interval).
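
If you want the mean over a whole window rather than between the last two samples, a variant with rate may be a better fit. This is only a sketch: the 5m window is an assumption, the name matcher is copied from the p99 query in the question, and the sum() aggregation assumes you want a single value across all instances and remaining labels.

```promql
# Mean call duration in seconds over the last 5 minutes (window is an assumed value).
# Numerator: per-second increase of the total observed time.
# Denominator: per-second increase of the number of observed calls.
  sum(rate(resilience4j_circuitbreaker_calls_seconds_sum{name="circuitBreakerName"}[5m]))
/
  sum(rate(resilience4j_circuitbreaker_calls_seconds_count{name="circuitBreakerName"}[5m]))
```

When no calls were recorded in the window, both sides are 0 and the division yields NaN, so gaps in a graph of this expression usually just mean there was no traffic.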

answered Jun 15 '23 at 12:44

markalex


Does resilience4j_circuitbreaker_calls_seconds_sum mean the total amount of time all requests have taken, given that each request may have taken a different time to respond? And how would rate behave differently from irate here? – tusharRawat Jun 15 '23 at 13:39
@tusharRawat, yes, `_sum` is expected to hold the total sum of all observed values: see the official documentation on summaries [here](https://prometheus.io/docs/concepts/metric_types/#summary). Also read [rate vs irate](https://stackoverflow.com/a/55628595/21363224). My understanding is that `irate` produces a more precise result, which might matter when it is used in arithmetic. – markalex Jun 15 '23 at 16:30
