
I've read "Prometheus how to handle counters on server" and I've been digging around on the web, but I still don't see a way to accomplish what I'm trying to do. Prometheus may not be the best tool for the job; I'm not sure.

Every day we receive N request packets from customers, and we've instrumented a counter that counts those packets. I can use rate and increase; those show change over time and are somewhat helpful, but we are really interested in the overall count, and we want to disregard counter resets caused by restarts.

What I would like to see is a graph that starts at 0, shows the cumulative number of responses seen over time, never goes down, and accounts for counter resets.

I know the total itself, disregarding resets, is available somewhere, since "instant" queries seem able to return it. I have yet to find any query variant, though, that lets me produce this graph.

TL;DR: I want to see the absolute count over time.

EDIT: Alin, when I try your solution over any time range, I see what I was seeing before:

[screenshots: increase over 1y, increase over 5m]

Even at a low resolution. I don't care much about precision; within ±100 is fine. I just want to see the overall trend without these spikes and drops.

TopherGopher

2 Answers

increase(my_counter[1000y])

But it's going to be really slow.

Or, you could have a recording rule that forever increments a counter by the increase of the source counter (see the sketch after the list below). But you'll have to keep a couple of things in mind:

  1. For some unfathomable reason, `increase(foo[1m])` is an estimate of the increase over the previous 1 minute, rather than an improved version of `foo - foo offset 1m` (improved only to handle counter resets).
  2. Your rules won't be evaluated exactly as often as you tell Prometheus to evaluate them, and some evaluations may be skipped altogether. So if an increase happens during the minute in which an evaluation is skipped (or Prometheus is down), it is gone forever.
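A minimal sketch of such a recording rule, assuming the source counter is called `foo` (as in the comments below) and a 1m rule evaluation interval; the group name is arbitrary:

groups:
  - name: foo_increase          # arbitrary group name
    interval: 1m                # must match the offset in the expression
    rules:
      - record: foo:increase
        # per-interval increase of foo; clamp_min drops the negative delta
        # produced by a counter reset
        expr: clamp_min(foo - foo offset 1m, 0)

You would then graph the accumulated total with `sum_over_time(foo:increase[1000y])`, as spelled out in the comments below.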

So yeah, as stated in many places, Prometheus is not ideal for accounting purposes. It's not going to give you exact values, no matter how hard you try.

Alin Sînpălean
  • Responded with screenshots – TopherGopher May 10 '19 at 16:58
  • As I was saying, `increase(foo[1m])` is an estimate of the actual increase over the requested period. More precisely, it takes only the samples that fall within the requested `1m` (so, without `foo offset 1m`) and then compensates for that by extrapolating (as described at length in [Prometheus issue 3806](https://github.com/prometheus/prometheus/issues/3806)). So the first time your counter appears and then immediately jumps to ~5k, that extrapolation exaggerates the increase by quite a bit. – Alin Sînpălean May 11 '19 at 11:47
  • You can work around that limitation by using a recording rule (and running the risk of missing out on some increases or double-counting them) with an expression like `clamp_min(foo - foo offset 1m, 0)` (assuming your rule evaluation interval is `1m`) and then graphing the output of `sum_over_time(foo:increase[1000y])` (where `foo:increase` is the recording rule evaluated above). – Alin Sînpălean May 11 '19 at 11:50

This can be done in VictoriaMetrics with the running_sum function:

running_sum(increase(foo))

This graphs the cumulative increase of foo over the selected time range: the line starts at 0 at the beginning of the range and grows to the total increase by the end of the range.

If the database contains multiple time series matching the foo selector, the graph will contain multiple lines. They can be merged into a single line by adding sum to the query:

running_sum(sum(increase(foo)))

The query also relies on the automatic addition of a lookbehind window in square brackets for the increase function; this is a MetricsQL feature.
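
If you prefer to make that window explicit, MetricsQL also supports step-based durations such as `1i` (one graph step); assuming that, an explicit form might look like:

running_sum(sum(increase(foo[1i])))

Whether `1i` exactly matches the automatic lookbehind depends on the MetricsQL defaults, so treat this as a sketch rather than a drop-in replacement.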

valyala