20

I have articles, and for each article I want to keep a read count:

# TYPE news_read_counter2 counter
news_read_counter2{id="2000"} 168

The counters on the servers are stored in Redis/Memcached, so they can get reset from time to time. When the Redis machine restarts, the server no longer has the last news_read_counter2 value, so the count starts from zero again:

# TYPE news_read_counter2 counter
news_read_counter2{id="2000"} 2

Now, looking at the news_read_counter2{id="2000"} graph, I see the counter drop to 2, while the docs say:

A counter is a cumulative metric that represents a single numerical value that only ever goes up.

So to keep track of news_read_counter2 I would need to save the data in a database, and I am back at square one, where I need MySQL to handle my data.

Here is an image of the counter after Redis was restarted: [image]

Amir Bar

3 Answers

18

Counters are allowed to be reset to 0, so there's no need to do anything special here to handle it. See http://www.robustperception.io/how-does-a-prometheus-counter-work/ for more detail.
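To make the reset handling concrete, here is a minimal Python sketch of the idea behind reset-aware functions such as `rate` and `increase`. It is a simplification under stated assumptions: the function name and the sample values are hypothetical, and real Prometheus additionally works on timestamped samples and extrapolates to the window boundaries.

```python
def increase_with_resets(samples):
    """Total increase across scraped counter values, treating any drop as a reset."""
    total = 0.0
    prev = None
    for value in samples:
        if prev is not None:
            if value < prev:
                # The counter went backwards, so assume it restarted from zero:
                # everything counted since the restart is new increase.
                total += value
            else:
                total += value - prev
        prev = value
    return total


# Hypothetical scrapes of news_read_counter2{id="2000"} around a Redis restart.
scrapes = [150, 160, 168, 2, 5]
print(increase_with_resets(scrapes))  # 10 + 8 + 2 + 3 = 23.0
```

The raw difference between the last and first sample here would be negative (5 - 150), which is why the raw graph looks wrong while a reset-aware query still reports a sensible increase.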

It's recommended to use a client library which will handle all of this for you.

Also, by convention you should suffix counters with _total so that metric should be news_reads_total.

brian-brazil
  • Thanks, pretty sure Prometheus is not for me :) I saw some PHP clients; they just write everything to a local MySQL, which is funny because they now also need to scale MySQL to handle the load. I also saw that counters can reset to 0, but in my case they do not always reset to 0: if the counter is reset to zero and then increases before Prometheus pulls the data, just like in my example, it goes wrong – Amir Bar May 31 '16 at 17:16
  • 2
    I gave it a last try, and I am pretty sure counters are broken. I ran statsd_exporter, sent it a foo counter, increased it to 4, restarted statsd_exporter, then sent it a counter of 0 and then +1. I expected to see a counter of 5 but I got a counter of 1. I also tried without sending it 0, same thing – Amir Bar Jun 02 '16 at 04:36
  • 5
    That's the correct behaviour. The `rate` function in Prometheus will automatically handle such counter resets. – brian-brazil Jun 02 '16 at 14:03
13

You generally don't want to look at the total of a counter the way that you are in your example, because it's not very meaningful once you actually try to use it analytically.

The idea is that you want to know increases over a period of time. For example, you might want the total number of article views for the last 7 days, for this month so far, or for the last 30 days.

This answer and this article do an excellent job of explaining all this, but here are some examples. For demonstration purposes I use a counter called walks_started_total.

The problem

Query: `walks_started_total`


Solution 1

Seeing the total for the last week: `increase(walks_started_total[1w])`


Solution 2

Over a 1 minute period: `increase(walks_started_total[1m])`


aggregate1166877
  • 2
    4 years later and I still don't understand it :) To get a simple total number of article views, is a counter the solution? – Amir Bar Mar 04 '20 at 08:12
  • 4
    @AmirBar I feel you XD. Took me a while too. Yes, counters are a good solution. You simply need to make sure you query over a time period using `increase` like in the example. Have a look at the article I linked, it helped me a lot: https://www.innoq.com/en/blog/prometheus-counters/ – aggregate1166877 Mar 05 '20 at 01:42
2

It is OK if the counter is reset to zero on service restart, since Prometheus provides the increase and rate functions, which remove counter resets before performing the actual calculations. Prometheus counters usually must be wrapped in these functions in order to get meaningful results. For example:

  • increase(news_read_counter2[24h]) returns the number of news reads for the last 24 hours
  • rate(news_read_counter2[1h]) returns the average per-second news read rate for the last hour

If you need an absolute counter value with counter resets removed, use increase(news_read_counter2[10y]). This query returns the total number of news reads for the last 10 years. Prometheus evaluates the query independently for each point displayed on the graph, so the query displays a non-decreasing graph with the absolute number of news reads since the first news read within the last 10 years. Note that an increase() query with a very large lookbehind window in square brackets may be slow, since it needs to process all the raw samples stored in Prometheus for the time series named news_read_counter2.

Note that the increase() function in Prometheus has some issues:

  • It may return fractional results over integer counters because of extrapolation. See this issue for details.
  • It misses any counter increase between the last raw sample before the lookbehind window in square brackets and the first raw sample inside the lookbehind window.
  • It misses the initial counter increase if the time series starts from a non-zero sample.
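The fractional results can be illustrated with a deliberately simplified Python sketch of the extrapolation: the increase between the first and last sample inside the window is scaled up to the full window length. The function name and sample data are hypothetical, resets are ignored, and real Prometheus applies extra limits on how far it extrapolates, so treat this as an approximation of the idea rather than the actual algorithm.

```python
def extrapolated_increase(samples, window_start, window_end):
    """Scale the observed increase to the whole window.

    samples: (timestamp_seconds, value) pairs inside the window, sorted by time,
    with no counter resets. Needs at least two samples at different timestamps.
    """
    (first_t, first_v), (last_t, last_v) = samples[0], samples[-1]
    raw_increase = last_v - first_v    # what the samples actually show
    covered = last_t - first_t         # time span covered by the samples
    window = window_end - window_start # time span the query asks about
    return raw_increase * window / covered


# Hypothetical 15-second scrapes inside a 60-second window.
samples = [(10, 10), (25, 10), (40, 11), (55, 12)]
print(extrapolated_increase(samples, 0, 60))  # roughly 2.67, although only 2 reads were observed
```

The samples cover 45 of the 60 seconds, so the observed increase of 2 is scaled by 60/45, which yields a non-integer value for an integer counter.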

These issues should be fixed eventually according to this design doc. In the meantime you can try VictoriaMetrics, a Prometheus-like monitoring system I work on. It supports MetricsQL, a PromQL-like query language whose increase() function is free from the issues mentioned above.

P.S. If you need to draw a non-decreasing graph that starts from zero at the left side and shows the cumulative counter increase over any selected time range, then Prometheus cannot help with this case :( But VictoriaMetrics can. For example, the following MetricsQL query returns the cumulative counter increase over any selected time range:

running_sum(increase(news_read_counter2))

The query uses the running_sum function.

The query also uses a VictoriaMetrics feature that allows omitting the lookbehind window in square brackets for increase() (and any other rollup function). In this case it automatically uses the interval between points on the graph (aka step) as the lookbehind window, so all the raw samples are taken into account by the query.
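As a rough illustration of what the query above computes, the following Python sketch takes per-step increases (one value per point on the graph) and accumulates them from left to right. It is not VictoriaMetrics code; the function body and the input values are assumptions made for the example.

```python
from itertools import accumulate


def running_sum(per_step_increases):
    """Cumulative sum of the per-step increases, one output value per graph point."""
    return list(accumulate(per_step_increases))


# Hypothetical increase() results for four consecutive steps on the graph.
step_increases = [3, 0, 5, 2]
print(running_sum(step_increases))  # [3, 3, 8, 10]
```

Because every per-step increase is non-negative once resets are removed, the accumulated series never goes down, whatever time range is selected.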

valyala
  • How does VictoriaMetrics handle the case where a counter is reset and instantly increased back to its previous value, by coincidence? For example, a counter is at 1, my server dies, it restarts, inits the counter to 0 but instantly gets the event that increases it to 1. Would it still be able to detect that the total value is 2 when I use `increase(...)`? – Telokis Oct 12 '22 at 14:15
  • (My previous comment is because I'm trying to have very precise values stored in VictoriaMetrics and what I get with `increase(metric[10y])` is way less than what I've logged. Logging and metrics are handled at the exact same place so they get the exact same information) – Telokis Oct 12 '22 at 14:27
  • The `increase()` function in VictoriaMetrics detects a counter reset only if the first sample after the reset is smaller than the last sample just before the reset. If the first sample after the reset is equal to or bigger than the last sample before the reset, then VictoriaMetrics has no data that could help detect the reset, so the reset goes unnoticed. If you need to calculate the exact number of events over some interval, then it may be better to store the sum of events between scrapes and then use `sum_over_time(m[d])` for these calculations – valyala Oct 15 '22 at 10:52
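The difference between the two approaches in the last comment can be sketched in Python. The helper names and the scrape values below are hypothetical; the first function only mimics the "a drop means a reset" rule described above, and the second simply adds up per-scrape event counts the way `sum_over_time` would.

```python
def increase_detecting_drops(values):
    """Reset-aware increase: a reset is noticed only when a value drops."""
    return sum(cur if cur < prev else cur - prev
               for prev, cur in zip(values, values[1:]))


def sum_of_deltas(events_between_scrapes):
    """Add up the number of events reported for each scrape interval."""
    return sum(events_between_scrapes)


# Two real events, with a restart between the second and third scrape.
counter_scrapes = [0, 1, 1]  # the counter climbs back to 1 before the next scrape
delta_scrapes = [0, 1, 1]    # events counted since the previous scrape

print(increase_detecting_drops(counter_scrapes))  # 1, the reset went unnoticed
print(sum_of_deltas(delta_scrapes))               # 2, matches the real event count
```

The trade-off is that the per-scrape value must be reset after each scrape, so it behaves like a gauge rather than a counter.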