
I want to measure how long it takes to process some data: my application reads that data from a given source at a fixed rate. Before each cycle I store Instant.now(). I read the data, adding that single timestamp to each entry. The data gets stored and converted, and just before I send it out via WebSockets, I want to measure the duration between now() and that initial timestamp.
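Roughly, each cycle looks like this (a simplified sketch; DataEntry and the read/store/convert/send methods are stand-ins for my actual code):

Instant received = Instant.now();                       // one timestamp per cycle
List<DataEntry> entries = source.readBatch();           // read from the source at a fixed rate
entries.forEach(e -> e.setReceivedTimestamp(received));
store(entries);
convert(entries);
// just before sending via WebSockets, measure now() - receivedTimestamp
send(entries);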

I tried

long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
registry.timer("processingDuration").record(millis, TimeUnit.MILLISECONDS);

but when visualizing this I can only use processingDuration_seconds_count, _max and _sum. count and sum increase over time (of course), and max is constant most of the time. So how do I see higher and lower plateaus of load? I tried irate(processingDuration_seconds_sum[10m]) to at least see jumps, but since irate() only uses two data points, I still can't easily identify longer periods of high load. Plus: the value in the graph is around 0.6, while the logged ms are around 5-10, so I'm losing the actual value here.

So I tried to use a Gauge instead – which is supposed to allow both increasing and decreasing values:

registry.gauge("processingDurationGauge", millis);

I'd thought this would go up and down within the range of the logged ms, but it's constantly 92.

How do I measure the processing duration of my data over time?

crusy

2 Answers


Using a timer and record is the correct solution.

long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
registry.timer("processingDuration").record(millis, TimeUnit.MILLISECONDS);

Assuming you are scraping every 30 seconds, you can use the _sum and _count to get the average recorded duration:

increase(processingDuration_seconds_sum[1m])/increase(processingDuration_seconds_count[1m])

And if you want to compare how the current duration behaves relative to the average over the last day:

increase(processingDuration_seconds_sum[1m])/
increase(processingDuration_seconds_count[1m]) >
(increase(processingDuration_seconds_sum[24h])/
increase(processingDuration_seconds_count[24h]))*1.50

That would only return values where the 1m average is more than 1.5x the daily average. (I haven't tested that query, but it should get the idea across.)
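On the Java side you can also hold on to the Timer and record a Duration directly, which saves the manual conversion to millis (a sketch, not tested against your code; DataProcessor and the field names are just examples):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.time.Duration;
import java.time.Instant;

class DataProcessor {

    private final Timer processingTimer;

    DataProcessor(MeterRegistry registry) {
        // register once, reuse for every entry
        this.processingTimer = registry.timer("processingDuration");
    }

    void beforeSend(DataEntry dataEntry) {
        // Timer.record(Duration) takes care of the unit conversion
        processingTimer.record(Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()));
    }
}

Note that the Prometheus registry always exports timers in its base unit (seconds), regardless of the unit you record in, which is why the metric shows up as processingDuration_seconds.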

checketts
  • What's the advantage of this compared to the Gauge? I see a disadvantage: the resulting graph never matches the logged milliseconds – crusy Jan 21 '20 at 10:19
    Just saw [this documentation](https://micrometer.io/docs/registry/prometheus#_timers) which looks similar (they use `rate()`). Anyways: As this pointed me in the right direction I'll accept it in a minute. Thanks again – crusy Jan 21 '20 at 13:29
  • I'm very confused - Prometheus exports this as seconds even though I add milliseconds to the timer. So does Prometheus convert it to seconds? Or does it display the actual recorded milliseconds but label them as seconds? – alex Jun 11 '20 at 19:43

The problem was that a plain long isn't thread-safe, see this answer; the gauge needs a number object it can keep a reference to and read at scrape time. Doing it as documented works as expected:

private final AtomicLong processingDuration;

// ...

// in constructor:
processingDuration = meterRegistry.gauge("processingDuration", new AtomicLong(0L));
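// gauge() returns the AtomicLong it tracks, so later set() calls are picked up at scrape time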

// ...

// before finishing the data entries' handling:
long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
processingDuration.set(millis);
crusy