
I want to measure how long it takes to process some data: my application reads that data from a given source at a fixed rate. Before each cycle I store Instant.now(). I read the data, adding that single timestamp to each entry. The data gets stored and converted, and just before I send it out via WebSockets, I want to measure the duration between now() and that initial timestamp.
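Roughly, each cycle looks like this (a simplified sketch; DataEntry and the read/store/convert/send methods are stand-ins for my actual code):

Instant received = Instant.now();                       // one timestamp per cycle
List<DataEntry> entries = source.readBatch();           // read from the source at a fixed rate
entries.forEach(e -> e.setReceivedTimestamp(received));
store(entries);
convert(entries);
// just before sending via WebSockets, measure now() - receivedTimestamp
send(entries);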

I tried

long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
registry.timer("processingDuration").record(millis, TimeUnit.MILLISECONDS);

but when visualizing this I can only use processingDuration_seconds_count, _max and _sum. count and sum increase over time (of course), and max is constant most of the time. So how do I see higher and lower plateaus of load? I tried irate(processingDuration_seconds_sum[10m]) to at least see jumps, but since irate() only uses two data points, I still can't easily identify longer periods of high load. Plus: the value in the graph is around 0.6, while the logged ms are around 5-10, so I'm losing the actual value here.

So I tried to use a Gauge instead – which is supposed to allow both increasing and decreasing values:

registry.gauge("processingDurationGauge", millis);

I'd thought this would go up and down within the range of the logged ms, but it's constantly 92.

How do I measure the processing duration of my data over time?

crusy

2 Answers


Using a timer and record is the correct solution.

long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
registry.timer("processingDuration").record(millis, TimeUnit.MILLISECONDS);

Assuming you are scraping every 30 seconds, you can use the _sum and _count to get the average recorded duration:

increase(processingDuration_seconds_sum[1m])/increase(processingDuration_seconds_count[1m])

And if you want to compare how the current duration behaves relative to the average over the last day:

increase(processingDuration_seconds_sum[1m])/
increase(processingDuration_seconds_count[1m]) >
(increase(processingDuration_seconds_sum[24h])/
increase(processingDuration_seconds_count[24h]))*1.50

That would only return values where the 1m average is more than 1.5x the daily average. (I haven't tested that query, but it should get the idea across.)
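On the Java side you can also hold on to the Timer and record a Duration directly, which saves the manual conversion to millis (a sketch, not tested against your code; DataProcessor and the field names are just examples):

import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import java.time.Duration;
import java.time.Instant;

class DataProcessor {

    private final Timer processingTimer;

    DataProcessor(MeterRegistry registry) {
        // register once, reuse for every entry
        this.processingTimer = registry.timer("processingDuration");
    }

    void beforeSend(DataEntry dataEntry) {
        // Timer.record(Duration) takes care of the unit conversion
        processingTimer.record(Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()));
    }
}

Note that the Prometheus registry always exports timers in its base unit (seconds), regardless of the unit you record in, which is why the metric shows up as processingDuration_seconds.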

checketts
  • What's the advantage of this compared to the Gauge? I see a disadvantage: the resulting graph never matches the logged milliseconds – crusy Jan 21 '20 at 10:19
    Just saw [this documentation](https://micrometer.io/docs/registry/prometheus#_timers) which looks similar (they use `rate()`). Anyways: As this pointed me in the right direction I'll accept it in a minute. Thanks again – crusy Jan 21 '20 at 13:29
  • I'm very confused - Prometheus exports this as seconds even though I add milliseconds to the timer. So does Prometheus convert it to seconds? Or does it display the actual recorded milliseconds but label them as seconds? – alex Jun 11 '20 at 19:43

The problem was that a plain long isn't thread-safe, see this answer; the gauge needs a number object it can keep a reference to and read at scrape time. Doing it as documented works as expected:

private final AtomicLong processingDuration;

// ...

// in constructor:
processingDuration = meterRegistry.gauge("processingDuration", new AtomicLong(0L));
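// gauge() returns the AtomicLong it tracks, so later set() calls are picked up at scrape time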

// ...

// before finishing the data entries' handling:
long millis = Duration.between(dataEntry.getReceivedTimestamp(), Instant.now()).toMillis();
LOG.info(millis + "ms");
processingDuration.set(millis);
crusy