Prometheus query for continuous uptime

Question

I'm a Prometheus newbie and have been trying to figure out the right query to get the last continuous uptime of my service.

For example, if the present time is 0:01:20 and my service came up at 0:00:00, went down at 0:01:01, and came up again at 0:01:10, I'd like to see an uptime of "10 seconds".

I'm mainly looking at the "up{}" metric, possibly combined with functions such as changes() or rate(), but no luck so far. I don't see any other Prometheus metric similar to "up" either.

score 7 · Accepted Answer · answered Mar 03 '19 at 22:11

The problem is that you need something that tells you when your service itself came up, as opposed to whether the node was up :)
We use the following (I hope one of them helps, or at least the general idea behind them):
1. When we look at a host we use node_time{...} - node_boot_time{...}
2. When we look at a specific process / container (docker via cadvisor in our case) we use node_time{...} - on(instance) group_right container_start_time_seconds{name=~"..."}) by(name,instance)

Elad Amit

Thanks for defining the problem, that helped. I was able to come up with a query that goes something like "time() - process_start_time{service="service-x"}" which seems to return my desired value. – Nodelay Heehoo Mar 04 '19 at 22:34
You're welcome, feel free to upvote / mark the answer as correct so that it is easier to find for others with similar issues :) – Elad Amit Mar 05 '19 at 07:43

score 0 · Answer 2 · answered Apr 14 '22 at 11:08

The following PromQL query can be used for calculating the application uptime in seconds:

time() - process_start_time_seconds

This query works for applications written in Go that use either the github.com/prometheus/client_golang or the github.com/VictoriaMetrics/metrics client library; both expose the process_start_time_seconds metric by default. This metric contains the Unix timestamp of the application start time.
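
If you want to read this value from a script rather than from the expression browser, here is a minimal Python sketch that sends the query to the Prometheus HTTP API (the /api/v1/query endpoint) and collects the result per instance. The base URL and the helper names (build_query_url, parse_uptimes, fetch_uptimes) are assumptions made for illustration, and keying the result by the "instance" label is just one possible choice.

```python
import json
import urllib.parse
import urllib.request
from typing import Dict

UPTIME_QUERY = "time() - process_start_time_seconds"


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urllib.parse.urlencode({"query": promql})
    return base_url.rstrip("/") + "/api/v1/query?" + params


def parse_uptimes(payload: dict) -> Dict[str, float]:
    """Map each returned series to its uptime in seconds, keyed by instance."""
    if payload.get("status") != "success":
        raise ValueError("query failed: %r" % payload.get("error"))
    uptimes = {}
    for series in payload["data"]["result"]:
        instance = series["metric"].get("instance", "<unknown>")
        # An instant vector sample is [timestamp, "value as a string"].
        _timestamp, value = series["value"]
        uptimes[instance] = float(value)
    return uptimes


def fetch_uptimes(base_url: str) -> Dict[str, float]:
    """Run the uptime query against a Prometheus server (needs network access)."""
    with urllib.request.urlopen(build_query_url(base_url, UPTIME_QUERY)) as resp:
        return parse_uptimes(json.load(resp))
```

For example, fetch_uptimes("http://localhost:9090") would return something like a dict of instance to seconds, assuming a Prometheus server is listening on that (hypothetical) address.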

Kubernetes exposes the container_start_time_seconds metric for each started container by default. So the following query can be used for tracking uptimes for containers in Kubernetes:

time() - container_start_time_seconds{container!~"POD|"}

The container!~"POD|" filter is needed in order to filter out auxiliary time series:

Time series with the container="POD" label correspond to, e.g., pause containers. See this answer for details.
Time series without a container label correspond to, e.g., the cgroups hierarchy. See this answer for details.
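
To see why the single pattern "POD|" covers both cases: PromQL regex matchers are fully anchored, and a missing label is treated as an empty value, so the pattern matches either the exact string "POD" or the empty string. A rough Python sketch of that behaviour, using re.fullmatch as a stand-in for the anchored match (Prometheus itself uses RE2 syntax, and passes_filter is a hypothetical helper):

```python
import re


def passes_filter(labels: dict) -> bool:
    """Emulate container!~"POD|": keep the series only if the label does NOT match."""
    # A label that is absent behaves like an empty string in PromQL matchers.
    value = labels.get("container", "")
    # fullmatch mimics the implicit ^...$ anchoring of PromQL regex matchers.
    return re.fullmatch(r"POD|", value) is None


print(passes_filter({"container": "my-app"}))  # regular container: kept
print(passes_filter({"container": "POD"}))     # pause container: dropped
print(passes_filter({}))                       # no container label: dropped
```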

If you need to calculate the overall per-target uptime over a given time range, it is possible to estimate it with the up metric. Prometheus automatically generates the up metric for each scrape target. It sets it to 1 for each successful scrape and to 0 otherwise. See these docs for details. So the following query can be used for estimating the total uptime in seconds for each scrape target during the last 24 hours:

avg_over_time(up[24h]) * (24*3600)

See avg_over_time docs for details.
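
Note that this gives the total uptime in the window, not the last continuous uptime asked about in the question. The difference can be sketched in Python on made-up samples modelled on the question's example (one sample per second for 80 seconds, with the target down from 0:01:01 until it comes back at 0:01:10). The function names and the data are assumptions for illustration; real results are only as precise as the scrape interval.

```python
from typing import List, Tuple

Sample = Tuple[float, int]  # (timestamp in seconds, value of `up`: 1 or 0)


def total_uptime_seconds(samples: List[Sample], window_seconds: float) -> float:
    """Mimic avg_over_time(up[window]) * window: mean of the samples times the window."""
    if not samples:
        return 0.0
    avg = sum(value for _, value in samples) / len(samples)
    return avg * window_seconds


def last_continuous_uptime_seconds(samples: List[Sample], now: float) -> float:
    """Seconds since the start of the most recent unbroken run of up == 1."""
    run_start = None
    for ts, value in samples:  # samples are assumed to be sorted by timestamp
        if value == 1:
            if run_start is None:
                run_start = ts  # a new run of successful scrapes begins here
        else:
            run_start = None  # a failed scrape ends the current run
    return 0.0 if run_start is None else now - run_start


# Hypothetical data: down for the samples at t = 61..69, up otherwise.
samples = [(float(t), 0 if 61 <= t < 70 else 1) for t in range(80)]

print(total_uptime_seconds(samples, 80.0))            # roughly 71 seconds in total
print(last_continuous_uptime_seconds(samples, 80.0))  # 10.0, as in the question
```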
