I'm trying to build a basic SRE dashboard in order to learn Prometheus/Grafana.

I want to calculate the number of hours the service has been running and the number of hours it has been down since 1 January of the current year, so that I can deduct the downtime hours from the error budget. Can a PromQL query calculate this?

I would prefer to use a metric such as "up", which is available regardless of the exporter/client library used.

user9492428

1 Answer

First of all, are you trying to calculate the availability of the Prometheus service or the availability of the services which are monitored by Prometheus?

If it's the first case, you can use the "up" metric, which records whether each scrape of a target succeeded. If it's the second, you can use, for example, the "probe_success" metric from the Blackbox exporter.

See more info about the "up" and "probe_success" difference here.

See more info about the Blackbox exporter here.

You can calculate the availability as a percentage (here over the last week) with a query like the following:

100 * avg_over_time(probe_success{instance="xxxxx"}[1w])
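To turn the percentage returned by the query above into hours, multiply it by the length of the window. The following is a minimal Python sketch of that arithmetic; the function name `split_window_hours` is hypothetical and not part of Prometheus or Grafana. Note that avg_over_time averages only the samples that exist, so periods with no samples at all are not counted as downtime, and the result should be read as an estimate.

```python
from typing import Tuple


def split_window_hours(availability_percent: float, window_hours: float) -> Tuple[float, float]:
    """Split a window into (uptime hours, downtime hours).

    availability_percent is the value returned by a query such as
    100 * avg_over_time(probe_success{...}[1w]); window_hours is the
    length of the same range in hours (1w = 168 hours).
    Hypothetical helper, for illustration only.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    if window_hours < 0:
        raise ValueError("window_hours must not be negative")
    up_hours = window_hours * availability_percent / 100.0
    down_hours = window_hours - up_hours
    return up_hours, down_hours


if __name__ == "__main__":
    # Example: 99% availability over a one-week window.
    up, down = split_window_hours(99.0, 168.0)
    print(f"up: {up:.2f} h, down: {down:.2f} h")
```

The downtime figure is the number that would be deducted from the error budget for the same window.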

In Grafana, you can use the global variable "$__range" as the range duration ([$__range]) so that the PromQL query covers the current time range of the dashboard.

See more info about global variables in the Grafana documentation here.
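Outside Grafana, a range that starts on 1 January of the current year can be built by computing the elapsed seconds and using that number as the range duration. The sketch below assumes UTC and a script that sends the resulting query to Prometheus; `year_to_date_query` is a hypothetical helper name. It also assumes Prometheus retains data for the whole period, which the default retention of 15 days does not.

```python
from datetime import datetime, timezone


def year_to_date_query(selector: str, now: datetime) -> str:
    """Build an availability query covering 1 January (UTC) up to `now`.

    selector is a PromQL instant vector selector such as
    probe_success{instance="xxxxx"}; now must be timezone-aware.
    Hypothetical helper, for illustration only.
    """
    start_of_year = datetime(now.year, 1, 1, tzinfo=timezone.utc)
    seconds = int((now - start_of_year).total_seconds())
    if seconds <= 0:
        raise ValueError("now must be later than 1 January 00:00 UTC")
    # PromQL accepts a range duration expressed in whole seconds, e.g. [3600s].
    return f"100 * avg_over_time({selector}[{seconds}s])"


if __name__ == "__main__":
    query = year_to_date_query(
        'probe_success{instance="xxxxx"}',
        datetime.now(timezone.utc),
    )
    print(query)
```

The query has to be evaluated at the same instant that was passed as `now`, otherwise the range no longer starts exactly on 1 January.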

  • Hi, thanks a lot for the response. I'm trying to calculate the uptime of a service monitored by Prometheus. I understand the PromQL query you specified, but is it possible to write a similar query with a fixed date as the start of the range vector? That is, I want the range to be the number of days between the current date and 1 January of the current year. – user9492428 Jun 13 '21 at 08:52
  • I added more info to the answer. – Marcelo Ávila de Oliveira Jun 13 '21 at 21:59