How to set a retention time for Pushgateway for metrics to expire?

Question

I'm using Pushgateway with Prometheus and everything works, but after a couple of weeks Pushgateway collapses. Looking into it, there are tons of metrics that are no longer used, and deleting them manually is practically impossible. So:

Is there a way to expire Pushgateway metrics with a TTL or some other retention setting, for example by size or by time, or maybe both?

NOTE: On the Prometheus mailing list I read that a lot of people have been asking for something like this for a year or more, and the only answer so far is "this is not the Promethean way to do it". Really? Come on, if this is a real pain for a lot of people, maybe there should be a better way (even if it's not the Promethean way).

Metrics for batch jobs are difficult. The team made the decision not because of the *Promethean* way but because it is hard to justify a feature which would mainly lead to anti-patterns. From a practical point of view I would be happy with a little anti-pattern :) — Michael Doubez, Aug 24 '20 at 14:39
If you need to push Prometheus metrics to a centralized storage, then take a look at VictoriaMetrics. It supports metrics ingestion via various protocols, including [Prometheus text exposition format](https://victoriametrics.github.io/#how-to-import-data-in-prometheus-exposition-format). — valyala, Apr 08 '21 at 12:59

score 5 · Accepted Answer · answered Aug 24 '20 at 14:53

Supposing you want to remove the metrics related to a group when they become too old (for a given definition of too old), you can use the metric push_time_seconds, which the Pushgateway defines automatically for each group.

push_time_seconds{instance="foo",job="bar",try="longtime"} 1.598280005888635e+09

With this information, you can write a script that fetches this metric and uses its value to identify old groups of data (here {instance="foo",job="bar",try="longtime"}). The API lets you remove the metrics related to an old group:

 curl -X DELETE http://pushgateway:9091/metrics/job/bar/instance/foo/try/longtime

This can be done in a few lines of bash script or python.
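
As a rough illustration of the script described above, here is a minimal Python sketch. It assumes that label values contain no `/` (the Pushgateway needs a special encoding for those, which is not handled here) and that labels exposed with an empty value, such as `instance=""`, are not part of the grouping key. The function names (`parse_push_times`, `group_path`, `stale_groups`, `purge`) and the `base_url` and `max_age_seconds` parameters are made up for this example.

```python
import re
import time
import urllib.parse
import urllib.request

# Matches name="value" pairs inside the {...} label set.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
PREFIX = "push_time_seconds{"


def parse_push_times(metrics_text):
    """Return (labels, push_time) pairs for every push_time_seconds sample."""
    groups = []
    for line in metrics_text.splitlines():
        if not line.startswith(PREFIX):
            continue  # skips comment lines and every other metric
        label_part, _, rest = line[len(PREFIX):].rpartition("}")
        groups.append((dict(LABEL_RE.findall(label_part)), float(rest.split()[0])))
    return groups


def group_path(labels):
    """Build the /metrics/job/<job>/<label>/<value>... path for one group."""
    parts = ["metrics", "job", urllib.parse.quote(labels["job"], safe="")]
    for name, value in sorted(labels.items()):
        if name == "job" or value == "":
            continue  # job leads the path; empty labels are assumed not to be in the key
        parts += [name, urllib.parse.quote(value, safe="")]
    return "/" + "/".join(parts)


def stale_groups(metrics_text, max_age_seconds, now=None):
    """Return the label sets of groups last pushed more than max_age_seconds ago."""
    now = time.time() if now is None else now
    return [labels for labels, pushed in parse_push_times(metrics_text)
            if now - pushed > max_age_seconds]


def purge(base_url, max_age_seconds):
    """Fetch /metrics from the Pushgateway and DELETE every stale group."""
    with urllib.request.urlopen(base_url + "/metrics") as resp:
        text = resp.read().decode("utf-8")
    for labels in stale_groups(text, max_age_seconds):
        req = urllib.request.Request(base_url + group_path(labels), method="DELETE")
        urllib.request.urlopen(req).close()
```

Calling `purge("http://pushgateway:9091", 7 * 24 * 3600)` from a cron job would then delete every group that has not been pushed for a week; the threshold is only an example.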

GREAT !! ... thanks mate, this will do the trick !! — Carlos Saltos, Sep 30 '20 at 07:16

Dinu Mathai · Answer 2 · 2022-04-10T05:11:28.883

score 5

I did not get a positive response from the Prometheus team, so I implemented it myself.

https://github.com/dinumathai/pushgateway

docker run -d -p 9091:9091 dmathai/prom-pushgateway-ttl:latest --metric.timetolive=60s

edited Apr 10 '22 at 05:11

answered Apr 07 '21 at 13:43


It is really great! Thank you. I hope you will support it as long as possible :D – Stepan K. Apr 12 '21 at 09:54
Welcome, Sure @StepanK. – Dinu Mathai Apr 12 '21 at 10:02

Kaustubh Choudhury · Answer 3 · 2023-06-30T14:14:15.453

You can run this as a sidecar container in the Pushgateway pod.

- name: pushgateway-metrics-purger
  image: <image/with/curl>
  command:
  - sh
  - -c
  - |
    # Delete a job's group once its last push is older than max_age_seconds.
    max_age_seconds=15   # adjust as per requirement
    while true
    do
      curl -s http://localhost:9091/metrics | \
      grep '^push_time_seconds' | \
      while read -r line
      do
        # The sample value is the Unix time of the last push, e.g. 1.598280005888635e+09.
        last_pushed=$(printf "%.f" "$(echo "$line" | awk '{print $NF}')")
        # Extract the value of the job label.
        job_name=$(echo "$line" | \
                awk -F '}' '{print $1}' | \
                grep -o 'job="[^"]*"' | \
                cut -f2 -d'=' | \
                tr -d '"')
        std_unix_time_now=$(date +%s)
        interval_seconds=$((std_unix_time_now - last_pushed))
        if [ "$interval_seconds" -gt "$max_age_seconds" ]
        then
          # Note: this only deletes the group whose grouping key is the job label alone.
          curl -s -X DELETE "http://localhost:9091/metrics/job/$job_name" \
          && echo "$(date), Deleted job group - $job_name"
        else
          echo "$(date), Purge action skipped. Interval not satisfied"
        fi
      done
      sleep 3600
    done

score 1 · Answer 4 · answered Aug 24 '22 at 01:57

Here is an approach that has worked for many of our use cases.

Add a TTL (time-to-live) label to each metric.
Next, periodically run an independent purge script that scans the /metrics endpoint and deletes expired metrics based on push_time_seconds.

Adding the TTL on the publisher side decentralizes the lifetime of each metric and makes the solution dynamic, instead of expiring everything after a fixed interval. Also, my organization didn't want to deviate from the original software (custom Docker images were not an option).
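
The expiry check in such a purge script could look roughly like the Python sketch below. This is only an illustration under stated assumptions, not the implementation this answer refers to: it assumes the publisher puts the TTL into the grouping key as a label holding a number of seconds, so that it also shows up on push_time_seconds. The label name `ttl_seconds`, the function `expired_groups` and the `default_ttl` parameter are hypothetical.

```python
import re

# Matches name="value" pairs inside the {...} label set.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')
PREFIX = "push_time_seconds{"
TTL_LABEL = "ttl_seconds"  # hypothetical label name chosen for this sketch


def expired_groups(metrics_text, now, default_ttl=None):
    """Return the label sets of groups whose own TTL has elapsed.

    metrics_text is the body of the Pushgateway /metrics endpoint and
    now is the current Unix time. Groups without a TTL label fall back
    to default_ttl, or are kept forever when default_ttl is None.
    """
    expired = []
    for line in metrics_text.splitlines():
        if not line.startswith(PREFIX):
            continue  # only push_time_seconds samples carry the push time
        label_part, _, rest = line[len(PREFIX):].rpartition("}")
        labels = dict(LABEL_RE.findall(label_part))
        push_time = float(rest.split()[0])
        ttl = labels.get(TTL_LABEL)
        ttl = float(ttl) if ttl else default_ttl
        if ttl is None:
            continue  # no TTL requested for this group
        if now - push_time > ttl:
            expired.append(labels)
    return expired
```

Each returned label set identifies one group, which the purge script can then remove with a DELETE request against the matching /metrics/job/... path.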
