I'm still getting to grips with PromQL. I wrote this query in an attempt to count the Kubernetes pods that existed in the last 24 hours within a given namespace.

My process here was:

  • Get the metric filtered to the relevant namespaces (any airflow ones).
  • Get that metric over 24 hours.
    • Each pod will just have lots of duplicates of the same creation time here.
  • Use increase() to turn the range vector for each pod back into an instant vector. The value will always be 0, as the creation time does not increase.
  • Now that we have 1 value per pod, use count() to see how many existed in that time frame.
count(increase(kube_pod_created{namespace=~".*-airflow"}[1d]))

Can anyone who knows Prometheus well tell me if this logic follows? Since it isn't a normal database, I'm having trouble working out how to validate this query. It "looks" like it probably does the right thing when expanded out to a day, though.

John Humphreys

2 Answers

I'd recommend replacing increase() with count_over_time(), since increase() may miss short-lived pods whose lifetime is shorter than 2x the scrape interval (it needs at least two samples in the window to return a result). The following query should return the total number of pods seen during the last 24 hours:

count(count_over_time(kube_pod_created{namespace=~".*airflow"}[24h]))
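To make the difference concrete, here is a minimal Python sketch of a toy model, not of Prometheus itself. It assumes each series is just the list of samples that fall inside the 24h window, that an increase()-style function yields a result only for series with at least two samples, and that count_over_time() yields one for any series with at least one sample. The pod names and sample values are hypothetical.

```python
# Toy model of the 24h lookbehind window: series name -> list of
# (timestamp_seconds, value) samples. Names and values are made up.
WINDOW_SAMPLES = {
    "long-lived-pod": [(0, 1596000000.0), (30, 1596000000.0), (60, 1596000000.0)],
    "short-lived-pod": [(90, 1596000080.0)],  # scraped only once before it exited
}


def series_with_increase(series):
    """Series for which an increase()-style function returns a value.

    Assumption: at least two samples in the window are required.
    """
    return {name for name, samples in series.items() if len(samples) >= 2}


def series_with_count_over_time(series):
    """Series for which count_over_time() returns a value (the sample count)."""
    return {name: len(samples) for name, samples in series.items() if samples}


# The outer count() simply counts how many series survived the inner function.
increase_count = len(series_with_increase(WINDOW_SAMPLES))
count_over_time_count = len(series_with_count_over_time(WINDOW_SAMPLES))

print("count(increase(...)):       ", increase_count)
print("count(count_over_time(...)):", count_over_time_count)
```

In this model the pod that was scraped only once drops out of the increase() variant but is still counted by the count_over_time() variant. A pod that was never scraped at all has no samples and is invisible to both.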
valyala
  • Thanks, that suggestion makes a ton of sense here since airflow schedules thousands of often very-short-lived tasks on kubernetes :). Question... if a pod lived between 2 scrape intervals, would this metric catch it? or does the lookup mechanism only catch pods-live-at-scrape-time? I wasn't sure if it had any kind of look-back function. – John Humphreys Aug 05 '20 at 16:20
  • If the pod lifetime is smaller than the scrape interval, then it may be left unnoticed by Prometheus. – valyala Aug 09 '20 at 13:14
  • 24h can be dynamic: $__interval or $__range, depending on your Grafana version. – Abdennour TOUMI Oct 31 '21 at 06:18

The following query should return the number of pods that existed in the last 24 hours:

count(last_over_time(kube_pod_created[24h]))

last_over_time(kube_pod_created[24h]) returns a time series for each pod that existed in the last 24 hours (see the last_over_time() docs). count() returns the number of such time series, which equals the number of pods that existed in the last 24 hours.
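As a rough illustration of that reasoning, the following Python sketch imitates the two steps on made-up data. It is a simplified model under stated assumptions: the function below is a hypothetical stand-in rather than Prometheus code, the window boundary handling is approximate, and the pod names, timestamps, and values are invented.

```python
DAY = 24 * 60 * 60  # window length in seconds


def last_over_time(series, now, window=DAY):
    """Return {series name: latest value} for series with a sample in the window.

    Simplified stand-in for PromQL's last_over_time(); a series with no
    sample inside the window produces no output at all.
    """
    result = {}
    for name, samples in series.items():
        in_window = [(t, v) for t, v in samples if now - window < t <= now]
        if in_window:
            latest = max(in_window, key=lambda sample: sample[0])
            result[name] = latest[1]
    return result


# Hypothetical kube_pod_created samples: (scrape timestamp, creation time).
now = 200_000
samples = {
    "pod-a": [(199_000, 1000.0), (199_030, 1000.0)],  # still being scraped
    "pod-b": [(150_000, 2000.0)],                     # gone, but seen in window
    "pod-old": [(50_000, 3000.0)],                    # last seen before window
}

latest = last_over_time(samples, now)
pods_seen = len(latest)  # plays the role of the outer count()
print(pods_seen, sorted(latest))
```

Only series with at least one sample inside the window contribute, so the outer count is the number of distinct pods scraped at least once during that period.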

valyala