
I understand that with Prometheus we can set up alerting rules that detect when a pod crashes and raise an alert.

I want to understand how Prometheus itself knows when a pod has crashed or is stuck in the Pending state.

  • Does it find out when it tries to scrape metrics from the pod's HTTP endpoint?

OR

  • Does Prometheus get the pod status information from Kubernetes?

I'm asking because I want to set up Prometheus to monitor pods that I have already deployed. I want to be alerted if a pod keeps crashing or is stuck in the Pending state, and I want to know whether Prometheus can detect these conditions without any changes to the code running inside those pods.

BlueChips23

3 Answers


The common way for Prometheus to collect metrics and health information is scraping, most often through an HTTP endpoint. Since a pod can have multiple containers, it is best to scrape an HTTP endpoint exposed by your running container.

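If Prometheus doesn't receive a good response from this endpoint, it can conclude that the container is down: for every target it records a synthetic up series, which is 1 after a successful scrape and 0 after a failed one.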
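A simplified Python model of that mechanism, not Prometheus's actual implementation: each scrape either succeeds or fails, and the outcome becomes the target's up value. The fetch functions and addresses are hypothetical stand-ins for the HTTP GET of a container's metrics endpoint.

```python
from typing import Callable, Dict

# Hypothetical stand-in for "HTTP GET http://<pod-ip>:<port>/metrics".
Fetch = Callable[[], str]


def scrape_up(fetch: Fetch) -> int:
    """Return the value Prometheus would record in the target's "up" series."""
    try:
        fetch()  # a real scrape would also parse the response body
    except OSError:
        # Connection refused, timeout, etc.: e.g. the container has crashed.
        return 0
    return 1


def scrape_all(targets: Dict[str, Fetch]) -> Dict[str, int]:
    """Scrape every target once and map instance -> up value."""
    return {instance: scrape_up(fetch) for instance, fetch in targets.items()}


def healthy() -> str:
    return "http_requests_total 42\n"


def crashed() -> str:
    raise ConnectionRefusedError("connection refused")


if __name__ == "__main__":
    print(scrape_all({"10.0.0.5:8080": healthy, "10.0.0.6:8080": crashed}))
    # {'10.0.0.5:8080': 1, '10.0.0.6:8080': 0}
```

An alerting rule with an expression such as up == 0 then turns a failed scrape into an alert. The same model shows the limit of this approach: a pod that is still Pending usually has no IP address yet, so there may be no target to scrape at all.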

Prometheus itself does not deliver alerts: it evaluates the alerting rules, but you normally delegate routing and notification to Alertmanager.
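Strictly speaking the work is split: Prometheus evaluates each alerting rule on every evaluation interval, and once an alert is firing it is forwarded to Alertmanager, which handles grouping, silencing and notifications. The sketch below is a simplified model (not Prometheus's code) of how one alert with a "for" duration moves between inactive, pending and firing; the tick timestamps are made up.

```python
from typing import List, Optional, Tuple


def alert_states(ticks: List[Tuple[float, bool]], for_seconds: float) -> List[str]:
    """Simplified model of one alert's state across rule evaluations.

    ticks: (timestamp in seconds, whether the rule expression returned a result).
    """
    active_since: Optional[float] = None
    states: List[str] = []
    for ts, matched in ticks:
        if not matched:
            active_since = None  # condition cleared: the alert resolves
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts  # condition first seen: the alert becomes pending
        # It only fires once the condition has held for the whole "for" duration.
        states.append("firing" if ts - active_since >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    ticks = [(0, True), (60, True), (120, True), (180, False), (240, True)]
    print(alert_states(ticks, for_seconds=120))
    # ['pending', 'pending', 'firing', 'inactive', 'pending']
```

The "for" duration is what keeps a single failed scrape or a briefly waiting container from paging anyone.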

Bal Chua
    So wait. If a pod doesn’t have a container with an HTTP endpoint, Prometheus can’t determine the pod status? That seems quite regressive, since Kubernetes already exposes pod status. I understand the need for an HTTP endpoint for collecting metrics, but for pod status it’s a bit weird. – BlueChips23 Jul 19 '18 at 12:17
    Oh sorry, for pod status you can use kube-state-metrics, which Prometheus can also scrape: https://github.com/kubernetes/kube-state-metrics/blob/master/README.md. I think kube-state-metrics is a replacement for heapster. – Bal Chua Jul 19 '18 at 23:50

Use sum(kube_pod_container_status_waiting_reason) by (reason), a kube-state-metrics metric, to count waiting containers per reason (for example CrashLoopBackOff or ContainerCreating), if any are waiting.
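For intuition about what that query returns, here is a rough Python equivalent run over an abridged, hypothetical kube-state-metrics scrape. The series has value 1 for a container's current waiting reason, so summing by reason counts containers. The parser is deliberately minimal (no escaped quotes, no timestamps) and only meant for this sample; real output also carries extra labels such as uid.

```python
import re
from collections import defaultdict
from typing import Dict

# Abridged, hypothetical scrape output; pod and namespace names are made up.
SAMPLE = """\
# TYPE kube_pod_container_status_waiting_reason gauge
kube_pod_container_status_waiting_reason{namespace="default",pod="api-7d9f",container="api",reason="CrashLoopBackOff"} 1
kube_pod_container_status_waiting_reason{namespace="default",pod="api-8c1e",container="api",reason="CrashLoopBackOff"} 1
kube_pod_container_status_waiting_reason{namespace="jobs",pod="etl-0",container="etl",reason="ContainerCreating"} 1
"""

LINE = re.compile(r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$")
LABEL = re.compile(r'(\w+)="([^"]*)"')


def sum_by_reason(text: str) -> Dict[str, float]:
    """Rough equivalent of sum(kube_pod_container_status_waiting_reason) by (reason)."""
    totals: Dict[str, float] = defaultdict(float)
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and # HELP / # TYPE comments
        m = LINE.match(line)
        if not m or m.group("name") != "kube_pod_container_status_waiting_reason":
            continue
        labels = dict(LABEL.findall(m.group("labels")))
        totals[labels["reason"]] += float(m.group("value"))
    return dict(totals)


if __name__ == "__main__":
    print(sum_by_reason(SAMPLE))
    # {'CrashLoopBackOff': 2.0, 'ContainerCreating': 1.0}
```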

Kumail Haider

kube-state-metrics gathers information from the kube-apiserver about the state of Kubernetes objects (pods, deployments, and so on) and exposes it as metrics. It comes packaged with prometheus-operator. To answer your question: you do not need the pod to be up to get its status metrics, because they come from the API server (by way of scraping the kube-state-metrics endpoint), not from the pod itself.
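To illustrate why the pod does not need to be reachable, here is a rough Python sketch of the kind of translation kube-state-metrics performs: it takes a Pod object as returned by the API server and emits gauge lines. The metric names are real kube-state-metrics metrics, but the label set is abridged and the code is loosely modelled on the project, not taken from it; the pod names are hypothetical.

```python
from typing import Any, Dict, List

PHASES = ("Pending", "Running", "Succeeded", "Failed", "Unknown")


def pod_to_samples(pod: Dict[str, Any]) -> List[str]:
    """Turn a Pod object (as JSON from the API server) into exposition-format lines."""
    ns = pod["metadata"]["namespace"]
    name = pod["metadata"]["name"]
    status = pod.get("status", {})
    out: List[str] = []
    # One series per phase: 1 for the current phase, 0 for the rest.
    for phase in PHASES:
        value = 1 if status.get("phase") == phase else 0
        out.append(f'kube_pod_status_phase{{namespace="{ns}",pod="{name}",phase="{phase}"}} {value}')
    # Container state comes from status.containerStatuses, which the kubelet
    # reports to the API server; the crashed container itself serves nothing.
    for cs in status.get("containerStatuses", []):
        labels = f'namespace="{ns}",pod="{name}",container="{cs["name"]}"'
        out.append(f'kube_pod_container_status_restarts_total{{{labels}}} {cs.get("restartCount", 0)}')
        waiting = cs.get("state", {}).get("waiting")
        out.append(f"kube_pod_container_status_waiting{{{labels}}} {1 if waiting else 0}")
        if waiting and "reason" in waiting:
            reason = waiting["reason"]
            out.append(f'kube_pod_container_status_waiting_reason{{{labels},reason="{reason}"}} 1')
    return out


if __name__ == "__main__":
    crashing = {
        "metadata": {"namespace": "default", "name": "api-7d9f"},
        "status": {
            "phase": "Running",
            "containerStatuses": [
                {"name": "api", "restartCount": 7,
                 "state": {"waiting": {"reason": "CrashLoopBackOff"}}},
            ],
        },
    }
    for line in pod_to_samples(crashing):
        print(line)
```

A pod stuck in Pending because it cannot be scheduled typically has no containerStatuses yet, so only the phase series appear, with phase="Pending" set to 1.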

To see which pod-level metrics kube-state-metrics exposes, check: https://github.com/kubernetes/kube-state-metrics/blob/master/docs/pod-metrics.md

Per the answer above, you can use the kube_pod_container_status_waiting_reason metric; or, if you just want to alert on a threshold regardless of the reason, you can use kube_pod_container_status_waiting.
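Putting this together for the original goal (alert on crashing or Pending pods without touching the pods), here is a sketch that builds a Prometheus rule group in Python and prints it as JSON. The alert names, thresholds, durations and severity labels are hypothetical placeholders to tune; the metric names are kube-state-metrics metrics. Because YAML is (essentially) a superset of JSON, the output should load as a rule file, but that is an assumption worth confirming with promtool.

```python
import json

# Hypothetical rule group: names, thresholds and durations are placeholders.
RULES = {
    "groups": [
        {
            "name": "pod-health",
            "rules": [
                {
                    # Pod has not left the Pending phase (e.g. unschedulable).
                    "alert": "PodStuckPending",
                    "expr": 'kube_pod_status_phase{phase="Pending"} == 1',
                    "for": "15m",
                    "labels": {"severity": "warning"},
                    "annotations": {"summary": "{{ $labels.namespace }}/{{ $labels.pod }} is Pending"},
                },
                {
                    # Any waiting reason at all: the threshold-only variant.
                    "alert": "ContainerWaiting",
                    "expr": "kube_pod_container_status_waiting == 1",
                    "for": "10m",
                    "labels": {"severity": "warning"},
                    "annotations": {"summary": "{{ $labels.pod }}/{{ $labels.container }} is waiting"},
                },
                {
                    # Container keeps restarting, i.e. it is crash-looping.
                    "alert": "ContainerRestartingOften",
                    "expr": "increase(kube_pod_container_status_restarts_total[15m]) > 3",
                    "for": "5m",
                    "labels": {"severity": "critical"},
                    "annotations": {"summary": "{{ $labels.pod }}/{{ $labels.container }} keeps restarting"},
                },
            ],
        }
    ]
}


def render() -> str:
    """Serialize the rule group; JSON output should also parse as YAML."""
    return json.dumps(RULES, indent=2)


if __name__ == "__main__":
    print(render())
```

Save the output to a file, validate it with promtool check rules <file>, and list it under rule_files in prometheus.yml. None of these rules need changes inside the monitored pods, only a kube-state-metrics deployment that Prometheus scrapes.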

Christina A