I'm trying to determine the uptime of a pod, specifically when it was started and when it stopped, using a Prometheus deployment with kube-state-metrics. In particular, I want to collect the following metrics:
kube_pod_completion_time
kube_pod_created
As a test I've configured Prometheus to gather metrics with the following config.yml file:
global:
  scrape_interval: 10m
  scrape_timeout: 10s
  evaluation_interval: 10m
scrape_configs:
  - job_name: kubernetes-nodes-cadvisor
    honor_timestamps: true
    scrape_interval: 10m
    scrape_timeout: 10s
    metrics_path: /metrics
    scheme: https
    authorization:
      type: Bearer
      credentials_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    follow_redirects: true
    enable_http2: true
    relabel_configs:
      - separator: ;
        regex: __meta_kubernetes_node_label_(.+)
        replacement: $1
        action: labelmap
      - separator: ;
        regex: (.*)
        target_label: __address__
        replacement: kubernetes.default.svc:443
        action: replace
      - source_labels: [__meta_kubernetes_node_name]
        separator: ;
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
        action: replace
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(container_cpu_usage_seconds_total|container_fs_reads_bytes_total|container_fs_writes_bytes_total|container_memory_max_usage_bytes|container_network_receive_bytes_total|container_network_transmit_bytes_total)'
        action: keep
    kubernetes_sd_configs:
      - role: node
        kubeconfig_file: ''
        follow_redirects: true
        enable_http2: true
  - job_name: 'kube-state-metrics'
    scrape_interval: 10m
    static_configs:
      - targets: ['kube-state-metrics.kube-system.svc.cluster.local:8080']
    metric_relabel_configs:
      - source_labels: [__name__]
        regex: '(kube_pod_labels|kube_pod_created|kube_pod_completion_time|kube_pod_container_resource_limits)'
        action: keep
remote_write:
  - url: http://example.com
    remote_timeout: 30s
    follow_redirects: true
    enable_http2: true
    oauth2:
      token_url: https://example.com
      client_id: myCoolID
      client_secret: myCoolPassword
    queue_config:
      capacity: 2500
      max_shards: 200
      min_shards: 1
      max_samples_per_send: 10
      batch_send_deadline: 5s
      min_backoff: 30ms
      max_backoff: 5s
    metadata_config:
      send: false
I also have the following test Deployment running:
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busy-box-test
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busy-box-test
  template:
    metadata:
      labels:
        app: busy-box-test
    spec:
      containers:
        - command:
            - sleep
            - '300'
          image: busybox
          name: test-box
However, when I search for kube_pod_completion_time in my remote write destination, I cannot find any samples, even though all the other metrics named in the regex (kube_pod_labels, kube_pod_created and kube_pod_container_resource_limits) are present.
I've also tried the following commands to see whether the metrics are present in the cluster:
kubectl get --raw '/metrics' | grep kube_
kubectl get --raw 'kube-state-metrics.kube-system.svc.cluster.local:8080'
but neither returns anything definitive. I suspect the commands are looking in the wrong location.
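For reference, `kubectl get --raw` takes an API server path rather than a cluster DNS name, and `/metrics` on the API server returns the API server's own metrics. A check that goes through the API server's service proxy may be closer to what is needed. The following is a minimal sketch, assuming the Service is named kube-state-metrics, lives in kube-system, and listens on port 8080 (all inferred from the scrape target in config.yml); the helper names `ksm_metrics` and `filter_metric` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers. The namespace, Service name and port are assumptions
# taken from the kube-state-metrics scrape target in config.yml.

# Fetch the raw kube-state-metrics exposition through the API server's
# service proxy (requires kubectl access to the cluster).
ksm_metrics() {
    kubectl get --raw \
        "/api/v1/namespaces/kube-system/services/kube-state-metrics:8080/proxy/metrics"
}

# Read exposition text on stdin and keep only the sample lines of the
# metric named in $1 (the "# HELP" and "# TYPE" lines are dropped).
filter_metric() {
    grep -E "^$1(\{| )"
}

# Usage (needs cluster access):
#   ksm_metrics | filter_metric kube_pod_completion_time
```

If the last command prints nothing while `filter_metric kube_pod_created` does, the metric is missing at the source and not being dropped somewhere between Prometheus and the remote write endpoint.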
Beyond whether I am missing something obvious, I have the following open questions:
Is there an endpoint I should hit inside the cluster that returns the completion time? Is it a problem that the scrape interval is 10 minutes while the pod comes up and goes down every 5 minutes? (If anyone knows how long the history of a terminated pod sticks around in kube-state-metrics, that would be great to know as well.)
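To make the scrape interval question concrete: a scrape only sees what the target exposes at that instant. The sketch below counts how many scrapes would land inside a window in which a metric is exposed. It assumes, purely for illustration, that the metric is visible only during the last WINDOW seconds of each pod cycle; that window, the offsets, and the `count_hits` helper are hypothetical and do not describe measured kube-state-metrics behavior.

```shell
#!/bin/sh
# count_hits INTERVAL CYCLE WINDOW OFFSET TOTAL   (all values in seconds)
# Counts the scrapes taken at OFFSET, OFFSET+INTERVAL, ... (below TOTAL)
# that fall inside the last WINDOW seconds of a repeating CYCLE-second
# pod lifetime. The exposure window is an illustrative assumption.
count_hits() {
    interval=$1 cycle=$2 window=$3 t=$4 total=$5
    hits=0
    while [ "$t" -lt "$total" ]; do
        phase=$((t % cycle))
        if [ "$phase" -ge $((cycle - window)) ]; then
            hits=$((hits + 1))
        fi
        t=$((t + interval))
    done
    echo "$hits"
}

count_hits 600 300 10 0 3600     # 10m scrapes starting at t=0: prints 0
count_hits 600 300 10 295 3600   # same scrapes shifted by 295s: prints 6
count_hits 10 300 10 0 3600      # 10s scrapes: prints 12
```

Because 600 is a multiple of 300, every 10-minute scrape in this model lands on the same phase of the pod cycle, so the result is all-or-nothing depending on the offset. A real cycle will drift, but a short-lived value can still be missed for long stretches.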
To keep the post concise, I've put the kube-state-metrics configuration here: https://gist.github.com/twosdai/12607c8459bdb73fc98edbbcb17b5eb5. The cluster is running on AWS EKS, version 1.22.