2

I am using the rule below in alert_rules.yml. When I receive an alert, it does not include the hostname of the node where the container is running. How can I get the alert to return the hostname instead of the node ID?

I tried container_label_com_docker_swarm_node_name instead of container_label_com_docker_swarm_node_id, but it does not work.

Any suggestions?

- alert: task_high_memory_usage_1g
  expr: sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"})
    BY (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id) > 1e+09
  for: 1m
  labels:
    severity: warning
  annotations:
    description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{
      $labels.container_label_com_docker_swarm_node_id }}'' memory usage is {{ humanize
      $value}}.'
    summary: Memory alert for Swarm task '{{ $labels.container_label_com_docker_swarm_task_name
      }}' on '{{ $labels.container_label_com_docker_swarm_node_id }}'
Aziz Zoaib

2 Answers

0

I have no experience with Docker, but if your container_memory_rss metrics carry both a container_label_com_docker_swarm_node_id and a container_label_com_docker_swarm_node_name label, then replacing every occurrence of one with the other in your alert rule (in the expression as well as in the description and summary) should work just fine. If the ..._name label is not there, that would explain why it is not working.

Alin Sînpălean
  • `container_memory_rss` does not contain `container_label_com_docker_swarm_node_name`. – Aziz Zoaib Jul 29 '18 at 11:31
  • I'm afraid you can't aggregate metrics by a label that isn't there. You might be able to join with some other metric that has both the `container_label_com_docker_swarm_node_id` and `container_label_com_docker_swarm_node_name` labels, something along these lines: https://stackoverflow.com/a/50357418/8657904 – Alin Sînpălean Jul 29 '18 at 19:17
  • But the problem is that `container_label_com_docker_swarm_node_id` is available in cadvisor metrics and `container_label_com_docker_swarm_node_name` is available in node advisor metrics. How can I join them in my case? – Aziz Zoaib Jul 30 '18 at 05:48
  • Well, identify a set of labels (or at least label values) that uniquely match your cadvisor metric (`container_memory_rss`) and some node advisor metric that has the `container_label_com_docker_swarm_node_name` label, do any `label_replace` operations needed to make both the label names and values match (for the labels you identified above), then apply the solution above to join them. – Alin Sînpălean Jul 30 '18 at 07:31
  • Do you have an example of how to use `label_replace`? – Aziz Zoaib Jul 30 '18 at 10:37
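
The join described in the comments above could look roughly like the rule below. This is only a sketch under assumptions, not a tested configuration: `node_meta`, `node_id` and `node_name` are hypothetical names standing in for whatever metric and labels in your setup carry both the Swarm node ID and the hostname, and that metric is assumed to have a constant value of 1 so that multiplying by it leaves the memory value unchanged. `label_replace` copies `node_id` into a label with the same name as on the cadvisor metric, and `group_left` then pulls `node_name` into the result.

```yaml
# Sketch only: node_meta, node_id and node_name are assumed names, not taken from the question.
# Assumes node_meta has value 1 and exactly one node_name per node ID.
- alert: task_high_memory_usage_1g
  expr: |
    (
      sum by (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id) (
        container_memory_rss{container_label_com_docker_swarm_task_name=~".+"}
      )
      * on (container_label_com_docker_swarm_node_id) group_left (node_name)
      max by (container_label_com_docker_swarm_node_id, node_name) (
        label_replace(node_meta, "container_label_com_docker_swarm_node_id", "$1", "node_id", "(.+)")
      )
    ) > 1e+09
  for: 1m
  labels:
    severity: warning
  annotations:
    description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{ $labels.node_name }}'' memory usage is {{ humanize $value }}.'
    summary: 'Memory alert for Swarm task ''{{ $labels.container_label_com_docker_swarm_task_name }}'' on ''{{ $labels.node_name }}'''
```

If the node ID values differ in format between the two metrics, the regex and replacement arguments of `label_replace` are where you would normalize them before the join.
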
0

You can try `$labels.instance`. It returns the instance for which the alert is firing.

Abhishek
  • `$labels.instance` returns the IP address that Docker randomly assigns to its containers. What I need is the hostname of the node where that container is running. – Aziz Zoaib Jul 31 '18 at 06:01