I have hundreds of hosts reporting to a Prometheus server, with many exporters per host. I want to maintain a list of hosts that should not generate alerts. I still need Prometheus monitoring on these hosts.

I've tried matching a route with no receiver, but it doesn't work. What am I doing wrong? Or how should I be doing this?

My route rules are below. I would expect the first route to match the ignorable instances and route evaluation to stop there, but I still get the alerts. :-(

route:
  receiver: 'team-ops-mails'
  group_by: ['alertname', 'cluster']
  group_wait: 30s
  group_interval: 2m
  repeat_interval: 2h 
  routes:
  - match_re:
      instance: "int-pg-01:.*"
    continue: false
  - match:
      nopage: true
    receiver: team-mattermost
    repeat_interval: 24h
  - match:
      severity: hwerror
    receiver: hwerror-receiver
    repeat_interval: 24h
  - match:
      role: worker
    receiver: team-mattermost 
  - match:
      role: ven-entrance
    receiver: team-mattermost 
Wayne Walker

1 Answer

Alerting rules allow you to define alert conditions based on Prometheus expression language expressions.

Sample alerting rule:

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

One possible way to solve your problem is to add an extra label, such as enableAlert, to your metrics. When defining alerting rules, you can then suppress alerts for some hosts by writing the expr like this:

- name: example
  rules:
  - alert: DemoAlert
    expr: <metric-name> {... ..., enableAlert = "true"} > ref_value

Set enableAlert = "false" on the instances that should not fire alerts.
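
As a rough illustration of how such a label could be attached without editing every target by hand, here is a minimal sketch using target relabeling. The job name, the target list, and the port are hypothetical; only the `int-pg-01:.*` pattern is taken from the question, and the label name enableAlert follows the suggestion above. It assumes the host name appears in `__address__`.

```yaml
scrape_configs:
  - job_name: node                      # hypothetical job name
    static_configs:
      - targets:                        # hypothetical targets
          - 'int-pg-01:9100'
          - 'int-web-01:9100'
    relabel_configs:
      # Default: give every target enableAlert="true".
      # With no source_labels, the default regex (.*) matches and the
      # replacement is always written to the target label.
      - target_label: enableAlert
        replacement: "true"
      # Override: targets whose address matches the regex get "false".
      # Relabeling regexes are fully anchored, so the trailing .* covers the port.
      - source_labels: [__address__]
        regex: 'int-pg-01:.*'
        target_label: enableAlert
        replacement: "false"
```

With labels like these in place, a rule expression such as `up{enableAlert="true"} == 0` would only consider targets that were not matched by the regex. The relabeling block still has to be present in each scrape job, so this reduces per-host editing rather than per-job editing.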

Kamol Hasan
  • Are you saying to add enableAlert to the targets element in prometheus.yml? That still means editing at least two lines for every single host, in 43 places (unique job_names), and editing all the alerting_rules (over 50). I still want to know why my match_re with continue: false does not stop matching. https://prometheus.io/docs/alerting/configuration/#route seems to state that the parsing will stop – Wayne Walker Aug 23 '19 at 22:28
  • No, you don't need to change it everywhere. You can add labels to metrics using `relabel_configs`. All you need is a regex that matches all the hosts you don't want alerts from. Or you can simply send this label (enableAlert) from the server side. – Kamol Hasan Aug 24 '19 at 09:22
  • @KamolHasan Where can I read about what the colons mean in the expression `job:request_latency_seconds:mean5m`? – a0s Sep 06 '21 at 12:02
  • @a0s https://prometheus.io/docs/practices/rules/#naming-and-aggregation – Kamol Hasan Sep 07 '21 at 04:06