AlertManager downtime alert unless 429 (Too Many Requests) HTTP status code

Question

Currently I have an AlertManager config that simply sends an alert when the "probe_success" metric is 0.

I don't know how to join the "probe_http_status_code" metric with the "probe_success" metric in the "expr" field of an alert rule, so that the alert does not fire when "probe_success" is 0 because of a 429 (Too Many Requests) HTTP status code.

I tried to figure this out using the similar question below, but no luck.
How can I 'join' two metrics in a Prometheus query?

"probe_success" and "probe_http_status_code" are both Blackbox Exporter metrics.

score 2 · Accepted Answer · answered Jul 04 '19 at 13:34

What you probably want here is the Blackbox Exporter's valid_status_codes option for HTTP probes. With it you can list 429 (plus whatever 2xx codes you expect) as valid, which keeps probe_success at 1 when those codes are returned.
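As a minimal sketch of what that could look like in the Blackbox Exporter configuration file: the module name http_2xx_or_429, the timeout, and the specific list of codes are assumptions chosen for illustration, not something taken from the question.

```yaml
# blackbox.yml (sketch; the module name and the code list are hypothetical)
modules:
  http_2xx_or_429:
    prober: http
    timeout: 5s
    http:
      # Setting this list replaces the default of "any 2xx",
      # so every acceptable code has to be listed explicitly.
      valid_status_codes: [200, 429]
```

The Prometheus scrape job for the probe would then need to select this module (for example through the module URL parameter) instead of the default one. The existing alert on probe_success == 0 can stay unchanged.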

brian-brazil

I thought about this and it's a proper solution, but I don't think it's the responsibility of Prometheus or the exporter to decide whether (in this case) a status code is valid. I'd think that the AlertManager is responsible for that decision and for whether to alert on it. What are your thoughts? – Julian Jul 04 '19 at 13:42
That's not the Alertmanager's role at all; alerts are already firing when they get to it. The blackbox exporter has many options to determine what is considered a failure, and this is one case for it. – brian-brazil Jul 04 '19 at 15:19
AlertManager was a mistake, I meant the alert rules in Prometheus. Deciding what counts as a failure for a probe is indeed the responsibility of the Blackbox Exporter, not of the alert conditions in Prometheus. Thanks! – Julian Jul 04 '19 at 15:33
