Grafana alerting evaluation for 24 hrs

Question

I am creating a CPU alert rule in Grafana with a Prometheus data source. I want the alert to be evaluated every 24 hours, i.e. the condition should be checked once every 24 hours. Where do I configure this? Also, is there a way to disable the alert if there have been no alerts for that particular rule in the last 24 hours? Can anyone clarify? Thanks in advance.

What kind of alert are you expecting, and what have you tried? Also, could you clarify what you mean by "evaluate for every 24hrs": once a day, or check values for the last 24 hours? – markalex, Mar 29 '23 at 08:59
I have Prometheus data source alerts for node and CPU usage, and I want to get alerts every 24 hours. How can I do that? – sai krishna, Mar 30 '23 at 14:54
Config: Group wait: 1s, Group interval: 1m, Repeat interval: 24h. I set this configuration so that the alert is triggered every 24 hours, but it has no effect: the alert still fires at the usual times. Also, is there any way to disable alerts for no data? – sai krishna, Mar 30 '23 at 15:06

score 0 · Answer 1 · answered Mar 30 '23 at 19:50

It seems you have misunderstood the concept of repeat_interval. The documentation says:

How long to wait before sending a notification again if it has already been sent successfully for an alert. (Usually ~3h or more).

So when you write repeat_interval: 24h, it means that if your alert has not been resolved for 24 hours, the notification will be sent again.
Generally this is not a good idea unless you have very specific needs. I advise you to set it back to the default of 4h.
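
For reference, the three timing options from the question map onto an Alertmanager-style route roughly as in the sketch below. It is a minimal illustration, not the asker's actual setup: the receiver name `team-email` is hypothetical, and the values shown are the Alertmanager defaults. Grafana notification policies expose the same three timings under the names used in the question (Group wait, Group interval, Repeat interval).

```yaml
# Minimal sketch of an Alertmanager routing block.
# "team-email" is a hypothetical receiver name used only for illustration.
route:
  receiver: team-email
  group_wait: 30s      # how long to wait before sending the first notification for a new group
  group_interval: 5m   # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 4h  # how long to wait before re-sending a notification for an alert that is still firing
receivers:
  - name: team-email   # no integrations configured here; add email/Slack/etc. as needed
```

None of these settings controls when the rule condition is evaluated; they only control when notifications are sent for alerts that are already firing.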

If I understand your idea correctly, you want to check the rule for your alert once a day (for the whole previous day, I suppose). AFAIK, there is no built-in functionality for this in Alertmanager.

But you could use a little trick in Prometheus alerting rules to achieve this: add a check for the hour to your rule, like this:

#your_initial_rule# and on() hour() == 14

For example:

  - alert: HostHighCpuLoad
    expr: sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[24h]))) > 0.8  and on() hour() == 14
    for: 0m
    labels:
      severity: warning
    annotations:
      summary: Host high CPU load (instance {{ $labels.instance }})
      description: "CPU load is > 80%\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"

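That way the alert can fire only between 14:00 and 14:59 UTC (hour() returns the hour in UTC). The rule is still evaluated at every evaluation interval, but outside that hour the expression returns nothing, so no alert is raised.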
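
If the daily check should line up with a local hour rather than a UTC hour, one option is to shift the timestamp before taking the hour. The sketch below assumes a fixed offset of +05:30 (19800 seconds) purely for illustration; the group and alert names are hypothetical, and a fixed offset does not follow daylight saving changes.

```yaml
groups:
  - name: daily-cpu-check              # hypothetical group name
    rules:
      - alert: HostHighCpuLoadDaily    # hypothetical alert name
        # 19800 s = 5 h 30 min: an assumed UTC offset, adjust it for your zone.
        # hour() accepts an instant vector of timestamps, so the shifted time is wrapped in vector().
        expr: >
          sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[24h]))) > 0.8
          and on() hour(vector(time() + 19800)) == 14
        for: 0m
        labels:
          severity: warning
```

With this expression the alert can fire only while the shifted clock reads 14:xx, i.e. between 08:30 and 09:29 UTC for the assumed offset.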

As for CPU rules: you can use the one in the example if it suits your needs, or search through this list of useful alert rules.

answered Mar 30 '23 at 19:50

markalex


Sorry, I meant 2 days: I need to get the alert every 2 days – sai krishna Mar 31 '23 at 09:03
There is no reliable way to do it. You could add `and on() day_of_month() % 2 == 1`, but it is not ideal – markalex Mar 31 '23 at 09:12
Okay, so for 1 day we will get alerts, right? – sai krishna Mar 31 '23 at 09:59
I am confused. Can you please clarify, @markalex? – sai krishna Mar 31 '23 at 12:35
Yes, for one day you will. – markalex Mar 31 '23 at 12:39
Can you tell me the query for namespace memory usage in Prometheus? – sai krishna Apr 01 '23 at 10:27
No, but you can find it [here](https://stackoverflow.com/a/63806947/21363224) – markalex Apr 01 '23 at 10:31
They are calculating it for the namespace there, right? – sai krishna Apr 01 '23 at 13:03
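
The every-other-day variant suggested in the comments could be written out roughly as follows. This is only a sketch of that suggestion: the group and alert names are hypothetical, and the 48h lookback window is an assumption (the original example uses 24h). As noted in the comments, it is not fully reliable: after a month with an odd number of days (for example the 31st followed by the 1st), the condition matches on two consecutive days.

```yaml
groups:
  - name: every-other-day-cpu-check          # hypothetical group name
    rules:
      - alert: HostHighCpuLoadEveryOtherDay  # hypothetical alert name
        # Fires only on odd days of the month, between 14:00 and 14:59 UTC.
        # The 48h range is an assumption chosen to cover the two days between checks.
        expr: >
          sum by (instance) (avg by (mode, instance) (rate(node_cpu_seconds_total{mode!="idle"}[48h]))) > 0.8
          and on() hour() == 14
          and on() day_of_month() % 2 == 1
        for: 0m
        labels:
          severity: warning
```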
