I'm using Promtail + Loki to collect my logs and I can't figure out how to alert on every error in my log files. I'm also using Prometheus, Alertmanager and Grafana. I've seen that some people have managed to achieve this, but none of them explained the details. Just to be clear, I'm not looking for alerts that stay in the FIRING state or Grafana dashboards with an "Alerting" status. All I need is to know every single time an error shows up in one of my logs. In case it cannot be done exactly this way, the next best solution is to scrape every X seconds and then alert with something like: "6 new error messages".
Hello, have you found a solution? – Carlos Porta May 22 '22 at 16:09
4 Answers
With Loki v2.0 there is a new way of alerting: https://grafana.com/docs/loki/latest/alerting/
You can now configure alerting rules directly in Loki and have it send alerts to Alertmanager.
Update:
As requested, a simple example of an alert:
groups:
  - name: NumberOfErrors
    rules:
      - alert: logs_error_count_kube_system
        expr: rate({namespace="kube-system"} |~ "[Ee]rror" [5m]) > 5
        for: 5m
        labels:
          severity: P4
Source: the Loki alerting docs linked above.

This doesn't really answer the question - the Loki alerting docs don't explain how to make an alert for *every error log*, just metric queries. Have you been able to write such an alerting rule? – Isaac van Bakel Dec 18 '20 at 09:59
For alerting in Loki, add your rule files to the folder specified in the ruler section of your config file.
ruler:
  storage:
    type: local
    local:
      directory: /etc/loki/rules   # where the ruler reads rule files from
  rule_path: /tmp/loki/rules-temp  # scratch directory used during rule evaluation
  alertmanager_url: http://alertmanager:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true
  enable_alertmanager_v2: true
If your configuration looks like the above, add your rule files under a per-tenant subdirectory of /etc/loki/rules/, following the pattern <directory>/<tenant id>/rules1.yaml. For example, /etc/loki/rules/app/rules1.yaml uses app as the tenant ID (with auth disabled, the tenant ID is fake).
For alerting something like "6 new error messages", you can use count_over_time() or sum(count_over_time()). If you have labels like job="error" and job="info", plus a label common to both jobs such as app="myapp", then count_over_time({app="myapp"}[1m]) will return one value per job, while sum(count_over_time({app="myapp"}[1m])) will return the total across both jobs, as the queries below illustrate.
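For instance, assuming the labels above (the [1m] range is required by count_over_time, and the error filter is illustrative):

# one sample per job (job="error", job="info")
count_over_time({app="myapp"} |~ "[Ee]rror" [1m])

# a single total across both jobs
sum(count_over_time({app="myapp"} |~ "[Ee]rror" [1m]))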
Sample configuration for rules1.yaml:
groups:
- name: logs
rules:
- alert: ErrorInLogs
expr: sum(count_over_time({app="myapp"}|~ "[Ee]rror"[1m]) >= 1
for: 10s
labels:
severity: critical
category: logs
annotations:
title: "{{$value}} Errors occurred in application logs"
Here {{ $value }} will give the count returned by the expr.

I had the same question.
Investigating a little bit, I discovered that Alertmanager just receives alerts and routes them. If you have a service which can translate Loki searches into calls to the Alertmanager API, you are done, and you probably already run two of them.
I found this thread: https://github.com/grafana/loki/issues/1753, which contained this video: https://www.youtube.com/watch?v=GdgX46KwKqo
Option 1: Using Grafana
The video shows how to create an alert from a search in Grafana. If you just add an alert notification channel with type "Prometheus Alertmanager", you'll get it: Grafana will fire the alert and Alertmanager will route it.
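If you provision Grafana from files, such a channel can be declared like this (a sketch for legacy Grafana alerting; the file path, channel name, uid and URL are assumptions for illustration):

# provisioning/notifiers/alertmanager.yaml
notifiers:
  - name: Alertmanager
    type: prometheus-alertmanager   # the "Prometheus Alertmanager" channel type
    uid: alertmanager-main
    org_id: 1
    is_default: true
    settings:
      url: http://alertmanager:9093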
Option 2: Using promtail
There is another way: add a promtail pipeline_stage that creates a Prometheus metric from your search, and then manage it as any other metric: add a Prometheus alerting rule and handle it from Alertmanager.
You can just read the example in the previous link:
pipeline_stages:
  - match:
      selector: '{app="promtail"} |= "panic"'
      stages:
        - metrics:
            panic_total:
              type: Counter
              description: "total number of panic"
              config:
                match_all: true
                action: inc
And you will have a Prometheus metric (promtail exposes it on its /metrics endpoint as promtail_custom_panic_total) to manage as a usual alert.
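To close the loop, a Prometheus alerting rule on that metric might look like this (a sketch; the metric name assumes promtail's default promtail_custom_ prefix, and the threshold, labels and annotation are illustrative):

groups:
  - name: promtail-log-alerts
    rules:
      - alert: PanicInLogs
        # fires when new panic lines were counted in the last 5 minutes
        expr: increase(promtail_custom_panic_total[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "{{ $value }} new panic messages in the last 5 minutes"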

Not an ideal solution, because you can't get the full text of the panic message when Grafana sends an alert. – Serko Oct 01 '20 at 12:20
The question was how to get the content of the log entry that triggered the alert into the alert message – Greg Z. Feb 24 '21 at 13:58