
I have read the Promtail docs on pipelines and I cannot make heads or tails of them. All I want to do is drop log lines that originate from UptimeRobot, the software we use to determine whether our site is up.

Promtail docs: https://grafana.com/docs/loki/latest/clients/promtail/pipelines/

The logs are in json format:

{ "time":"[17/Feb/2022:02:20:00 +0000]", "remoteIP":"172.18.0.4", "host":"api.mysite.tld", "request":"/v1/byUserId", "query":"", "method":"POST", "status":"200", "userAgent":"Mozilla/5.0+(compatible; UptimeRobot/2.0; http://www.uptimerobot.com/)", "referer":"https://api.mysite.tld/v1/byUserId" }

Promtail config:

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
- job_name: apache_access
  static_configs:
  - targets:
      - localhost
    labels:
      job: apache_access
      __path__: /var/log/apache/site.access.log
  pipeline_stages:
    - json:
       expressions:
         userAgent: userAgent
    - drop:
       source: "userAgent"
       regex: ".*uptimerobot.*"

- job_name: apache_error
  static_configs:
  - targets:
      - localhost
    labels:
      job: apache_error
      __path__: /var/log/apache/site.error.log

The above config drops all lines from the access job.

How can I modify that to drop just the lines with UptimeRobot in the userAgent?

EDIT

Here are the promtail logs:

level=info ts=2022-02-23T14:26:43.416816223Z caller=server.go:260 http=[::]:9080 grpc=[::]:46085 msg="server listening on addresses"
level=info ts=2022-02-23T14:26:43.417247269Z caller=main.go:119 msg="Starting Promtail" version="(version=2.4.1, branch=HEAD, revision=f61a4d261)"
level=info ts=2022-02-23T14:26:48.417542454Z caller=filetargetmanager.go:255 msg="Adding target" key="/var/log/apache/site.access.log:{job=\"apache_access\"}"
level=info ts=2022-02-23T14:26:48.417721078Z caller=filetargetmanager.go:255 msg="Adding target" key="/var/log/apache/site.error.log:{job=\"apache_error\"}"
ts=2022-02-23T14:26:48.417916312Z caller=log.go:168 level=info msg="Seeked /var/log/apache/site.access.log - &{Offset:0 Whence:0}"
level=info ts=2022-02-23T14:26:48.417970347Z caller=tailer.go:126 component=tailer msg="tail routine: started" path=/var/log/apache/site.access.log
ts=2022-02-23T14:26:48.419048351Z caller=log.go:168 level=info msg="Seeked /var/log/apache/site.error.log - &{Offset:0 Whence:0}"
level=info ts=2022-02-23T14:26:48.419071953Z caller=tailer.go:126 component=tailer msg="tail routine: started" path=/var/log/apache/site.error.log

Edited to include the recommended config changes.

Bodger
  • I also turned on debug logging in Promtail. It shows the UptimeRobot lines being dropped, but does not mention the other lines at all, so I do not know what is going on here. – Bodger Feb 23 '22 at 14:58
  • It is as if it is **only** considering the UptimeRobot lines. – Bodger Feb 23 '22 at 15:00
  • At this point, I give up. I am going to use Apache piped logs and pipe them through a Python script that can do the needful. – Bodger Feb 24 '22 at 15:32

1 Answer


Try changing pipeline_stages to this:

  pipeline_stages:
    - json:
       expressions:
         userAgent: userAgent
    - drop:
       source: "userAgent"
       regex: ".*uptimerobot.*"
Jiggy
  • That is a valid config, but it filters everything out. There are about 27,000 entries in my test log, and only about 2,500 of those are not from UptimeRobot, but when I go to Grafana, it does not see apache_access. – Bodger Feb 23 '22 at 14:30
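
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in the Promtail documentation the drop stage takes its regular expression in a field named `expression`, not `regex`. If the unrecognized `regex` key is silently ignored (an assumption, not verified against version 2.4.1), the stage is left with only `source` set, and in that form it would drop every line for which `userAgent` was extracted, which is every access-log line. Separately, RE2 matching is case-sensitive, so a lowercase `uptimerobot` would not match `UptimeRobot/2.0`; the `(?i)` flag in the sketch below makes the match case-insensitive.

```yaml
  pipeline_stages:
    # Extract the userAgent field from the JSON log line.
    - json:
        expressions:
          userAgent: userAgent
    # Drop only lines whose extracted userAgent matches the expression.
    # (?i) makes the match case-insensitive so "UptimeRobot" is caught.
    - drop:
        source: "userAgent"
        expression: "(?i).*uptimerobot.*"
```

Running Promtail with the `--dry-run` flag prints the entries it would send instead of pushing them to Loki, which may be a convenient way to check that the roughly 2,500 non-UptimeRobot lines survive the pipeline.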