
I'm trying to create a grok pattern that matches the first IP from the X-Forwarded-For header in an nginx log. A log line typically looks like this:

68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] "GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

The first IP is the client's actual IP, which is the one I want to retrieve; the other two come from proxies, in our case Cloudflare and Varnish.

My pattern, which I tried on https://grokconstructor.appspot.com, looks like this:

FIRSTIPORHOST (^%{IPORHOST})(?:,\s%{IPORHOST})*

Unfortunately, it matches all IPs despite the non-capturing group. What am I doing wrong? Or is there a better pattern?
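A grok pattern expands into an ordinary regular expression, so the behaviour can be reproduced with Python's `re` module. The sketch below is an approximation, not grok itself: it assumes IPv4-only input and replaces grok's much larger `%{IPORHOST}` definition with the simplified `[\d.]+`. It shows that a non-capturing group still consumes the text it matches, so the overall match spans every IP, while only the first, capturing group holds the client IP.

```python
import re

# Simplification (assumption): IPv4 only, approximated by [\d.]+ instead of
# grok's full %{IPORHOST} definition.
IP = r"[\d.]+"

line = ('68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] '
        '"GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" '
        '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Roughly the question's FIRSTIPORHOST pattern: (^IP)(?:,\sIP)*
pattern = re.compile(rf"(^{IP})(?:,\s{IP})*")
m = pattern.match(line)

# The whole match still contains every IP: (?:...) only avoids creating a
# numbered group, it still consumes the text it matches.
print(m.group(0))  # 68.75.44.178, 172.68.146.54, 127.0.0.1

# Only the first, capturing group holds the client IP.
print(m.group(1))  # 68.75.44.178
```

In grok terms, this suggests giving the first `%{IPORHOST}` a field name and treating the repeated group purely as something to skip over.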

Clarification:

I want to read the whole log file into Elasticsearch using Filebeat, so I need to match the IPs somehow; otherwise I won't be able to match the rest of the line, such as the date or the user agent.

baudsp
sepal
  • Maybe you just do not need the non-capturing group? Try `FIRSTIPORHOST ^(%{IPORHOST})` – Wiktor Stribiżew May 18 '17 at 12:15
  • But I actually want to match other parts of the line as well, like the date or the user agent, i.e. everything Filebeat's current pattern matches: https://github.com/elastic/beats/blob/master/filebeat/module/nginx/access/ingest/default.json – sepal May 18 '17 at 12:25
  • sepal, if the suggestion above does not work, why do you need the additional grok pattern? Just use a series of patterns to match the "tokens" you need. Please post a **full** sample log line. Also, try just using `%{IPORHOST:nginx.access.remote_ip}(?:, %{IPORHOST})*` instead of the `%{IPORHOST:nginx.access.remote_ip}` alone. – Wiktor Stribiżew May 18 '17 at 12:26
  • I added a full log line as requested. The problem is the spaces and commas between the IPs, as well as the fact that I just want to extract the first IP and not the ones for the proxies. – sepal May 18 '17 at 12:33
  • I tested at https://grokdebug.herokuapp.com/ with `%{IPORHOST:nginx.access.remote_ip}(?:, [\d.]+)*` and it seems working as expected: `"nginx": [ [ "68.75.44.178" ],`. [Here is the full expression](https://pastebin.com/jK5k4BQy). – Wiktor Stribiżew May 18 '17 at 12:39
  • Oh wow, thanks, that works. The solution is obvious; why didn't I think of that? – sepal May 18 '17 at 12:49

2 Answers


You need to add the (?:,\s[\d.]+)* after the %{IPORHOST:nginx.access.remote_ip} at the start of the pattern. See the fixed expression:

"%{IPORHOST:nginx.access.remote_ip}(?:,\\s[\\d.]+)* - %{DATA:nginx.access.user_name} \\[%{HTTPDATE:nginx.access.time}\\] \"%{WORD:nginx.access.method} %{DATA:nginx.access.url} HTTP/%{NUMBER:nginx.access.http_version}\" %{NUMBER:nginx.access.response_code} %{NUMBER:nginx.access.body_sent.bytes} \"%{DATA:nginx.access.referrer}\" \"%{DATA:nginx.access.agent}\""

The (?:,\s[\d.]+)* non-capturing repeated group matches 0+ occurrences of:

  • , - a comma
  • \s - a whitespace
  • [\d.]+ - 1+ digits or dots.

This way, the proxy IPs are consumed by the pattern but not captured into any field, so nginx.access.remote_ip holds only the first IP.
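To check the idea outside of Elasticsearch, here is a hand-translated approximation of the fixed expression using Python named groups. The translations of the grok patterns are assumptions, not grok's real definitions: `%{IPORHOST}` and `%{NUMBER}` become `[\d.]+`, `%{DATA}` becomes the lazy `.*?`, `%{WORD}` becomes `\w+`, and `%{HTTPDATE}` becomes `[^\]]+`. Field names use `_` instead of `.` because Python group names cannot contain dots. The second log line is a hypothetical request without proxies, included to show that the repeated group also matches zero extra IPs.

```python
import re

# Approximation (assumption) of the answer's grok pattern; see lead-in for
# how each grok pattern was translated.
NGINX_ACCESS = re.compile(
    r'(?P<remote_ip>[\d.]+)(?:,\s[\d.]+)* - (?P<user_name>.*?) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\w+) (?P<url>.*?) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<response_code>\d+) (?P<body_sent_bytes>\d+) '
    r'"(?P<referrer>.*?)" "(?P<agent>.*?)"'
)

# Sample line from the question (client IP followed by two proxy IPs).
proxied = ('68.75.44.178, 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200] '
           '"GET /jobs/24237/it-back-end HTTP/1.1" 301 5 "-" '
           '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')

# Hypothetical line with no proxies: the (?:,\s[\d.]+)* group matches 0 times.
direct = ('203.0.113.7 - - [15/May/2017:12:16:27 +0200] '
          '"GET / HTTP/1.1" 200 612 "-" "curl/7.52.1"')

fields = NGINX_ACCESS.match(proxied).groupdict()
print(fields["remote_ip"])      # 68.75.44.178
print(fields["time"])           # 15/May/2017:12:16:27 +0200
print(fields["response_code"])  # 301

direct_fields = NGINX_ACCESS.match(direct).groupdict()
print(direct_fields["remote_ip"])  # 203.0.113.7
```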

Wiktor Stribiżew
  • Sorry, I just noticed that there is still a problem. Using `(?:,\s[\d.]+)*` works in test tools like https://grokconstructor.appspot.com, but Filebeat requires the character patterns to be escaped, as you did in the full pattern line. Somehow this breaks the pattern, so only the last IP is matched. I was able to reproduce the problem both in the test tools and on a Filebeat instance. – sepal May 18 '17 at 13:29
  • Did you replace the pattern with mine above? Doesn't it work? – Wiktor Stribiżew May 18 '17 at 13:31
  • Yes, I replaced the whole pattern with the one you posted, but filebeat still logs the wrong IP. – sepal May 18 '17 at 13:45
  • One thing is certain: this pattern is correct. The only trouble is its implementation. – Wiktor Stribiżew May 18 '17 at 13:55
  • There is one minor error: At the end the closing `\"` is missing. But yes I agree, the pattern is technically correct. – sepal May 18 '17 at 14:04
  • No problem. In the meantime I resolved the issue. I had to delete the existing pipeline in elasticsearch. The pattern works correctly and my elasticsearch instance now stores the correct IP address. Thanks for the help! – sepal May 18 '17 at 15:05
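The doubled backslashes in the pipeline definition are only JSON string escaping, which is consistent with the conclusion that the pattern was fine and a stale pipeline was the real problem. The sketch below assumes the pattern is stored as a JSON string, as in the linked `default.json`, and shows only the leading fragment of it. Decoding the JSON yields exactly the single-backslash regex that worked in the online test tools.

```python
import json
import re

# Leading fragment of the pattern as it would appear inside the ingest
# pipeline's JSON file (backslashes doubled for JSON).
json_text = r'"%{IPORHOST:nginx.access.remote_ip}(?:,\\s[\\d.]+)*"'

# JSON decoding turns each \\ into a single \, which is what grok receives.
decoded = json.loads(json_text)
print(decoded)  # %{IPORHOST:nginx.access.remote_ip}(?:,\s[\d.]+)*

# The decoded repeated group behaves like the unescaped version: it consumes
# the proxy IPs and stops before " - - [".
tail = decoded.split("}", 1)[1]  # (?:,\s[\d.]+)*
m = re.match(tail, ", 172.68.146.54, 127.0.0.1 - - [15/May/2017:12:16:27 +0200]")
print(m.group(0))  # , 172.68.146.54, 127.0.0.1
```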

The given filter did not work for me when grepping x_forwarder_for, but the solution mentioned on another page did: https://serverfault.com/questions/725186/grok-issue-with-multiple-ips-in-nginx-logstash

  • This is a borderline [link-only answer](//meta.stackexchange.com/questions/8231). You should expand your answer to include as much information here, and use the link only for reference. – Filnor Jan 04 '19 at 12:10