Regular expression to extract part of a file path using the logstash grok filter

Question

I am new to regular expressions, but I think people here can give me valuable input. I am using the logstash grok filter, to which I can supply only regular expressions.

I have a string like this:

/app/webpf04/sns882A/snsdomain/logs/access.log

I want to use a regular expression to extract the sns882A part of the string, which is the substring after the third "/". How can I do that?

I am restricted to regex because grok only accepts regex. Is it possible to do this with a regex?

score 6 · Answer 1 · answered Mar 22 '14 at 02:42


Yes, you can use a regular expression with a named capture to get what you want via grok:

/[^/]+/[^/]+/(?<field1>[^/]+)/
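A minimal sketch for checking this pattern outside logstash, written in Python purely for illustration. Two assumptions to note: Python's `re` module spells named groups `(?P<name>...)` rather than the `(?<name>...)` form shown above, and the helper name `extract_field1` is hypothetical. The engine grok uses may differ from Python's in other details, so treat this as a sanity check rather than a definitive test.

```python
import re

# Same pattern as above, with Python's (?P<name>...) spelling of the named group.
PATTERN = re.compile(r"/[^/]+/[^/]+/(?P<field1>[^/]+)/")


def extract_field1(path):
    """Return the text captured as field1, or None if the pattern does not match."""
    match = PATTERN.search(path)
    return match.group("field1") if match else None


print(extract_field1("/app/webpf04/sns882A/snsdomain/logs/access.log"))  # sns882A
```

A path with fewer than three components followed by a slash, such as /app/webpf04, gives no match at all.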


CWoods


I know this answer is way too late, but +1 anyway for being the first *correct* answer. That is, a standalone regex (no other code and no delimiters) that uses a named capture for the part it's supposed to extract. – Alan Moore Mar 22 '14 at 05:12

score 2 · Answer 2 · answered Nov 23 '12 at 05:27


A regex for your case:

    /\w*\/\w*\/(\w*)\/

You can also test it at http://www.regextester.com/

Googling "regex tester" turns up other tools with different interfaces.


junky


On http://www.regextester.com/ it gives me no match. I tried http://gskinner.com/RegExr/ and got no result there either... – flyasfish Nov 23 '12 at 05:34
This solution relies on directory and file names always consisting of alphanumeric characters or underscores. In particular, there may be no spaces anywhere in the path. – Borodin Nov 23 '12 at 05:39
The match is index 0 based. You can also see 1: (sns882A), which means it's the first match. – junky Nov 23 '12 at 05:53
When using /\w*\/\w*\/(\w*)\/ in the grok filter, I got a grok parse failure error, maybe because no match was found. – flyasfish Nov 23 '12 at 06:19
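Borodin's point about \w can be illustrated with a short Python sketch. The hyphenated path and the helper name `third_component` are hypothetical, the escaped slashes are written as plain / because Python needs no delimiters, and this says nothing about why grok itself reported a parse failure. It only shows how the \w-based pattern behaves once a directory name contains a character outside letters, digits, and underscores.

```python
import re

# Pattern from the answer (\/ written as /) and a variant that accepts any non-slash character.
WORD_ONLY = re.compile(r"/\w*/\w*/(\w*)/")
ANY_NON_SLASH = re.compile(r"/[^/]+/[^/]+/([^/]+)/")


def third_component(pattern, path):
    """Return capture group 1 of the first match, or None if there is no match."""
    match = pattern.search(path)
    return match.group(1) if match else None


plain = "/app/webpf04/sns882A/snsdomain/logs/access.log"
hyphenated = "/app/web-pf04/sns882A/snsdomain/logs/access.log"  # hypothetical path

print(third_component(WORD_ONLY, plain))           # sns882A
print(third_component(WORD_ONLY, hyphenated))      # logs
print(third_component(ANY_NON_SLASH, hyphenated))  # sns882A
```

With the hyphen in place, the unanchored \w pattern cannot match at the start of the path, so the search slides forward and captures a later component instead of failing outright.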

mvp · Answer 3 · 2012-11-23T06:36:20.013

0

This is how I would do it in Perl:

my ($name) = ($fullname =~ m{^(?:/.*?){2}/(.*?)/});

EDIT: If your framework does not support Perl-style non-capturing groups (?:xyz), this regex should work instead:

^/.*?/.*?/(.*?)/

If you are concerned about performance of .*?, this works as well:

^/[^/]+/[^/]+/([^/]+)/

One more note: all of the regexes above will match the string /app/webpf04/sns882A/.

But the matched string is not the same thing as the first capture group, which is sns882A in all three cases.
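To make the distinction between the whole match and the first group concrete, here is a small Python sketch that runs the three patterns above against the path from the question. The helper name `whole_and_group` is hypothetical, and Python's regex engine is used only as a stand-in for illustration.

```python
import re

PATH = "/app/webpf04/sns882A/snsdomain/logs/access.log"

# The three patterns from this answer, unchanged.
PATTERNS = [
    r"^(?:/.*?){2}/(.*?)/",
    r"^/.*?/.*?/(.*?)/",
    r"^/[^/]+/[^/]+/([^/]+)/",
]


def whole_and_group(pattern, text):
    """Return (entire matched string, first capture group), or None if no match."""
    match = re.search(pattern, text)
    return (match.group(0), match.group(1)) if match else None


for pattern in PATTERNS:
    # Each line prints the pair ('/app/webpf04/sns882A/', 'sns882A').
    print(pattern, whole_and_group(pattern, PATH))
```

A tool that reports only the overall match will therefore show /app/webpf04/sns882A/, while the value of interest sits in group 1.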

edited Nov 23 '12 at 06:36

answered Nov 23 '12 at 05:29

mvp


When I try ^(?:/.*?){2}/(.*?)/ on http://gskinner.com/RegExr/, it matches /app/webpf04/sns882A/ – flyasfish Nov 23 '12 at 05:39
You should use `(?:/[^/]*)`. Otherwise your regex may take a *long* time to decide that it doesn't match – Borodin Nov 23 '12 at 05:44
This is exactly why I used `.*?` - to avoid greedy match, which can be very slow – mvp Nov 23 '12 at 05:46
Confirmed: when I give ^(?:/.*?){2}/(.*?)/ to the grok filter, I get the /app/webpf04/sns882A/ part of the string – flyasfish Nov 23 '12 at 06:15
Note that the matched string is not the same as the first matching group. See my amended answer – mvp Nov 23 '12 at 06:36
OP didn't ask for Perl – Chris F Apr 17 '17 at 16:25

score 0 · Answer 4 · answered Nov 23 '12 at 05:35


If you are indeed using Perl, then you should use the File::Spec module, like this:

use strict;
use warnings;

use File::Spec;

my $path = '/app/webpf04/sns882A/snsdomain/logs/access.log';
my @path = File::Spec->splitdir($path);

print $path[3], "\n";

output

sns882A


Borodin


I cannot use any programming language; this is part of the logstash grok configuration, in which I can only supply expressions. – flyasfish Nov 23 '12 at 05:44

Hari krishna Andhra Pradesh · Answer 5 · 2016-01-21T08:28:33.040

0

Same answer, but with a small bug fix. If you don't put ^ at the start, the pattern is free to match further along the input (try longer paths with more / in them). To fix that, add ^ at the start, like this. ^ means the start of the input line. Group 1 is then your answer.

^/[^/]+/[^/]+/([^/]+)/

If your input may be a URI, use the pattern below (it handles plain paths as well as URIs).

^.*?/[^/]+/[^/]+/([^/]+)/
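The effect of the anchor, and of the leading `.*?` in the URI variant, can be checked with a short Python sketch. All three inputs and the helper name `group1` are hypothetical and chosen only for illustration. Note in particular that which component the URI variant captures depends on the shape of the URI: with a host part present, the captured component shifts by one.

```python
import re

UNANCHORED = re.compile(r"/[^/]+/[^/]+/([^/]+)/")
ANCHORED = re.compile(r"^/[^/]+/[^/]+/([^/]+)/")
URI_TOLERANT = re.compile(r"^.*?/[^/]+/[^/]+/([^/]+)/")


def group1(pattern, text):
    """Return capture group 1 of the first match, or None if there is no match."""
    match = pattern.search(text)
    return match.group(1) if match else None


# Hypothetical inputs: a path missing its leading slash, and two URI shapes.
relative = "app/webpf04/sns882A/snsdomain/logs/access.log"
file_uri = "file:///app/webpf04/sns882A/snsdomain/logs/access.log"
http_uri = "http://host/app/webpf04/sns882A/logs/access.log"

print(group1(UNANCHORED, relative))    # snsdomain (matched later in the input)
print(group1(ANCHORED, relative))      # None (the anchor rejects the input)
print(group1(URI_TOLERANT, file_uri))  # sns882A
print(group1(URI_TOLERANT, http_uri))  # webpf04 (the host counts as a component)
```

Whether a silent shift or an outright failure is preferable depends on the inputs you expect, so it is worth testing the pattern against each path shape that can reach the filter.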

edited Jan 21 '16 at 08:28

answered Jan 21 '16 at 06:28

Hari krishna Andhra Pradesh

