Regex to get the words after matching string

Question

Below is the content:

Subject:
    Security ID:        S-1-5-21-3368353891-1012177287-890106238-22451
    Account Name:       ChamaraKer
    Account Domain:     JIC
    Logon ID:       0x1fffb

Object:
    Object Server:  Security
    Object Type:    File
    Object Name:    D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log
    Handle ID:  0x11dc

I need to capture the text after Object Name: on that line, which is D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log.

How can I do this?

I tried ^.*\bObject Name\b.*$, but it matches the whole Object Name line rather than just the path.

hwnd · Answer 1 · 2014-09-10T17:40:20.423

72

But I need the match result to be ... not in a match group...

For what you are trying to do, this should work. \K resets the starting point of the reported match, so the key and the whitespace matched before it are left out of the result.

\bObject Name:\s+\K\S+

You can do the same for getting your Security ID matches.

\bSecurity ID:\s+\K\S+
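Python's built-in `re` module doesn't support `\K`, but the same idea can be sketched with a capture group: the part of the pattern before `\K` is matched but kept outside group 1, and only group 1 is returned. The `get_field` helper below is hypothetical (not from the answer) and simply applies the answer's pattern to any key, including `Security ID`:

```python
import re

# Sample record from the question; a raw string keeps the backslashes.
RECORD = r"""Subject:
    Security ID:        S-1-5-21-3368353891-1012177287-890106238-22451
    Account Name:       ChamaraKer
    Account Domain:     JIC
    Logon ID:       0x1fffb

Object:
    Object Server:  Security
    Object Type:    File
    Object Name:    D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log
    Handle ID:  0x11dc"""


def get_field(text, key):
    # Hypothetical helper. Python's re has no \K, so the part the answer
    # places before \K ("key:" plus whitespace) is matched but left outside
    # group 1, and only group 1 (the \S+ token) is returned.
    m = re.search(r"\b" + re.escape(key) + r":\s+(\S+)", text)
    return m.group(1) if m else None


print(get_field(RECORD, "Object Name"))
print(get_field(RECORD, "Security ID"))
```

Like the original `\S+`, this stops at the first space, so a path containing spaces would be cut short.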

edited Sep 10 '14 at 17:40

answered Oct 05 '13 at 02:51

hwnd

`\K` doesn't work in JavaScript; any other solutions? – Jim Nov 01 '16 at 03:59
This worked great for me in Notepad++. I'm not sure what regex processor it uses, but it does allow the \K when doing regex searches. – Mark Jun 07 '17 at 20:33
regexr says \K works only with PCRE and not in JavaScript. No clue what PCRE is though; seems like server-side stuff. – Mixxiphoid Sep 11 '18 at 14:15

Smern · Accepted Answer · 2022-11-15T23:29:10.147

60

If you are using a regex engine that doesn't support \K, the following should work for you:

[\n\r].*Object Name:\s*([^\n\r]*)

Your desired match will be in capture group 1.

[\n\r][ \t]*Object Name:[ \t]*([^\n\r]*)

This is similar, but it rejects lines such as " blah Object Name: blah" (only spaces and tabs may come before the key), and it won't capture the next line when there is no actual content after "Object Name:".
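To see the difference between the two patterns, here is a small Python check on a made-up record (an assumption, not from the question) in which `Object Name:` has no value. `\s*` can cross the line break and pull in the following line, while `[ \t]*` cannot:

```python
import re

LOOSE = r"[\n\r].*Object Name:\s*([^\n\r]*)"
STRICT = r"[\n\r][ \t]*Object Name:[ \t]*([^\n\r]*)"

# Hypothetical record: the Object Name value is missing.
record = "Object:\n    Object Type:    File\n    Object Name:\n    Handle ID:  0x11dc"

loose = re.search(LOOSE, record).group(1)
strict = re.search(STRICT, record).group(1)
print(repr(loose))   # the next line leaks into the capture
print(repr(strict))  # empty string
```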

edited Nov 15 '22 at 23:29

answered Oct 05 '13 at 02:18

Smern

But I need the match result to be `D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log`, not in a match group – Chamara Keragala Oct 05 '13 at 02:26
@CasperNine, why? And what language are you using? – Smern Oct 05 '13 at 02:26
Because the program I'm using captures only the match result. I'm using a log management tool called Logstash. Put your regex into http://regexpal.com/ and see: it matches the whole line. – Chamara Keragala Oct 05 '13 at 02:30
@CasperNine, it depends on if that supports lookbehinds. Try this and let me know your result: `(?<=Object Name:)([^\n\r]*)` See [here](http://rubular.com/r/jKUqWN2Tb1) – Smern Oct 05 '13 at 02:37
@CasperNine, then you'll have to either use capture groups or base it off the following line like this: `[^\s]+(?=\s+Handle ID:)`. The problem with this is that it isn't flexible, so if your format or order changes at all, it won't work. – Smern Oct 05 '13 at 02:45
n̶o̶p̶e̶ ̶i̶t̶ ̶d̶o̶e̶s̶n̶'̶t̶ ̶w̶o̶r̶k̶ ̶:̶(̶ Sorry, it works, but it keeps a blank space at the beginning of the match. – Chamara Keragala Oct 05 '13 at 02:46
let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/38647/discussion-between-caspernine-and-smerny) – Chamara Keragala Oct 05 '13 at 02:47
In most regex engines a lookbehind must be fixed-length, so you can't use variable quantifiers like `\s+` inside it. You could remove the blank space by putting in the exact number of spaces, but this wouldn't be flexible if the number of spaces between the key and the value varies. – Smern Oct 05 '13 at 02:53
I have one more question for you. How do I use `[^\s]+(?=\s+Handle ID:)` when the string is something like `Object Name: F:\Shared\Full_Option\Standed sinhala letters\Lalith\~$rapt order.doc`, i.e. a path containing spaces? – Chamara Keragala Oct 08 '13 at 08:42
@CasperNine, you could try matching against newlines instead of any space characters... `[^\r\n]+(?=\s+Handle ID:)` – Smern Oct 08 '13 at 12:55
For any future visitors, I strongly suggest looking at @hwnd's answer, which captures the need better and is a more general-purpose solution. – Hesam Korki Oct 25 '22 at 13:52
Depends on the regex engine being used. – Smern Nov 15 '22 at 23:17
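For the path-with-spaces question in this thread, a quick Python check (using a record built around the path from the comment; the surrounding lines are assumed) suggests that the lookahead-only pattern, run over the whole record, matches from the start of the `Object Name` line, so the key and the leading spaces come along. Anchoring on the key with a capture group avoids that when groups are available:

```python
import re

# Record built around the path from the comment (it contains spaces).
record = (
    "Object:\n"
    "    Object Type:    File\n"
    "    Object Name:    F:\\Shared\\Full_Option\\Standed sinhala letters"
    "\\Lalith\\~$rapt order.doc\n"
    "    Handle ID:  0x11dc"
)

# Lookahead-only pattern: the match starts at the beginning of the line,
# so it also contains the leading spaces and the "Object Name:" key.
lookahead = re.search(r"[^\r\n]+(?=\s+Handle ID:)", record).group(0)

# Key-anchored capture group: skip spaces/tabs after the colon, then take
# everything up to the last non-space character on the line.
anchored = re.search(r"Object Name:[ \t]*([^\r\n]*\S)", record).group(1)

print(repr(lookahead))
print(anchored)
```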

Ravi K Thapliyal · Answer 3 · 2013-10-05T02:34:10.793

19

You're almost there. Use the following regex with the multi-line option enabled, so that $ matches at the end of each line:

\bObject Name:\s+(.*)$

The complete match would be

Object Name:   D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log

while the captured group one would contain

D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log

If you want the match itself to be the file path, use the following (the match then starts with the whitespace that follows the colon):

(?m)(?<=\bObject Name:).*$
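Both patterns can be tried in Python, where `re.M` (or the inline `(?m)`) plays the role of the multi-line option. A minimal sketch using the question's sample; it also shows the leading whitespace that the lookbehind version keeps:

```python
import re

record = r"""Object:
    Object Server:  Security
    Object Type:    File
    Object Name:    D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log
    Handle ID:  0x11dc"""

# Capture-group version; re.M lets $ match at the end of each line.
m = re.search(r"\bObject Name:\s+(.*)$", record, re.M)
print(m.group(0))  # whole match, key included
print(m.group(1))  # just the path

# Lookbehind version: the path is the whole match, but the spaces
# between the colon and the path are part of it too.
lb = re.search(r"(?m)(?<=\bObject Name:).*$", record)
print(repr(lb.group(0)))
```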

edited Oct 05 '13 at 02:34

answered Oct 05 '13 at 02:21

Ravi K Thapliyal

I want the complete match to be `D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log`. Can't I do that? – Chamara Keragala Oct 05 '13 at 02:32
@CasperNine Yes, you can. Updated the regex. – Ravi K Thapliyal Oct 05 '13 at 02:37
@hwnd yes, that's correct. But how does that actually work? What if I need to match the words on the `Security ID:` line? – Chamara Keragala Oct 05 '13 at 02:39
@CasperNine, did you try `(?m)(?<=\bObject Name:).*$`? – Ravi K Thapliyal Oct 05 '13 at 02:43
@RaviThapliyal your updated regex keeps a blank space at the start of the match. How do I avoid that? – Chamara Keragala Oct 05 '13 at 02:44
@CasperNine, I guess it's not possible for you to trim it, and variable-length lookbehind isn't supported by most regex engines. You could use `(?m)(?<=\bObject Name:\s{4}).*$`, but it would fail for other keys like `Security ID:` because the amount of whitespace varies. – Ravi K Thapliyal Oct 05 '13 at 02:47
@hwnd: that would fail if the file structure changes (re-ordered or the next token is dropped). – Ravi K Thapliyal Oct 05 '13 at 02:53
Yes, I saw that, I posted an answer on how he could do it. – hwnd Oct 05 '13 at 02:54
@RaviKThapliyal I need to extract "slprop: Information Analysis for Microsoft Office,Show Color next to Signal,Red" from https://pastebin.com/NRU4vJk6 . Please note there are line breaks. – Sujay Ghosh Apr 26 '23 at 09:59
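The fixed-width lookbehind restriction discussed in these comments can be checked directly in Python's built-in `re`, which rejects variable-length lookbehind. A minimal sketch, using two lines padded like the question's sample (8 spaces after `Security ID:`, 4 after `Object Name:`):

```python
import re

SID = "S-1-5-21-3368353891-1012177287-890106238-22451"
PATH = r"D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log"

# Same padding as the question's sample.
text = (
    "    Security ID:" + " " * 8 + SID + "\n"
    + "    Object Name:" + " " * 4 + PATH + "\n"
)

# A variable-length lookbehind such as \s+ is rejected when the pattern compiles.
try:
    re.compile(r"(?<=Object Name:\s+).*")
    variable_ok = True
except re.error:
    variable_ok = False

# A fixed count compiles, but it only lines up with keys padded by exactly 4 spaces.
obj_hits = re.findall(r"(?m)(?<=Object Name:\s{4}).*$", text)
sid_hits = re.findall(r"(?m)(?<=Security ID:\s{4}).*$", text)

print(variable_ok)        # False
print(obj_hits)           # just the path
print(repr(sid_hits[0]))  # still starts with 4 extra spaces
```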

score 18 · Answer 4 · edited Dec 11 '19 at 18:04

18

This might work out for you depending on which language you are using:

(?<=Object Name:).*

It's a positive lookbehind assertion: it requires Object Name: immediately before the match without making it part of the match.

It won't work in older JavaScript engines, because lookbehind assertions were only added in ECMAScript 2018. From your comment I see you're using it with Logstash; if you are using Grok parsing in Logstash, it should work. You can verify it yourself here:

https://grokdebug.herokuapp.com/

edited Dec 11 '19 at 18:04

Peter Mortensen

answered Sep 20 '16 at 10:56

Himanshu Chauhan

score -4 · Answer 5 · edited Dec 11 '19 at 18:07

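Here's a quick Perl script to get what you need. The regex skips the whitespace after the colon and stops at the last non-space character, so no extra whitespace chomping is needed.

#!/usr/bin/perl
use strict;
use warnings;

# Single-quoted heredoc: the backslashes in the path are kept literally.
my $sample = <<'END';
Subject:
  Security ID:        S-1-5-21-3368353891-1012177287-890106238-22451
  Account Name:       ChamaraKer
  Account Domain:     JIC
  Logon ID:       0x1fffb

Object:
  Object Server:  Security
  Object Type:    File
  Object Name:    D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log
  Handle ID:  0x11dc
END

foreach my $line (split /\n/, $sample) {
  # Skip the whitespace after the colon, then capture up to the last
  # non-space character, so trailing whitespace is trimmed as well.
  if (my ($path) = $line =~ m/Object Name:\s*(.*\S)/) {
    print "$path\n";
  }
}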

score -4 · Answer 6 · edited Dec 11 '19 at 18:05

This is a Python solution.

import re

# Raw string (r"""...""") so the backslashes in the Windows path stay literal;
# in a normal string, "\a" in "\apache" would turn into the bell character \x07.
line = r"""Subject:
    Security ID:        S-1-5-21-3368353891-1012177287-890106238-22451
    Account Name:       ChamaraKer
    Account Domain:     JIC
    Logon ID:       0x1fffb

Object:
    Object Server:  Security
    Object Type:    File
    Object Name:    D:\ApacheTomcat\apache-tomcat-6.0.36\logs\localhost.2013-07-01.log
    Handle ID:  0x11dc"""

# Group 1 is everything after "Object Name:" and the whitespace that follows it.
regex = r'Object Name:\s+(.*)'
match1 = re.findall(regex, line)
print(match1)

Output:

['D:\\ApacheTomcat\\apache-tomcat-6.0.36\\logs\\localhost.2013-07-01.log']
