I'm trying to parse some Microsoft logging information. The logs come as big blobs of supposedly "human readable" text, examples of which can be seen at the Windows Security Blog, and there is a specific event that I want to exclude from my analysis, namely "An operation was performed on an object" when the object in question is a groupPolicyContainer.

Here's my regular expression and test code:

my $re = qr/(?ms)EventCode=(4662)[^\d].*Object Type:\s*((?!groupPolicyContainer)\S)*/;
if ($sample1 =~ $re) { print "Matches -- should not have\n"; }
if ($sample2 =~ $re) { print "Matches -- and should have!\n"; }

$sample1 contains the phrase Object Type: groupPolicyContainer and $sample2 contains the phrase Object Type: Key. (They both have the same EventCode; this is a contrived test case.) If you look at the link, you can see that there's a lot of text surrounding the two key phrases, "EventCode" and "Object Type". "Object Type" does not occur more than once per log entry (in my contrived test case).

The regular expression says: both match. My expectation is that the first should not match, since it contains the negated phrase! I attempted to implement the code shown in a previous Stack Overflow response, and it doesn't seem to be working; the only difference between that example and mine is that mine operates on a multi-line document.

I've tried every possible combination of (?ms) I could think of! Is there something special I have to do to make this work in a multi-line document?

Elf Sternberg

1 Answer

Personally, I think you're fixating a little too much on a single-regex approach. I would suggest instead parsing the object into a hash and then testing the relevant keys of that hash.

The problem with a regex is that it tries hard to match. If one attempt fails, it backtracks and looks for other potential match points. In a multi-line document it might therefore skip ahead to the next record looking for a chunk that does match, especially if you have greedy matching across lines.

You can see what is happening with

use re 'debug';

Which will show you what the regex engine is doing.
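
To make the "tries hard to match" point concrete, here is a minimal sketch run against two made-up, heavily shortened entries. The sample text and the `$fixed` pattern are assumptions for illustration, not taken from real logs. In the question's pattern the group `((?!groupPolicyContainer)\S)*` is quantified with `*`, so it is allowed to match zero characters; when the lookahead rejects `groupPolicyContainer`, the engine accepts an empty match for the group and the overall pattern still succeeds.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical, heavily shortened log entries; real ones carry far more text.
my $gpc = "EventCode=4662\nObject:\n  Object Type: groupPolicyContainer\n";
my $key = "EventCode=4662\nObject:\n  Object Type: Key\n";

# Pattern from the question: the trailing group is quantified with *,
# so it may match zero characters and the lookahead never rejects anything.
my $original = qr/(?ms)EventCode=(4662)[^\d].*Object Type:\s*((?!groupPolicyContainer)\S)*/;

# One possible fix (an assumption, not the only option): test the lookahead
# once, then require at least one non-space character after it.
my $fixed = qr/(?ms)EventCode=(4662)[^\d].*Object Type:\s*(?!groupPolicyContainer)\S+/;

for my $case ( [ gpc => $gpc ], [ key => $key ] ) {
    my ( $name, $text ) = @$case;
    printf "%-3s original=%s fixed=%s\n", $name,
        ( $text =~ $original ? 'match' : 'no match' ),
        ( $text =~ $fixed    ? 'match' : 'no match' );
}
```

On these samples the original pattern reports a match for both entries, while the adjusted one reports no match for the groupPolicyContainer entry and a match for the Key entry. The adjusted pattern relies on "Object Type" occurring only once per entry, as stated in the question, and it would also reject any type whose name merely starts with groupPolicyContainer.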

But generally, given that you have Perl, I would suggest that trying to craft one winning regex is needlessly painful.

I know it's not quite what you asked, but hopefully this illustrates what I mean:

#!/usr/bin/env perl 
use strict;
use warnings;

use Data::Dumper;

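local $/; # undef slurps all of <DATA> at once; set this to your
          # record separator and you can use this with a while loop too!
# Capture every "Key: Value" line as a key/value pair of the hash.
my %this_object = <DATA> =~ m/^\s*(.*): (.*)$/gm;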
print Dumper \%this_object;

if (    $this_object{'Handle ID'} eq '0x178'
    and $this_object{'Object Type'} eq 'File' )
{
    print "Matches this criteria\n";
}

__DATA__
Subject:
  Security ID: LB\administrator
  Account Name: administrator
  Account Domain: LB
  Logon ID: 0x3DE02

Object:
  Object Server: Security
  Object Type: File
  Object Name: C:\asdf\New Text Document.txt
  Handle ID: 0x178
  Resource Attributes: S:AI


Process Information:
  Process ID: 0x113c
  Process Name: C:\Windows\System32\notepad.exe

Access Request Information:
  Accesses: WriteData (or AddFile)
    AppendData (or AddSubdirectory or CreatePipeInstance)
    Access Mask: 0x6

This prints:

$VAR1 = {
          'Logon ID' => '0x3DE02',
          'Process ID' => '0x113c',
          'Process Name' => 'C:\\Windows\\System32\\notepad.exe',
          'Resource Attributes' => 'S:AI',
          'Account Domain' => 'LB',
          'Accesses' => 'WriteData (or AddFile)',
          'Security ID' => 'LB\\administrator',
          'Access Mask' => '0x6',
          'Object Type' => 'File',
          'Object Name' => 'C:\\asdf\\New Text Document.txt',
          'Object Server' => 'Security',
          'Account Name' => 'administrator',
          'Handle ID' => '0x178'
        };
Matches this criteria
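
Applied to the filter from the question, the same idea might look like the sketch below. The helper names `parse_entry` and `is_interesting` are hypothetical, the two entries are made up and shortened, and the assumption that the event code appears as `EventCode=NNNN` at the start of a line comes from the question's pattern rather than from a real log. The `//` operator needs Perl 5.10 or later.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: turn one log entry into a hash reference.
sub parse_entry {
    my ($text) = @_;
    # "Key: Value" lines, captured with the same pattern as above.
    my %fields = $text =~ m/^\s*(.*): (.*)$/gm;
    # Assumption: the event code shows up as "EventCode=NNNN" at a line start.
    ( $fields{EventCode} ) = $text =~ m/^EventCode=(\d+)/m;
    return \%fields;
}

# True (1) for a 4662 event on anything but a groupPolicyContainer, else 0.
sub is_interesting {
    my ($fields) = @_;
    my $is_4662 = ( $fields->{EventCode}     // '' ) eq '4662';
    my $is_gpc  = ( $fields->{'Object Type'} // '' ) eq 'groupPolicyContainer';
    return ( $is_4662 && !$is_gpc ) ? 1 : 0;
}

# Made-up, shortened entries for illustration only.
my $gpc = parse_entry("EventCode=4662\nObject:\n  Object Type: groupPolicyContainer\n");
my $key = parse_entry("EventCode=4662\nObject:\n  Object Type: Key\n");

print "groupPolicyContainer entry: ", ( is_interesting($gpc) ? "keep" : "skip" ), "\n";
print "Key entry: ",                  ( is_interesting($key) ? "keep" : "skip" ), "\n";
```

Because the test is an exact string comparison on a parsed field, there is no lookahead to get wrong. Real entries may need extra care, for example stripping trailing carriage returns from values before comparing.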

But if that's overkill, then how about this instead?

if (    $thing =~ m/EventCode=4662/
    and not $thing =~ m/groupPolicyContainer/ ) {
    print "Matches this criteria\n";
}

This saves having to figure out negative lookahead matching, and is probably more efficient too, because it won't need to backtrack.
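
As a prefilter over a collection of entries, the two-match test could be wrapped up roughly as follows. This is a sketch: `wanted_event` is a hypothetical helper name, the entries are made up, and the `EventCode=4662` and `Object Type:` spellings are taken from the question rather than from a real log.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: 1 for 4662 events whose object type is not
# groupPolicyContainer, 0 otherwise.
sub wanted_event {
    my ($entry) = @_;
    return (    $entry =~ /EventCode=4662\b/
             && $entry !~ /Object Type:\s*groupPolicyContainer\b/ ) ? 1 : 0;
}

# Made-up, shortened entries for illustration only.
my @entries = (
    "EventCode=4662\n  Object Type: groupPolicyContainer\n",
    "EventCode=4662\n  Object Type: Key\n",
    "EventCode=4663\n  Object Type: File\n",
);

# Keep only the entries that pass both tests.
my @kept = grep { wanted_event($_) } @entries;
printf "kept %d of %d entries\n", scalar @kept, scalar @entries;
```

With these three samples only the second entry is kept. Anchoring the second match on `Object Type:` is a slightly stricter variant of the test above, so that the word groupPolicyContainer appearing elsewhere in an entry does not exclude it.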

Sobrique
  • While I appreciate that, this is WAY overkill. All I want is a prefilter; given a collection of documents with the phrase "EventCode=4662", I want to match every one *except* those that contain the phrase "Object Type: groupPolicyContainer". It doesn't seem like that should be a hard test. – Elf Sternberg Oct 02 '15 at 17:23
  • Actually, I would call a really complex regex the overkill. Just because it looks concise, doesn't mean it isn't doing a lot of work to match. But as you will. Two regexs is probably an alternative solution. But also is hard to answer without some sample input. – Sobrique Oct 02 '15 at 17:52