
Regex will be the death of me. I am parsing logs from an enterprise password manager. This is what a handful of the logs look like:

date_time=2017-01-27 23:17:39 user=John Doe (86) ip_address=10.10.44.131 origin=web action=export password=CSDEV - SQL Account #20 (496) project=Applications (2)
date_time=2017-01-30 18:21:49 user=John Doe (86) ip_address=10.10.44.131 origin=web action=view_passwords_list additional=Active Passwords
date_time=2017-01-27 23:29:06 user=John Doe (86) ip_address=10.10.44.131 origin=web action=add_password password=Non-ACS Devices (1099) project=Infrastructure & Operations (31) additional=Import

Every single line in the log starts with five tags: date_time, user, ip_address, origin, and action. Afterwards, though, there can be up to three additional tags: "password", "project", and "additional".

These extra tags are what's doing me in. I need to capture whichever of them are present. Right now I have:

date_time=(.+) user=(.+) ip_address=(.+) origin=(.+) action=(.+) (password=(.+)|project=(.+)|additional=(.+))+

According to regex101, this is close but doesn't quite get there.

https://regex101.com/r/eA2eE1/4

My guess is that the final leap has to do with greedy vs. lazy matching, but I've hit the limits of my regex knowledge for the moment.
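To make the greedy-vs-lazy guess concrete, here is a minimal sketch in PowerShell (the language the answers below use; `$line`, `$greedy` and `$lazy` are illustrative names). It runs the original pattern, and a lazy variant, against the first sample line:

```powershell
# Illustrative only: what the question's pattern captures for action=
$line = 'date_time=2017-01-27 23:17:39 user=John Doe (86) ip_address=10.10.44.131 origin=web action=export password=CSDEV - SQL Account #20 (496) project=Applications (2)'

# Original pattern: every (.+) is greedy
$greedy = 'date_time=(.+) user=(.+) ip_address=(.+) origin=(.+) action=(.+) (password=(.+)|project=(.+)|additional=(.+))+'
$null = $line -match $greedy
# Greedy action= backtracks only as far as the LAST optional tag,
# so it swallows the password tag: 'export password=CSDEV - SQL Account #20 (496)'
$greedyAction = $Matches[5]

# Making the leading groups lazy is not enough: password=(.+) is still greedy
$lazy = 'date_time=(.+?) user=(.+?) ip_address=(.+?) origin=(.+?) action=(.+?) (password=(.+)|project=(.+)|additional=(.+))+'
$null = $line -match $lazy
$lazyAction   = $Matches[5]   # 'export'
$lazyPassword = $Matches[7]   # 'CSDEV - SQL Account #20 (496) project=Applications (2)'

$greedyAction, $lazyAction, $lazyPassword
```

Greediness explains only part of the problem: once action= is lazy, the unbounded `(.+)` inside the optional tags still runs to the end of the line, which is why the answers below either bound each value with a lookahead or avoid one big regex altogether.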

Thanks for any help you can provide!

Tchotchke
  • Try [`^date_time=([\d-]+ [\d:]+) user=(.+?) ip_address=([\d.]+) origin=(.+?) action=(.+?)(?: password=((?:(?!\w+=).)*))?(?: project=((?:(?!\w+=).)*))?(?: additional=(.+?))?$`](https://regex101.com/r/NAQrpr/2) – Wiktor Stribiżew Jan 30 '17 at 21:35
  • Perfect! I understand most of it (I really like the [\d.]+ for an IP address!), but could you briefly explain the end? (?: password=((?:(?!\w+=).)*))?(?: project=((?:(?!\w+=).)*))?(?: additional=(.+?))?$ – Tchotchke Jan 30 '17 at 22:03
  • Are they always IPv4? – LeonardChallis Jan 30 '17 at 22:12
  • My organization has no plans to move to IPv6 for the foreseeable future, so yes. – Tchotchke Jan 30 '17 at 22:30

5 Answers


You may use

^date_time=([\d-]+ [\d:]+) user=(.+?) ip_address=([\d.]+) origin=(.+?) action=(.+?)(?: password=((?:(?!\w+=).)*))?(?: project=((?:(?!\w+=).)*))?(?: additional=(.+?))?$

See the regex demo.

Details:

  • ^ - start of string
  • date_time= - literal char sequence
  • ([\d-]+ [\d:]+) - Group 1: one or more digits or -, space, and 1+ digits or :
  • user= - literal char sequence
  • (.+?) - Group 2: any 1+ chars as few as possible
  • ip_address= - literal char sequence
  • ([\d.]+) - Group 3: one or more digits or .
  • origin= - literal char sequence
  • (.+?) - Group 4: any 1+ chars as few as possible
  • action= - literal char sequence
  • (.+?) - Group 5: any 1+ chars as few as possible
  • (?: password=((?:(?!\w+=).)*))? - an optional non-capturing group matching a sequence of:
    • password= - literal char sequence (preceded by a space)
    • ((?:(?!\w+=).)*) - Group 6: a tempered greedy token matching 0 or more occurrences of any char that does not start a run of 1+ word chars followed by =
  • (?: project=((?:(?!\w+=).)*))? - the same construct for project=, with the value in Group 7
  • (?: additional=(.+?))? - similar, but instead of the tempered greedy token, Group 8 uses .+? to match any 1+ chars, as few as possible, since additional= is the last tag in the pattern
  • $ - end of string.
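A minimal PowerShell sketch of putting this pattern to work, assuming the log lines are already in memory (`$lines`, `$pattern` and `$records` are illustrative names). Optional groups that did not take part in the match come back as empty strings, so every record gets all eight properties:

```powershell
# Sample lines from the question; in practice these would come from Get-Content
$lines = @(
    'date_time=2017-01-27 23:17:39 user=John Doe (86) ip_address=10.10.44.131 origin=web action=export password=CSDEV - SQL Account #20 (496) project=Applications (2)'
    'date_time=2017-01-30 18:21:49 user=John Doe (86) ip_address=10.10.44.131 origin=web action=view_passwords_list additional=Active Passwords'
    'date_time=2017-01-27 23:29:06 user=John Doe (86) ip_address=10.10.44.131 origin=web action=add_password password=Non-ACS Devices (1099) project=Infrastructure & Operations (31) additional=Import'
)

# The pattern from this answer, unchanged
$pattern = '^date_time=([\d-]+ [\d:]+) user=(.+?) ip_address=([\d.]+) origin=(.+?) action=(.+?)(?: password=((?:(?!\w+=).)*))?(?: project=((?:(?!\w+=).)*))?(?: additional=(.+?))?$'

$records = foreach ($line in $lines) {
    $m = [regex]::Match($line, $pattern)
    if (-not $m.Success) { continue }   # skip lines that don't fit the format
    [PSCustomObject]@{
        date_time  = $m.Groups[1].Value
        user       = $m.Groups[2].Value
        ip_address = $m.Groups[3].Value
        origin     = $m.Groups[4].Value
        action     = $m.Groups[5].Value
        # Groups that did not participate in the match have Value = ''
        password   = $m.Groups[6].Value
        project    = $m.Groups[7].Value
        additional = $m.Groups[8].Value
    }
}

$records | Format-Table
```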
Wiktor Stribiżew

Why not just split the line on its key/value pairs? That should be much easier and more adaptable to future changes, lighter on the regex engine, and easier to read. The simpler, the better.

(\w+=)

You can test it on Rextester or Regex101
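A sketch of that idea in PowerShell, assuming one log line at a time (`$line`, `$tokens`, `$fields` and `$obj` are illustrative names). Because the split pattern is a capturing group, `-split` keeps each `key=` token, so the output alternates between keys and values and can be walked in pairs:

```powershell
$line = 'date_time=2017-01-30 18:21:49 user=John Doe (86) ip_address=10.10.44.131 origin=web action=view_passwords_list additional=Active Passwords'

# Splitting on a capturing group keeps the delimiters ("key=") in the output;
# Where-Object drops the empty string produced before the first key
$tokens = $line -split '(\w+=)' | Where-Object { $_ }

# Walk the tokens in pairs: "key=" followed by its value
$fields = [ordered]@{}
for ($i = 0; $i -lt $tokens.Count - 1; $i += 2) {
    $key = $tokens[$i].TrimEnd('=')
    $fields[$key] = $tokens[$i + 1].Trim()   # drop the space before the next key
}

$obj = [PSCustomObject]$fields
$obj
```

Only the keys a line actually has end up on the object, so optional tags need no special handling. The trade-off is that any value which itself contains text shaped like `word=` would be split as well.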

maraaaaaaaa
  • I like the simplicity. When I have a moment I'll have to go back and try this. I tried something similar but ran into issues with the optional keys. – Tchotchke Jan 30 '17 at 22:35

No fancy regex, but this would work:

$date, $user, $ipaddress, $origin, $action, $password, $project, $additional =
    "YourString" -replace "^date_time=" -split "\s+(?:user|ip_address|origin|action|password|project|additional)="

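Your variables are ready to use. If trailing optional tags are missing, the leftover variables are set to $null. Because the assignment is positional, though, a line that skips an earlier optional tag (for example, one with additional= but no password=) will put that value into the wrong variable.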
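The $null behaviour comes from PowerShell's multiple assignment: variables left without a value are set to $null, and if there are more values than variables, the last variable collects the surplus as an array. A minimal illustration with placeholder values (`$a`, `$b`, `$c`, `$x`, `$y` are illustrative names):

```powershell
# Fewer values than variables: the extra variable is set to $null
$a, $b, $c = 'one', 'two'
# $a = 'one', $b = 'two', $c = $null

# More values than variables: the last variable receives the rest as an array
$x, $y = 'one', 'two', 'three'
# $x = 'one', $y = @('two', 'three')
```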

Bruno

OK, I'm going to go in a slightly different direction here. First I'll set up the input text:

$Text = @"
date_time=2017-01-27 23:17:39 user=John Doe (86) ip_address=10.10.44.131 origin=web action=export password=CSDEV - SQL Account #20 (496) project=Applications (2)
date_time=2017-01-30 18:21:49 user=John Doe (86) ip_address=10.10.44.131 origin=web action=view_passwords_list additional=Active Passwords
date_time=2017-01-27 23:29:06 user=John Doe (86) ip_address=10.10.44.131 origin=web action=add_password password=Non-ACS Devices (1099) project=Infrastructure & Operations (31) additional=Import
"@ -split "[\r\n]+"|?{$_}

OK, so now I basically have your text as if I had run Get-Content on your file. Next, for each line we'll make a blank custom object that contains every possible property. Then we'll split each line into chunks of the form Something=A Value, split each of those chunks on the '=', and set that property on the object. Lastly, we output the object.

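$Text |%{
    # Start with every possible property blank so all objects share the same columns
    $curObj = new-object psobject -Property @{
        date_time=''
        user=''
        ip_address=''
        origin=''
        action=''
        password=''
        project=''
        additional=''
    }
    # Break the line into "key=value " chunks, split each chunk on its first '=' only,
    # and trim the trailing space the chunking leaves on each value
    $_ -split "(\S+=.+?)(?=(?:\S+=|$))"|?{$_}|%{
        $key, $value = $_ -split '=', 2
        $curObj.$key = $value.Trim()
    }
    $curObj
}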

From there you could pipe it to Export-CSV or capture the results in an array, or do whatever you want with them. I piped it to Format-Table and got:

date_time            origin action               ip_address    user           project                           additional       password                      
---------            ------ ------               ----------    ----           -------                           ----------       --------                      
2017-01-27 23:17:39  web    export               10.10.44.131  John Doe (86)  Applications (2)                                   CSDEV - SQL Account #20 (496) 
2017-01-30 18:21:49  web    view_passwords_list  10.10.44.131  John Doe (86)                                    Active Passwords                               
2017-01-27 23:29:06  web    add_password         10.10.44.131  John Doe (86)  Infrastructure & Operations (31)  Import           Non-ACS Devices (1099)        
TheMadTechnician

Use the built-in ConvertFrom-StringData cmdlet.

$array = Get-Content -LiteralPath 'c:\data.log' |
    ForEach { $_ -replace '\s+(?=\w+=)', "`n" | ConvertFrom-StringData }

This command outputs an array of hashtables, one per log line; each line's key=value pairs become the keys and values of its hashtable.

Notes:

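  • Replacing \s+(?=\w+=) with a newline breaks the line at any whitespace followed by a key name, so each key=value pair lands on its own line, which is the format ConvertFrom-StringData expects.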
  • If the log file is big, use [IO.StreamReader]:

    $reader = [IO.StreamReader]'c:\data.log'
    try {
        $array = while (!$reader.EndOfStream) {
            $reader.ReadLine() -replace '\s+(?=\w+=)', "`n" | ConvertFrom-StringData
        }
    } finally {
        $reader.Close()   # release the file handle even if parsing fails
    }
    
  • To output CSV-compatible objects, cast each hashtable to [PSCustomObject] (PowerShell 3+); on PowerShell 2, use New-Object PSObject -Property instead.
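A minimal sketch of the last note, assuming the lines are already in memory (`$lines`, `$objects`, `$columns` and `$csv` are illustrative names). Export-Csv and ConvertTo-Csv take their columns from the first object they receive, so Select-Object is used here to give every object the full set of properties, with $null for tags a line doesn't have:

```powershell
$lines = @(
    'date_time=2017-01-27 23:17:39 user=John Doe (86) ip_address=10.10.44.131 origin=web action=export password=CSDEV - SQL Account #20 (496) project=Applications (2)'
    'date_time=2017-01-30 18:21:49 user=John Doe (86) ip_address=10.10.44.131 origin=web action=view_passwords_list additional=Active Passwords'
)

$objects = foreach ($line in $lines) {
    # One key=value pair per line, then parse into a hashtable
    $hash = $line -replace '\s+(?=\w+=)', "`n" | ConvertFrom-StringData
    [PSCustomObject]$hash   # hashtable -> object with one property per key
}

# Fix the column set and order; missing tags become empty CSV fields
$columns = 'date_time', 'user', 'ip_address', 'origin', 'action', 'password', 'project', 'additional'
$csv = $objects | Select-Object -Property $columns | ConvertTo-Csv -NoTypeInformation
$csv
```

Swapping ConvertTo-Csv for `Export-Csv -Path <file> -NoTypeInformation` writes the same rows to disk.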

wOxxOm