2

I am trying to extract multiple pieces of data (first name, last name, ID number) from a rather nasty log file.

I have this:

Get-Content c:\LOG\22JAN01.log | Out-String | 
  % {[Regex]::Matches($_, "(?<=FIRST:)((.|\n)*?)(?=LAST:)")} | % {$_.Value}

This does a fine job of extracting the first name, but I also need the last name and ID number from the same line, presented together as "BOB SMITH 123456".

Each line of the log file looks like this:

FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304

I would like the output to look something like:

  • BOB SMITH 123456
  • JACK JONES 029506
  • KAREN KARPENTER 6890298

So far I can only manage to get all the first names and nothing else. Thanks for any help pointing me in the right direction!

mklement0
Derek B

5 Answers

6

If they are always on the same line, I like to use switch to read it.

switch -Regex -File c:\LOG\22JAN01.log {
    'FIRST:(\w+) LAST:(.+) DOOR.+ ID:(\d+) ' {
        [PSCustomObject]@{
            First = $matches[1]
            Last  = $matches[2]
            ID    = $matches[3]
        }
    }
}

Sample log output

First Last      ID     
----- ----      --     
BOB   SMITH     123456 
JACK  JONES     029506 
KAREN KARPENTER 6890298

You can capture it to a variable and then continue using the objects however you like.

$output = switch -Regex -File c:\LOG\22JAN01.log {
    'FIRST:(\w+) LAST:(.+) DOOR.+ ID:(\d+) ' {
        [PSCustomObject]@{
            First = $matches[1]
            Last  = $matches[2]
            ID    = $matches[3]
        }
    }
}

$output | Out-GridView

$output | Export-Csv -Path c:\Log\parsed_log.log -NoTypeInformation
Doug Maurer
2

You need to use capture groups ().

Assuming that FIRST is always right at the start of the line (remove the ^ if not), and that the field names are always present and in the same order, and that their values are at least one character long, you could use, for example:

$result = & {
  $path = "c:\LOG\22JAN01.log";
  $pattern = "^FIRST:(.+?) LAST:(.+?) DOOR:.+? ID:(\d+)";
  Select-String -Path $path -Pattern $pattern -AllMatches |
  % {$_.Matches.Groups[1], $_.Matches.Groups[2], $_.Matches.Groups[3] -join " "}
}

.+? means match one or more of any character except newlines, as few times as possible before what follows in the pattern can be matched. Something more restrictive such as [A-Z]+ can be used instead if that will definitely match the required values.
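To illustrate the trade-off described above, here is a minimal sketch comparing the lazy `.+?` groups with the more restrictive `[A-Z]+`. The second input line and the `Get-NameAndId` helper are hypothetical and exist only for this illustration. Note that `[regex]::Match` is case-sensitive, unlike PowerShell's `-match` operator.

```powershell
# Sample line copied from the question.
$line = 'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'

# Hypothetical line (not from the real log) whose names contain non-letters.
$oddLine = "FIRST:MARY-ANN LAST:O'BRIEN DOOR:SIDE ENTRANCE ID:000042"

$lazyPattern   = '^FIRST:(.+?) LAST:(.+?) DOOR:.+? ID:(\d+)'
$strictPattern = '^FIRST:([A-Z]+) LAST:([A-Z]+) DOOR:.+? ID:(\d+)'

# Hypothetical helper: returns "FIRST LAST ID", or nothing when the line does not match.
function Get-NameAndId {
    param([string]$Text, [string]$Pattern)
    $m = [regex]::Match($Text, $Pattern)
    if ($m.Success) {
        $m.Groups[1].Value, $m.Groups[2].Value, $m.Groups[3].Value -join ' '
    }
}

$lazyResult      = Get-NameAndId -Text $line    -Pattern $lazyPattern
$strictResult    = Get-NameAndId -Text $line    -Pattern $strictPattern
$lazyOddResult   = Get-NameAndId -Text $oddLine -Pattern $lazyPattern
$strictOddResult = Get-NameAndId -Text $oddLine -Pattern $strictPattern   # no match: names are not purely A-Z
```

Both patterns agree on the sample line from the question. Only the lazy pattern matches the hypothetical line, which is why `[A-Z]+` is safe only if every name in the log really consists of uppercase letters.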

MikeM
2

If you can make the assumption that each field name is composed of (English) letters only,[1] such as FIRST, a generic solution that combines the -replace operator with the ConvertFrom-StringData cmdlet is possible:

# Sample array of input lines.
$inputLines = 
  'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304',
  'FIRST:JACK LAST:JONES DOOR:SIDE ENTRANCE ID:123457 TIME:Friday, December 31, 2021 11:55:48 PM INCIDENT:19002305',
  'FIRST:KAREN LAST:KARPENTER DOOR:BACK ENTRANCE ID:123458 TIME:Friday, December 31, 2021 11:55:49 PM INCIDENT:19002306'

$inputLines -replace '\b([a-z]+):', "`n`$1=" | 
  ConvertFrom-StringData |
    ForEach-Object { $_.FIRST, $_.LAST, $_.ID -join ' ' }
  • For each input line, the -replace operation places each field name-value pair onto its own line, replacing the separator, :, with =.

  • The resulting block of lines is parsed by ConvertFrom-StringData into a hashtable representing the fields of each input line, allowing convenient access to the fields by name, e.g. .FIRST (PowerShell allows you to use property-access syntax as an alternative to index syntax, e.g. ['FIRST']).

Output:

BOB SMITH 123456
JACK JONES 123457
KAREN KARPENTER 123458

[1] More generally, you can use this approach as long as you can formulate a regex that unambiguously identifies a field name.
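To make the two steps described above easier to follow, this sketch runs them separately on a single line from the question, so that the intermediate text can be inspected. The variable names (`$asKeyValueText`, `$fields`, `$joined`) are illustrative only.

```powershell
$line = 'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'

# Step 1: turn every NAME: into a newline followed by NAME=.
# The colons inside the time (11:55:47) are untouched, because they are preceded by digits, not letters.
$asKeyValueText = $line -replace '\b([a-z]+):', "`n`$1="

# Step 2: parse the NAME=value lines into a hashtable.
$fields = $asKeyValueText | ConvertFrom-StringData

# Property-access syntax and index syntax are interchangeable on the hashtable.
$joined = $fields.FIRST, $fields['LAST'], $fields.ID -join ' '
$joined
$fields.DOOR   # values that contain spaces are kept whole
```

Printing `$asKeyValueText` shows one NAME=value pair per line, which is the input format ConvertFrom-StringData expects.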

mklement0
1

Assuming the log file looks exactly like the quoted text, you could match it like this:

$log = @'
FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304
FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304
'@

$re = [regex]'(?si)FIRST:(?<first>.*?)\s*LAST:(?<last>.*?)\s*DOOR.*?ID:(?<id>\d+)'

foreach($match in $re.Matches($log))
{
    '{0} {1} {2}' -f
        $match.Groups['first'].Value,
        $match.Groups['last'].Value,
        $match.Groups['id'].Value
}

# Results in:
BOB SMITH 123456
JOHN DOE 789101

This regex should work on a multi-line string so you would use -Raw for Get-Content:

$re = [regex]'(?si)FIRST:(?<first>.*?)\s*LAST:(?<last>.*?)\s*DOOR.*?ID:(?<id>\d+)'

$result = foreach($match in $re.Matches((Get-Content ./test.log -Raw)))
{
    [pscustomobject]@{
        First = $match.Groups['first'].Value
        Last  = $match.Groups['last'].Value
        ID    = $match.Groups['id'].Value
    }
}

$result | Export-Csv path/to/newlog.csv -NoTypeInformation

See https://regex101.com/r/WluWpD/1 for the regex explanation.

Santiago Squarzon
  • 1
    The value of `DOOR` in the example is `MAIN ENTRANCE`, so `ENTRANCE` should not be in your regex as other doors may not include it. Also, the `.*?` at the end is pointless. Looks good otherwise. – MikeM Jan 01 '22 at 19:26
  • @MikeM I'm dumb, thank you! for both things (`.*?` too). Learning regex is hard hehe – Santiago Squarzon Jan 01 '22 at 19:35
  • 1
    So far this seems to work! At least it displays what I need. Now just to figure out how to write it to a file and I'll be all set I think! – Derek B Jan 01 '22 at 19:54
  • @DerekB see my last edit, you just need to collect the results of the `foreach` loop in a variable (`$result`) and then `Out-File` the results. – Santiago Squarzon Jan 01 '22 at 19:56
  • Hmm - I added the result line and I get an empty file. I am assuming I'm missing something here. – Derek B Jan 01 '22 at 20:12
  • @DerekB are you sure `(Get-Content ./test.log -Raw)` is pointing to your file?, use the absolute path if in doubt. Also make sure `$result =` is there, before the `foreach`. – Santiago Squarzon Jan 01 '22 at 20:17
  • 1
    Ah yes I was missing the one $result = line - now it exports to a file - thank you! – Derek B Jan 01 '22 at 20:22
  • One last question - is there any easy way to export this solution to a comma delimited CSV file? – Derek B Jan 01 '22 at 20:44
  • @DerekB Yup, I have updated the answer. Doug's answer was covering the export to CSV too – Santiago Squarzon Jan 01 '22 at 20:48
  • @SantiagoSquarzon when I export it as a CSV I end up with one column with the character count titled "length" - what I really need is 3 columns (first name, last name, ID) with that data in it. – Derek B Jan 01 '22 at 21:07
  • @DerekB the code as is works perfectly fine for me, you might not be copying as literal as I have posted. – Santiago Squarzon Jan 01 '22 at 21:15
  • @SantiagoSquarzon Yes, you were right - one silly mistake was screwing the whole thing up, works perfectly now. Mission accomplished, thank you! – Derek B Jan 01 '22 at 21:22
1

Using this reusable function:
(See also: #16257 String >>>Regex>>> PSCustomObject)

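function ConvertFrom-Text {
    [CmdletBinding()]
    param (
        # Regex whose named capture groups become the output properties.
        [regex]$Pattern,

        [Parameter(Mandatory = $true, ValueFromPipeline = $true)]
        $InputObject
    )
    process {
        # Use $InputObject rather than $_ so that -InputObject also works without the pipeline.
        if ($InputObject -match $Pattern) {
            $Matches.Remove(0)          # drop the whole-match entry, keep the capture groups
            [PSCustomObject]$Matches
        }
    }
}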

$log = @(
    'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
    'FIRST:JOHN LAST:DOE DOOR:MAIN ENTRANCE ID:789101 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
)

$Log |ConvertFrom-Text -Pattern '\bFIRST:(?<First>\S*).*\bLAST:(?<Last>\S*).*\bID:(?<ID>\d+)'

ID     Last  First
--     ----  -----
123456 SMITH BOB
789101 DOE   JOHN
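The columns above appear as ID, Last, First because `$Matches` is a hashtable, whose key order is not guaranteed. If a fixed column order matters (for a CSV export, say), one possible approach is to pipe the object through `Select-Object`. This is a sketch, not part of the function above, and `$record` is just an illustrative name.

```powershell
$line    = 'FIRST:BOB LAST:SMITH DOOR:MAIN ENTRANCE ID:123456 TIME:Friday, December 31, 2021 11:55:47 PM INCIDENT:19002304'
$pattern = '\bFIRST:(?<First>\S*).*\bLAST:(?<Last>\S*).*\bID:(?<ID>\d+)'

if ($line -match $pattern) {
    $Matches.Remove(0)   # drop the whole-match entry, keep the named groups
    # Select-Object fixes the property order regardless of the hashtable's key order.
    $record = [PSCustomObject]$Matches | Select-Object First, Last, ID
}

$record
```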
iRon
  • 1
    A neat function, but if you're using capture groups there's no need for lookbehinds: `(?<=FIRST:)` would be better as just `FIRST:` etc. – MikeM Jan 01 '22 at 20:40