1

I have a log file with a weird format that I would like to convert to a table. The format is that each line contains multiple keyvalue pairs (same pairs on each row). I want to convert these rows so that each property becomes a column in a table containing the value from the row.

Note that the original log file contains 39 properies on each row and the log file is about 80MB.

Example rows:

date=2019-12-02 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
date=2019-12-01 srcip=8.8.8.8 destip=8.8.4.4 srcintf="xyz abc"
date=2019-12-03 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
date=2019-12-05 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"
date=2019-12-07 srcip=8.8.8.8 destip=8.8.4.4 srcintf="port2"

I have tried:

Get-Content .\testfile.log | select -First 10 | ConvertFrom-String | select p1, p2, p3 | ft | Format-Wide

But this will not break out the property name to the column name. So in this example i want P1 to be date, p2 srcip, and p3 destip and that the first part of each value is removed.

Po

Anyone have any tips or creative ideas how to convert this to a table?

Daniel Björk
  • 2,475
  • 1
  • 19
  • 26

4 Answers4

5

ConvertFrom-String provides separator-based parsing as well as heuristics-based parsing based on templates containing example values. The separator-based parsing applies automatic type conversions you cannot control, and the template language is poorly documented, with the exact behavior hard to predict - it's best to avoid this cmdlet altogether. Also note that it's not available in PowerShell [Core] v6+.

Instead, I suggest an approach based on the switch statement[1] and the -split operator to create a collection of custom objects ([pscustomobject]) representing the log lines:

# Use $objects = switch ... to capture the generated objects in a variable.
switch -File .\testfile.log { # Loop over all file lines
  default {
    $oht = [ordered] @{ } # Define an aux. ordered hashtable
    foreach ($keyValue in -split $_) { # Loop over key-value pairs
      $key, $value = $keyValue -split '=', 2 # Split pair into key and value
      $oht[$key] = $value -replace '^"|"$' # Add to hashtable with "..." removed
    }
    [pscustomobject] $oht  # Convert to custom object and output.
  }
}

Note:

  • The above assumes that your values have no embedded spaces; if they do, more work is needed - see next section.

  • To capture the generated custom objects in a variable, simply use $objects = switch ...

    • With two ore more log lines, $objects becomes an [object[]] array of [pscustomobject] instances. If you want to ensure that $objects also becomes an array even if there happens to be just one log line, use [array] $objects = switch ... ([array] is effectively the same as [object[]]).
  • To directly send the output objects through the pipeline to other cmdlets, enclose the switch statement in & { ... }

With your sample input, this yields:

date       srcip   destip  srcintf
----       -----   ------  -------
2019-12-02 8.8.8.8 8.8.4.4 port2
2019-12-01 8.8.8.8 8.8.4.4 port2
2019-12-03 8.8.8.8 8.8.4.4 port2
2019-12-05 8.8.8.8 8.8.4.4 port2
2019-12-07 8.8.8.8 8.8.4.4 port2

Variant with support for values with embedded spaces inside "..." (e.g., srcintf="port 2"):

switch -file .\testfile.log {
  default {
    $oht = [ordered] @{ }
    foreach ($keyValue in $_ -split '(\w+=(?:[^"][^ ]*|"[^"]*"))' -notmatch '^\s*$') {
      $key, $value = $keyValue -split '=', 2
      $oht[$key] = $value -replace '^"|"$'
    }
    [pscustomobject] $oht
  }
}

Note that there's no support for embedded escaped " instances (e.g, srcintf="port \"2\"" won't work).

Explanation:

  • $_ -split '(\w+=(?:[^"][^ ]*|"[^"]*"))' splits by a regex that matches key=valueWithoutSpaces and key="value that may have spaces" tokens and, by virtue of enclosing the expression in (...) (creating a capture group), includes these "separators" in the tokens that -split outputs (by default, separators aren't included).

  • -notmatch '^\s*$' then weeds out empty and all-spaces tokens from the result (the "data tokens", which aren't of interest in our case), leaving effectively just the key-value pairs.

  • $key, $value = $keyValue -split '=', 2 splits the given key-value token by = into at most 2 tokens, and uses a destructuring assignment to assign the key and the value to separate variables.

  • $oht[$key] = $value -replace '^"|"$' adds an entry to the aux. hashtable with the key and value at hand, where -replace '^"|"$' uses the -replace operator to remove " from the beginning and end of the value, if present.


[1] switch -File is a flexible and much faster alternative to processing a file line by line with a combination of Get-Content and ForEach-Object.

mklement0
  • 382,024
  • 64
  • 607
  • 775
1

So what you could do is cut each line into a hashtable of key value pairs passing those to ConvertFrom-StringData instead. There is a couple of caveats with this approach. In keeping it simple your source data is space delimited. This would break if you real data contained spaces (which can be mitigated.) Other obvious caveat is you can't guarantee property order.

Get-Content c:\temp\so.txt | ForEach-Object{
     [PSCustomObject](($_ -split " ") -join "`r`n" | ConvertFrom-StringData)
} | Select-Object date, srcip, destip, srcintf

Output:

date       srcip   destip  srcintf
----       -----   ------  -------
2019-12-02 8.8.8.8 8.8.4.4 "port2"
2019-12-01 8.8.8.8 8.8.4.4 "port2"
2019-12-03 8.8.8.8 8.8.4.4 "port2"
2019-12-05 8.8.8.8 8.8.4.4 "port2"
2019-12-07 8.8.8.8 8.8.4.4 "port2"
Matt
  • 45,022
  • 8
  • 78
  • 119
  • I tried this approach and compaired to the one mklement0 suggested and his approach is faster and since some of the srcintf contains space this one failed. But i didnt write that so thats on me. – Daniel Björk Dec 02 '19 at 13:38
1

OK, for the purposes of discussion, I am going to assume the following:

  1. The data is in a file PSDATA.TXT
  2. There are no spaces in the data other than the spaces separating the name-value pairs.
  3. It is acceptable for the resulting tabular data to treat all the values as strings.

Given that...

Get-Content -Path PSDATA.TXT |
ForEach-Object {$_ -replace ' ','";' -replace '=','="' -replace '""','"'} |
ForEach-Object {New-Object PSObject -Property (Invoke-Expression ("[Ordered]@{{{0}}}" -f $_))}

... will generate a table where each line in the file becomes a PSObject with fields taking their names from the name in each name-value pair, and the associated value being the value of the field, as a string. If you're not using PowerShell v4 or later (I'm not sure about 3), you can omit the [Ordered], with the side effect of the order of the fields in the PSObject not necessarily being in the same order as in the file.

If you wanted to have an array of these PSObjects for further processing, you could wrap the whole line above in a variable assignment, e.g., $A=(«that whole thing above, on one line»), and if you wanted to send it to a CSV file, you could just add | Export-CSV -path NewCSVFile.CSV to the end.

Jeff Zeitlin
  • 9,773
  • 2
  • 21
  • 33
0

I would prefer a datatable, so you easily can sort, filter, merge etc. the logfile:

$logFilePath  = 'C:\test\test.log'

$dt = New-Object system.Data.DataTable
[void]$dt.Columns.Add('P1',[string]::empty.GetType() )
[void]$dt.Columns.Add('P2',[string]::empty.GetType() )
[void]$dt.Columns.Add('P3',[string]::empty.GetType() )

foreach( $line in [System.IO.File]::ReadLines($logFilePath) )
{
    $tokenArray = $line -split '[= ]'

    $row    = $dt.NewRow()
    $row.P1 = $tokenArray[1]
    $row.P2 = $tokenArray[3]
    $row.P3 = $tokenArray[5]
    [void]$dt.Rows.Add( $row )

}

$dt
f6a4
  • 1,684
  • 1
  • 10
  • 13