1

I'm trying to extract a certain block of lines from a text file, which contains something like this:

...
sCountry = "USA"
sCity = "New York"
sState = "New York"
...
sCountry = "USA"
sCity = "Los Angeles"
sState = "California"

where those three lines repeat throughout the text file; I just want to extract those lines of text, and put the data fields into a csv, so that I have something like

"USA","New York","New York"
"USA","Los Angeles","California"
...

So far I have this:

$inputPath = 'C:\folder\file.vbs'
$outputFile = 'C:\folder\extracted_data.csv'
$fileContent = [io.file]::ReadAllText($inputPath)

$regex = '(?sm)(s[A-Z][a-z]+ = "\w*"(\s*$)){3}'

$fileContent = $fileContent | Select-String $regex -AllMatches | % {$_.Matches} | % {$_.Value}
$fileContent = [regex]::Replace($fileContent, 'sCountry = ', '')
$fileContent = [regex]::Replace($fileContent, '(?sm)((^\s*)s[A-Z][a-z]+ = )', ',')
$fileContent > $outputFile

which I was able to obtain by looking at this:

Multiline regex to match config block.

However, my output file is empty when I run the script. It won't pattern-match with the $regex pattern I provided, but it WILL match on a single line if I do something like:

$regex = '(?sm)(sCountry = "\w*"(\s*$))'

but not if I do something like:

$regex = '(?sm)(s[A-Z][a-z]+ = "\w*"(\s*$))'

How do I make the pattern matching work across multiple lines?

Community
  • 1
  • 1
cerremony
  • 213
  • 2
  • 12

1 Answers1

1

Using you test data exactly as you have it in your post I took a different approach using Select-String and ConvertFrom-StringData. This has a minor flaw that could be overlooked (or addressed if you really need to). The caveat here is that the sCountry line has to occur first and the sState line has to occur last in the group.

$results = ((Get-Content C:\temp\test.txt -Raw) | 
    Select-String -Pattern "(?sm)sCountry.*?sState.*?$" -AllMatches).Matches.Value
$results | ForEach-Object{
    New-Object -TypeName PSCustomObject -Property ($_.Replace('"','') | ConvertFrom-StringData)
} | Export-CSV -NoTypeInformation C:\temp\output.csv

To get the data groups the regex here will grab everything from "sCountry" to then end of a line of the where the next "sState" occurs. This current logic would fail if there were other lines in between those besides the ones expected. We get rid of the variable quotes with simple .Replace('"',''). I think it is minor but the resulting headers have leading s's which is probably not a big deal.

The object in PowerShell looks like this before it is exported to CSV

sCity       sCountry sState    
-----       -------- ------    
New York    USA      New York  
Los Angeles USA      California

Which will net output like this in a csv

"sCity","sCountry","sState"
"New York","USA","New York"
"Los Angeles","USA","California"

The cool think is not you could sort the data before you exported it. Or anything really that you could do with PowerShell objects.

Matt
  • 45,022
  • 8
  • 78
  • 119