I have been trying to extract certain values from multiple lines inside a .txt file with PowerShell.

Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"

This is what I want:

server01
server02
server03 test

This is the code I have so far:

$Regex = [Regex]::new("(?<=Equal)(.*)(?=OR")           
$Match = $Regex.Match($String)
Arbelac
    check out [print only text between quotes](https://stackoverflow.com/questions/14568218/powershell-print-only-text-between-quotes) – Niveditha S Feb 09 '19 at 13:45

4 Answers

You may use

[regex]::matches($String, '(?<=Equal\s*")[^"]+')

See the regex demo.

See more ways to extract multiple matches here. However, your main problem is the regex pattern. The (?<=Equal\s*")[^"]+ pattern matches:

  • (?<=Equal\s*") - a position immediately preceded by Equal, zero or more whitespace characters, and then a "
  • [^"]+ - consumes one or more characters other than a double quotation mark.

Demo:

$String = "Host`nClass`nINCLUDE vmware:/?filter=Displayname Equal ""server01"" OR Displayname Equal ""server02"" OR Displayname Equal ""server03 test"""
[regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value}

Output:

server01
server02
server03 test
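
For contrast, here is a minimal sketch of what the pattern from the question returns once its missing closing parenthesis is added (that correction is my assumption about the intended pattern). $line is a hypothetical variable that holds only the INCLUDE line of the sample file.

```powershell
# Hypothetical variable holding only the INCLUDE line of the sample file.
$line = 'INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"'

# The question's pattern with the closing parenthesis restored (assumed intent).
# Match() returns only the first match, and the greedy .* runs from the first
# "Equal" up to the last "OR", so the result is one long span, not a server name.
$greedy = [regex]::Match($line, '(?<=Equal)(.*)(?=OR)').Value
$greedy

# Anchoring on the quotes instead yields one value per quoted name.
$names = [regex]::Matches($line, '(?<=Equal\s*")[^"]+') | ForEach-Object { $_.Value }
$names
```

The first result still contains quotes, OR and Displayname, which is why switching from Match to Matches alone would not be enough; the pattern itself has to stop at the closing quote.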

Here is a full snippet reading the file in, getting all matches and saving to file:

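$newfile = 'file.txt'      # output file that receives the extracted values
$file = 'newtext.txt'      # input file to read
$regex = '(?<=Equal\s*")[^"]+'
# Read the input line by line, collect every match on each line,
# and write the matched values to the output file.
Get-Content $file |
     Select-String $regex -AllMatches |
     Select-Object -Expand Matches |
     ForEach-Object { $_.Value } |
     Set-Content $newfile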
Wiktor Stribiżew
  • FYI, if the matches may span across multiple lines, replace `Get-Content $file |` with `Get-Content $file | Out-String |`, or, if you're using PowerShell v3 or newer, `Get-Content $file -Raw |` – Wiktor Stribiżew Feb 10 '19 at 11:33
  • What happens if there are multiple `INCLUDE vmware:/?filter` lines? I have opened a new question: https://stackoverflow.com/questions/54646160/multiple-lines-parsing-via-regex – Arbelac Feb 12 '19 at 08:56
  • @Arbelac Could you please explain the issue? – Wiktor Stribiżew Feb 12 '19 at 09:15
  • @Arbelac Try ``$regex.matches( $String.Split("`r`n").where({$_.contains('"')})[0] ).groups.where{$_.name -eq 1}.value | sc "c:\temp\result.txt"`` or, change the line selecting condition to a safer one, `.where({$_ -match '"[^"]+"'})` – Wiktor Stribiżew Feb 12 '19 at 09:36

You can modify your regex to use a capture group, which is indicated by the parentheses. The backslashes just escape the quotes. This allows you to just capture what you are looking for and then filter it further. The capture group here is automatically named 1 since I didn't provide a name. Capture group 0 is the entire match including quotes. I switched to the Matches method because that encompasses all matches for the string whereas Match only captures the first match.

$regex = [regex]'\"(.*?)\"'    
$regex.matches($string).groups.where{$_.name -eq 1}.value
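
As a sketch of an equivalent approach, group 1 can also be read by index on each match rather than by filtering all groups by name. The $string value below is a hypothetical stand-in for the INCLUDE line, and the quotes are left unescaped, which also works.

```powershell
# Hypothetical stand-in for the INCLUDE line of the sample file.
$string = 'INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"'

# Same pattern as above, without the optional backslashes.
$regex = [regex]'"(.*?)"'

# Groups[0] is the whole match including the quotes; Groups[1] is the text inside them.
$values = $regex.Matches($string) | ForEach-Object { $_.Groups[1].Value }
$values
```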

If you want to export the results, you can do the following:

$regex = [regex]'\"(.*?)\"'    
$regex.matches($string).groups.where{$_.name -eq 1}.value | sc "c:\temp\export.txt"
AdminOfThings
    +1 for a nice generalization. Quibbles: No need to `\ `-escape `"`. Syntax `.where{...}` definitely works, but to me the more verbose form `.where({...})` is preferable for conceptual reasons, so that no one is tempted to use `.where {...}` (note the space), which breaks. As an aside: alias `sc` for `Set-Content` was, for better or worse, removed from PowerShell _Core_. – mklement0 Feb 09 '19 at 18:38

Another option (PSv3+), combining [regex]::Matches() with the -replace operator for a concise solution:

$str = @'
Host
Class
INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"
'@ 

[regex]::Matches($str, '".*?"').Value -replace '"'

Regex ".*?" matches all "..."-enclosed tokens; .Value extracts them, and -replace '"' strips the " chars.
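
The ? after .* matters here. A minimal sketch of the difference, using a hypothetical $line variable that holds only the INCLUDE line:

```powershell
# Hypothetical variable holding only the INCLUDE line of the sample input.
$line = 'INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"'

# Lazy quantifier: each match stops at the nearest closing quote, giving three tokens.
$lazy = [regex]::Matches($line, '".*?"').Value -replace '"'

# Greedy quantifier: the match runs from the first quote to the last one on the line,
# so everything collapses into a single match.
$greedyMatches = [regex]::Matches($line, '".*"')

$lazy
$greedyMatches.Count
```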

It may not be obvious, but this happens to be the fastest solution among the answers here, based on my tests - see bottom.


As an aside: The above would be even more PowerShell-idiomatic if the -match operator - which only looks for a (one) match - had a variant named, say, -matchall, so that one could write:

# WISHFUL THINKING (as of PowerShell Core 6.2)
$str -matchall '".*?"' -replace '"'

See this feature suggestion on GitHub.


Optional reading: performance comparison

Pragmatically speaking, all solutions here are helpful and may be fast enough, but there may be situations where performance must be optimized.

Generally, using Select-String (and the pipeline in general) comes with a performance penalty - while offering elegance and memory-efficient streaming processing.

Also, repeated invocation of script blocks (e.g., { $_.Value }) tends to be slow - especially in a pipeline with ForEach-Object or Where-Object, but also - to a lesser degree - with the .ForEach() and .Where() collection methods (PSv4+).

In the realm of regexes, you pay a performance penalty for variable-length look-behind expressions (e.g., (?<=Equal\s*")) and the use of capture groups (e.g., (.*?)).

Here is a performance comparison using the Time-Command function, averaging 1000 runs:

Time-Command -Count 1e3 { [regex]::Matches($str, '".*?"').Value -replace '"' },
   { [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} },
   { [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value },
   { $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} } |
     Format-Table Factor, Command

Sample timings from my MacBook Pro; the exact times aren't important (you can remove the Format-Table call to see them), but the relative performance is reflected in the Factor column, from fastest to slowest.

Factor Command
------ -------
1.00   [regex]::Matches($str, '".*?"').Value -replace '"' # this answer
2.85   [regex]::Matches($str, '\"(.*?)\"').Groups.Where({$_.name -eq '1'}).Value # AdminOfThings'
6.07   [regex]::matches($String, '(?<=Equal\s*")[^"]+') | Foreach {$_.Value} # Wiktor's
8.35   $str | Select-String -Pattern '(?<=Equal\s*")[^"]+' -AllMatches | ForEach-Object{$_.Matches.Value} # LotPings'
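
If the Time-Command function is not at hand, a rough comparison can be sketched with the built-in Measure-Command cmdlet. This is only an approximation of the test above, not the same harness: the run count and the variable names $runs, $lazyMs and $lookbehindMs are arbitrary choices of mine, and absolute timings will differ by machine.

```powershell
# Hypothetical single-line input standing in for the sample file.
$str = 'INCLUDE vmware:/?filter=Displayname Equal "server01" OR Displayname Equal "server02" OR Displayname Equal "server03 test"'
$runs = 1000   # arbitrary repetition count

# Time the lazy-quantifier approach over all runs.
$lazyMs = (Measure-Command {
    foreach ($i in 1..$runs) {
        $null = [regex]::Matches($str, '".*?"').Value -replace '"'
    }
}).TotalMilliseconds

# Time the look-behind approach, including its ForEach-Object pipeline.
$lookbehindMs = (Measure-Command {
    foreach ($i in 1..$runs) {
        $null = [regex]::Matches($str, '(?<=Equal\s*")[^"]+') | ForEach-Object { $_.Value }
    }
}).TotalMilliseconds

'{0:N1} ms vs. {1:N1} ms' -f $lazyMs, $lookbehindMs
```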
mklement0

An alternative that reads the file directly with Select-String, using Wiktor's good regex:

Select-String -Path .\file.txt -Pattern '(?<=Equal\s*")[^"]+' -AllMatches|
    ForEach-Object{$_.Matches.Value} | Set-Content NewFile.txt

Sample output:

> Get-Content .\NewFile.txt
server01
server02
server03 test