
I'm trying to extract some strings from a file. Here's a simplified example of the text in the file:

<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>

I want to extract just this part of each line, <gtin>xxxxxxx</gtin>, and write those parts to another file.

I do not want the whole line, just the gtin.

Here's what I tried:

Get-Content -Path C:\firstFile.xml -Readcount 1000 | foreach { $_ -match "<gtin1>*</gtin1>" } | out-file C:\gtins.txt

But, as you can probably guess, it's not working.

Any help is greatly appreciated. I have a feeling this is embarrassingly easy.

Thanks!

Brian
I feel `Select-String` with the `-AllMatches` switch may help you here https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/select-string?view=powershell-5.1, probably with Substring on the output to remove the extra text from each line. – Gareth Lyons Jan 25 '18 at 18:12
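A minimal sketch of the comment's suggestion, assuming the sample lines from the question stand in for the file's contents: `Select-String -AllMatches` returns every `<gtin>...</gtin>` match on each line, and `Substring` trims the 6-character `<gtin>` prefix and the 7-character `</gtin>` suffix. The pattern `<gtin>[^<]+</gtin>` (one or more characters that are not `<`) is an assumption chosen so a match cannot run past the first closing tag.

```powershell
# Sample lines from the question; for a real file, use
# Select-String -Path 'C:\firstFile.xml' instead of piping $lines
$lines = @(
    '<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>'
    '<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>'
    '<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>'
)

$open  = '<gtin>'   # 6 characters
$close = '</gtin>'  # 7 characters

# -AllMatches puts every match on a line into MatchInfo.Matches
$gtins = $lines |
    Select-String -Pattern '<gtin>[^<]+</gtin>' -AllMatches |
    ForEach-Object { $_.Matches } |
    ForEach-Object {
        # Drop the opening and closing tags, keeping only the value
        $text = $_.Value
        $text.Substring($open.Length, $text.Length - $open.Length - $close.Length)
    }

$gtins
```

To write the values to a second file, pipe `$gtins` to `Out-File -FilePath 'C:\gtins.txt'`.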

2 Answers


(Edit: Ansgar Wiechers is right that you shouldn't parse XML with a regular expression; proper XML parsing is far preferable.)

You can extract substrings using Select-String and a regular expression. Example:

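Get-Content "C:\firstfile.xml" | Select-String '(<gtin>.+?</gtin>)' | ForEach-Object {
  # Groups[1] is the text captured by the parentheses in the pattern
  $_.Matches[0].Groups[1].Value
} | Out-File "C:\gtins.txt"

If you want just the value between the tags, move the ( and ) to surround only the .+? portion of the expression. The ? makes .+ lazy, so the match stops at the first </gtin> instead of the last one on the line.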
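A sketch of the value-only variant described above, with the question's sample lines standing in for `Get-Content`. The parentheses wrap only the value, written here as the lazy `.+?` so a match cannot run past the first `</gtin>`, and `-AllMatches` (an addition, not part of the original example) collects every GTIN even if a single line contained several.

```powershell
# Stand-in for: Get-Content 'C:\firstfile.xml'
$lines = @(
    '<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>'
    '<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>'
    '<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>'
)

$gtins = $lines | Select-String '<gtin>(.+?)</gtin>' -AllMatches | ForEach-Object {
    # With -AllMatches, Matches holds every match on the line, not just the first
    foreach ($m in $_.Matches) {
        # Groups[1] is the text captured by (.+?), i.e. only the value between the tags
        $m.Groups[1].Value
    }
}

$gtins
```

Appending `| Out-File 'C:\gtins.txt'` to the assignment's right-hand side writes one value per line.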

More information about regular expressions:

PS C:\> help about_Regular_Expressions
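For comparison, the question's own `-match` attempt can be made to work too. As written it fails for three reasons: the pattern names `gtin1` rather than `gtin`; in a regular expression `*` repeats the preceding character (here `>`) rather than acting as a wildcard; and with `-ReadCount 1000` each `$_` is an array of lines, so `-match` returns matching lines instead of filling the automatic `$Matches` variable. A minimal sketch that handles one line at a time and reads `$Matches`, again with the sample lines standing in for the file:

```powershell
# Stand-in for: Get-Content -Path 'C:\firstFile.xml'
$lines = @(
    '<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>'
    '<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>'
    '<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>'
)

$gtins = $lines | ForEach-Object {
    # On a single string, -match returns $true/$false and fills $Matches
    if ($_ -match '<gtin>(.+?)</gtin>') {
        # $Matches[0] is the whole <gtin>...</gtin> element; $Matches[1] is just the value
        $Matches[0]
    }
}

$gtins
```

Unlike `Select-String -AllMatches`, `-match` finds only the first GTIN on each line.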
Bill_Stewart

Do not parse XML with regular expressions.

Use an actual XML parser for extracting data from XML files.

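[xml]$xml = Get-Content 'C:\firstfile.xml' -Raw
# //gtin is an XPath expression that selects every <gtin> element in the document
$xml.SelectNodes('//gtin') | Select-Object -ExpandProperty '#text' | Set-Content 'C:\gtins.txt'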
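One caveat with this approach: the sample in the question is a run of sibling elements with no single root element, and converting such text to `[xml]` typically fails with a "multiple root elements" error. If the real file looks like the sample, one workaround is to wrap the text in a root element before parsing. A sketch, assuming the file has no `<?xml ...?>` declaration (which must come first in a document) and using a hypothetical `<root>` wrapper name:

```powershell
# Stand-in for: Get-Content -Path 'C:\firstFile.xml' -Raw
$text = @'
<modelName>thing1</modelName><gtin>123456789</gtin><description>blah blah blah</description>
<modelName>thing2</modelName><gtin>789456123</gtin><description>blah blah blah</description>
<modelName>thing3</modelName><gtin>456789123</gtin><description>blah blah blah</description>
'@

# The fragment has several top-level elements, so wrap it in a
# hypothetical <root> element to make it a well-formed document
[xml]$xml = "<root>$text</root>"

# XPath //gtin selects every <gtin> element; InnerText is its text content
$gtins = $xml.SelectNodes('//gtin') | ForEach-Object { $_.InnerText }

$gtins
```

Piping `$gtins` to `Set-Content -Path 'C:\gtins.txt'` writes the values to the second file.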
Ansgar Wiechers