I have XML files formatted like this:

<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<OtherStuff>...</OtherStuff>
<More>...</More>
<CompanyName>Foo</CompanyName>
<EmailAddress>bar@foo.com</EmailAddress>
</User>
<User>
...

I want to read through all the XML files and output <FirstName>,<CompanyName>,<EmailAddress> for each user, like this:

Foo Bar,Foo,bar@foo.com
Name,User2,user@email.com
FSds,Blah,blah@blah.com

I am using the following regex:

(?si)<FirstName>(.*?)</FirstName>.*?<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>

However, this also returns everything from the tags between FirstName and CompanyName.

What am I doing wrong?

Pr0no

2 Answers

Why not use XML processing?

C:\PS> $xml = [xml]@'
>>> <Users>
>>> <User>
>>> <FirstName>Foo Bar</FirstName>
>>> <LastName>Blah</LastName>
>>> <OtherStuff>...</OtherStuff>
>>> <More>...</More>
>>> <CompanyName>Foo</CompanyName>
>>> <EmailAddress>bar@foo.com</EmailAddress>
>>> </User>
>>> </Users>
>>> '@
C:\PS> "$($xml.Users.User.FirstName), $($xml.Users.User.CompanyName), $($xml.Users.User.EmailAddress)"
Foo Bar, Foo, bar@foo.com

You haven't shown the full XML document, so I'm guessing at the top-level nodes. You will need to adjust this to match the structure of your XML doc.
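
When the document holds more than one <User>, the single interpolated string above would run the values of all users together, so iterating over the nodes is probably what you want. A minimal sketch, assuming the same guessed <Users> root element; the second user is sample data taken from the desired output in the question, and `$lines` is just an illustrative variable name:

```powershell
# Sample document; the <Users> root element is an assumption, as above.
$xml = [xml]@'
<Users>
<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<CompanyName>Foo</CompanyName>
<EmailAddress>bar@foo.com</EmailAddress>
</User>
<User>
<FirstName>Name</FirstName>
<LastName>...</LastName>
<CompanyName>User2</CompanyName>
<EmailAddress>user@email.com</EmailAddress>
</User>
</Users>
'@

# Emit one comma-separated line per <User> element.
$lines = foreach ($user in $xml.Users.User) {
    ($user.FirstName, $user.CompanyName, $user.EmailAddress) -join ','
}
$lines
```

This only works if each file is well-formed XML with a single root element; a file that is just a run of sibling <User> blocks would need to be wrapped in a root before the [xml] cast succeeds.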

Keith Hill

I find multi-line regex can be easier if you build it in a here-string:

$String = @'
<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<OtherStuff>...</OtherStuff>
<More>...</More>
<CompanyName>Foo</CompanyName>
<EmailAddress>bar@foo.com</EmailAddress>
</User>
'@

$regex = @'
(?ms).+?<FirstName>(.+?)</FirstName>.*?
<CompanyName>(.+?)</CompanyName>.*?
<EmailAddress>(.+?)</EmailAddress>.+?
'@

$string -match $regex > $null
$matches[1..3] -join ','



Foo Bar,Foo,bar@foo.com
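
For what it's worth, the pattern from the question also seems to capture the right values; the whole match (group 0) spans everything from <FirstName> to </EmailAddress>, which may be the extra text the question is seeing, while groups 1 to 3 hold only the three fields. A rough sketch of that, assuming the text is already in memory and using [regex]::Matches to pick up every user; `$text`, `$pattern` and `$rows` are illustrative names and the second user is sample data from the question:

```powershell
$text = @'
<User>
<FirstName>Foo Bar</FirstName>
<LastName>Blah</LastName>
<CompanyName>Foo</CompanyName>
<EmailAddress>bar@foo.com</EmailAddress>
</User>
<User>
<FirstName>Name</FirstName>
<LastName>...</LastName>
<CompanyName>User2</CompanyName>
<EmailAddress>user@email.com</EmailAddress>
</User>
'@

# Pattern from the question, without the trailing quote character.
$pattern = '(?si)<FirstName>(.*?)</FirstName>.*?<CompanyName>(.*?)</CompanyName>\s*<EmailAddress>(.*?)</EmailAddress>'

$rows = foreach ($m in [regex]::Matches($text, $pattern)) {
    # Groups[0] is the entire match, in-between tags included;
    # Groups[1], [2] and [3] are the captured field values only.
    ($m.Groups[1].Value, $m.Groups[2].Value, $m.Groups[3].Value) -join ','
}
$rows
```

Joining only the numbered groups, as with $matches[1..3] above, keeps the in-between tags out of the output.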

If it's a big file and you don't want to read it all in at once, you can use the closing tag as a delimiter:

Get-Content xmlfile.xml -Delimiter '</User>' |
 foreach {
  # Each chunk is one <User> block; print its three captured fields.
  if ($_ -match $regex)
   {$matches[1..3] -join ','
   }
 }

mjolinor