I have an XML file that contains a line like this:

<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->

and I need the id and instance values from that particular line.

That is, I need -123456780 and CATZ00124 in two different variables.

Below is the sample code that I have tried:

$xmlfile = 'D:\Test\sample.xml'
$find_string = '__AMAZONSITE'
$array = @((Get-Content $xmlfile) | select-string $find_string)

Write-Host $array.Length

foreach ($commentedline in $array)
{   
   Write-Host $commentedline.Line.Split('id=')   
}

I am getting the result below:

<!--<__AMAZONSITE 


"-123456780" 
nstance 
"CATZ00124"__/>
mklement0
  • Your question appears incomplete and has a number of formatting issues. Could you please take a look and improve it a bit? Also, please provide a sample of the file sample.xml to test scripts on. – Daemon Painter Sep 13 '19 at 12:14
  • The `.Split()` method splits on every single character of its argument; you may want the regex-based `-split 'id='` operator instead. That said, apart from XML files being better handled with XML tools, you could use a regex with capture groups to extract id and instance. –  Sep 13 '19 at 12:22

2 Answers

The preferred way still is to use XML tools for XML files.
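Note that the line in question is an XML comment whose content is not itself well-formed XML, so XML tools can locate the comment but cannot parse the attributes inside it. A minimal sketch of combining the two, assuming a hypothetical sample document (the root and item elements below are made up for illustration): select the comment nodes with XPath, then apply a regex to each comment's text only.

```powershell
# Hypothetical sample content; only the comment line comes from the question.
$xmlText = @'
<root>
  <!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->
  <item name="a" />
</root>
'@

$doc = [xml] $xmlText

# '//comment()' selects every comment node in the document.
$results = foreach ($comment in $doc.SelectNodes('//comment()')) {
    # The regex only ever sees comment text, never regular elements.
    if ($comment.InnerText -match 'AMAZONSITE id="(?<id>[^"]+)" instance ="(?<instance>[^"]+)"') {
        [pscustomobject] @{ id = $Matches.id; instance = $Matches.instance }
    }
}

$results
```

This still relies on a regex for the attribute values, but it avoids accidental matches outside of comments.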

As long as the line with AMAZONSITE and instance is unique in the file, this could do:

## Q:\Test\2019\09\13\SO_57923292.ps1

$xmlfile = 'D:\Test\sample.xml' # '.\sample.xml' #

## see following RegEx live and with explanation on https://regex101.com/r/w34ieh/1
$RE = '(?<=AMAZONSITE id=")(?<id>[\d-]+)" instance ="(?<instance>[^"]+)"'

if((Get-Content $xmlfile -raw) -match $RE){
    $AmazonSiteID = $Matches.id
    $Instance     = $Matches.instance
}
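To make the pattern easier to read, here is a sketch of the same regex written in .NET's ignore-whitespace mode, `(?x)`, with a comment on each piece. In this mode unescaped spaces in the pattern are ignored, so the literal spaces are written as `\ `. The `$sample` string is simply the line from the question, used here so the snippet runs without a file.

```powershell
$RE = '(?x)                   # ignore pattern whitespace, allow comments
    (?<=AMAZONSITE\ id=")     # lookbehind: must be preceded by AMAZONSITE id="
    (?<id>[\d-]+)             # group "id": one or more digits or hyphens
    "\ instance\ ="           # literal text between the two values
    (?<instance>[^"]+)        # group "instance": everything up to the next quote
    "                         # closing quote
'

$sample = '<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->'

if ($sample -match $RE) {
    # On success, -match fills the automatic $Matches variable,
    # which exposes named groups as properties.
    $AmazonSiteID = $Matches.id
    $Instance     = $Matches.instance
}
```

After the match, `$AmazonSiteID` and `$Instance` hold the two values as separate variables.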
  • Hi @LotPings, I couldn't understand how the regex has to be defined. I went through the regex101.com URL, but still couldn't understand how it has to be defined. – user1539205 Sep 17 '19 at 09:01

LotPings' answer sensibly recommends using a regular expression with capture groups to extract the substrings of interest from each matching line.

You can incorporate that into your Select-String call for a single-pipeline solution (the assumption is that the XML comments of interest are all on a single line each):

# Define the regex to use with Select-String, which both
# matches the lines of interest and captures the substrings of interest 
# ('id' and 'instance' attributes) via capture groups, (...)
$regex = '<!--<__AMAZONSITE id="(.+?)" instance ="(.+?)"__/>-->'

Select-String -LiteralPath $xmlfile -Pattern $regex | ForEach-Object {
    # Output a custom object with properties reflecting
    # the substrings of interest reported by the capture groups.
    [pscustomobject] @{
        id = $_.Matches.Groups[1].Value
        instance = $_.Matches.Groups[2].Value
    }
}

The result is an array of custom objects that each have an .id and .instance property with the values of interest (which is preferable to setting individual variables); in the console, the output would look something like this:

id         instance
--         --------
-123456780 CATZ00124
-123456781 CATZ00125
-123456782 CATZ00126
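If you do need individual variables, one option is to collect the objects first and read the properties afterwards. A minimal sketch, assuming a throwaway temp file with made-up surrounding content (only the comment lines follow the format from the question):

```powershell
# Hypothetical sample file, created only so the sketch is self-contained.
$xmlfile = [System.IO.Path]::GetTempFileName()
Set-Content -LiteralPath $xmlfile -Value @(
    '<root>'
    '<!--<__AMAZONSITE id="-123456780" instance ="CATZ00124"__/>-->'
    '<!--<__AMAZONSITE id="-123456781" instance ="CATZ00125"__/>-->'
    '</root>'
)

$regex = '<!--<__AMAZONSITE id="(.+?)" instance ="(.+?)"__/>-->'

# @(...) ensures an array even if only one line matches.
$results = @(Select-String -LiteralPath $xmlfile -Pattern $regex | ForEach-Object {
    [pscustomobject] @{
        id       = $_.Matches[0].Groups[1].Value
        instance = $_.Matches[0].Groups[2].Value
    }
})

# Values from the first matching line, as two separate variables.
$id       = $results[0].id
$instance = $results[0].instance

Remove-Item -LiteralPath $xmlfile
```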


As for what you tried:

Note: I'm discussing your use of .Split(), though for extracting a substring, as is your intent, .Split() is not the best tool, given that it is only the first step toward isolating the substring of interest.

As LotPings notes in a comment, in Windows PowerShell, $commentedline.Line.Split('id=') causes the String.Split() method to split the input string by any of the individual characters in the split string 'id=', because the method overload that Windows PowerShell selects takes a char[] value, i.e. an array of characters, which is not your intent.

You could rectify this as follows, by forcing use of the overload that accepts string[] (even though you're only passing one string), which also requires passing an options argument:

$commentedline.Line.Split([string[]] 'id=', 'None') # OK, splits by whole string

Note that in PowerShell Core the logic is reversed, because .NET Core introduced a new overload with just [string] (with an optional options argument), which PowerShell Core selects by default. Conversely, this means that if you do want by-any-character splitting in PowerShell Core, you must cast the split string to [char[]].
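To get the same behavior in both editions, you can make the intended overload explicit with a cast. A small sketch, using a made-up input string:

```powershell
# Hypothetical input, shaped like the attributes in the question.
$line = '<X id="1" instance ="2">'

# Explicit [string[]] cast: splits by the whole string 'id='.
$byString = $line.Split([string[]] 'id=', [System.StringSplitOptions]::None)

# Explicit [char[]] cast: splits by any of the characters 'i', 'd', '='.
$byChars = $line.Split([char[]] 'id=')

$byString.Count  # 2 parts: '<X ' and '"1" instance ="2">'
$byChars.Count   # 6 parts, including empty strings and 'nstance '
```

The by-character result reproduces the kind of output shown in the question, where "instance" loses its leading "i".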

On a general note, PowerShell has the -split operator, which is regex-based and offers much more flexibility than String.Split() - see this answer.

Applied to your case:

$commentedline.Line -split 'id='
  • While id= is interpreted as a regex by -split, that makes no difference here, given that the string contains no regex metacharacters (characters with special meaning); if you do want to safely split by a literal substring, use [regex]::Escape('...') as the RHS.

  • Note that -split is case-insensitive by default, as PowerShell generally is; however, you can use the -csplit variant for case-sensitive matching.
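The two points above can be illustrated with a short sketch; the input strings are made up for the example:

```powershell
# Hypothetical input containing both lowercase and uppercase variants.
$line = '<X id="1" ID="2">'

# -split is case-insensitive: both id= and ID= act as separators.
$insensitive = $line -split 'id='

# -csplit is case-sensitive: only id= acts as a separator.
$sensitive = $line -csplit 'id='

# An unescaped '.' is a regex metacharacter that matches any character,
# so 'a.b.c' -split '.' yields only empty strings.
# Escaping makes -split treat the separator literally.
$literal = 'a.b.c' -split ([regex]::Escape('.'))

$insensitive.Count  # 3
$sensitive.Count    # 2
$literal            # a, b, c
```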

mklement0