
I need to first test an XML file for duplicate element attributes, i.e., elements with the same attribute values. Then I need to compare those same attribute values to another array.

Given this XML:

<Definitions>
    <Sets>
        <Set id='S1' />
    </Sets>
    <Package>
        <Package id='P1' />
        <Package id='P2' />
    </Package>
</Definitions>

I can use

$a = $xml.SelectNodes('//Package').id

to build an array of ids. I also found this thread with an answer from Dale Qiao that looks like it should work, namely:

$a=@(1,2,3,1,2)
$b=$a | select -unique
Compare-object -referenceobject $b -differenceobject $a
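Why this finds duplicates: Compare-Object pairs each item of the reference list with at most one equal item of the difference list, so the extra copies in $a have no partner left and are reported with the '=>' side indicator. A small sketch using the same sample numbers:

```powershell
$a = @(1, 2, 3, 1, 2)
$b = $a | Select-Object -Unique          # one copy of each value

# Each item of $b is paired with one equal item of $a. Copies of $a that
# have no partner left are reported with SideIndicator '=>'. A value that
# occurs n times in $a therefore shows up n-1 times in $surplus.
$surplus = Compare-Object -ReferenceObject $b -DifferenceObject $a

# Reduce the surplus copies to the list of duplicated values.
$dupes = @($surplus | ForEach-Object { $_.InputObject } | Select-Object -Unique)
$dupes
```

This only works because $b holds exactly one copy of each value; if the de-duplication step silently returns the original array, Compare-Object finds nothing to report.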

But when I adapt that to my scenario like this:

$a = @($invalidXML.SelectNodes('//Package').id)
$b=$a | select -unique
Compare-object -referenceobject $b -differenceobject $a

I get the error "Compare-Object : Cannot bind argument to parameter 'DifferenceObject' because it is null." But $a is not empty; it is the list of ids, as expected.
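A side note on the XPath, not a confirmed cause of the error: '//Package' matches every Package element, including the outer <Package> wrapper, which has no id attribute. A minimal sketch, assuming the sample document above, of restricting the match with an [@id] predicate and reading the attribute explicitly:

```powershell
# The sample document from the question, parsed from a here-string
# so the example does not depend on a file on disk.
[xml]$xml = @'
<Definitions>
    <Sets>
        <Set id='S1' />
    </Sets>
    <Package>
        <Package id='P1' />
        <Package id='P2' />
    </Package>
</Definitions>
'@

# '//Package' would also return the outer <Package>, which has no id.
# The [@id] predicate keeps only Package elements that carry an id attribute.
$nodes = $xml.SelectNodes('//Package[@id]')

# Read the attribute explicitly; @() guarantees an array even for one match.
$ids = @(foreach ($node in $nodes) { $node.GetAttribute('id') })
$ids   # P1, P2
```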

So, first off, what is going wrong? And secondly, can someone explain what is happening in that second line, $b=$a | select -unique? I have never seen the result of a variable assignment piped like this. And what I thought was happening isn't, since I thought $b = (Select-Object -InputObject $a -unique) would be the same, just without the pipeline, and it is not: $b in that case is still identical to $a, which makes no sense to me. That said, I have a vague recollection of Select-Object behaving differently when used in the pipeline and when not, which is frustrating as all get out.
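On the second question, a sketch of the two forms side by side: the right-hand side of an assignment is the whole pipeline, so the line is read as $b = ($a | select -unique), and piping $a sends its elements through one at a time. -InputObject, by contrast, binds the entire array as one input object, so there is a single input and nothing to de-duplicate, which matches what you observed:

```powershell
$a = @(1, 2, 3, 1, 2)

# The right-hand side of '=' is the whole pipeline, so this is read as
# $b = ($a | Select-Object -Unique). Piping $a sends its elements one by one.
$b = $a | Select-Object -Unique
"b: $($b -join ', ')  (Count = $($b.Count))"

# -InputObject passes the array as ONE object. With only one input,
# nothing is removed and the original 5-element array comes back.
$c = Select-Object -InputObject $a -Unique
"c: $($c -join ', ')  (Count = $($c.Count))"
```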

EDIT: It wasn't Select-Object, it was Sort-Object, as I asked here. So I'm still wondering why it fails even when using the pipeline with the XML, and whether there is a similar way to select unique items using .NET rather than a pipeline. As with that other question, I find the pipeline is often slow as molasses and not particularly good for readability, so I'm always looking for other options and fall back to the pipeline only when it really is the only approach, or sometimes the more performant one.
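For the "without a pipeline" part, one .NET option (a sketch, not the only way) is to pass the array straight to a HashSet[string] constructor, which drops duplicates as the set is built. The sample ids and the case-insensitive comparer are assumptions; use [System.StringComparer]::Ordinal for exact, case-sensitive matching. A HashSet does not guarantee any enumeration order, which does not matter for a yes/no duplicate check:

```powershell
# Hypothetical sample ids; 'p1' and the second 'P2' are repeats.
[string[]]$ids = 'P1', 'P2', 'p1', 'P3', 'P2'

# HashSet[T](IEnumerable[T], IEqualityComparer[T]) adds every element and
# skips any the comparer reports as already present.
$unique = [System.Collections.Generic.HashSet[string]]::new($ids, [System.StringComparer]::OrdinalIgnoreCase)

# Fewer unique values than inputs means the array contained duplicates.
$hasDuplicates = $unique.Count -lt $ids.Length
"unique: $($unique.Count) of $($ids.Length); duplicates present: $hasDuplicates"
```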

EDIT2: So, I may have answered my own question. On the basic "why isn't it working" front, the issue is that the array I get is really a [System.Object[]], and if I actually cast it to something else, say [System.Collections.Generic.List[String]], then Compare-Object can work with it. And I can avoid the pipeline if I sort using the technique in that link about Sort-Object, and also cast the results of that. That said, I'm not sure this

$a = [System.Collections.Generic.List[String]]@($invalidXML.SelectNodes('//Package').id)
$b = [System.Collections.Generic.List[String]][System.Collections.Generic.SortedSet[String]]::new($a, [System.StringComparer]::InvariantCultureIgnoreCase)
Compare-object -referenceobject $b -differenceobject $a

is really an improvement in readability. :) It IS more performant, taking around 25% of the time the pipeline version needs when I tested with an array of just 128 characters. The pipeline takes a full 2 seconds, and even the .NET approach takes half a second, which is not viable since this is going to happen many times at initialization. Time to look for ways to speed this up.

EDIT #3: Doh. I was running 1000 iterations of each test to get larger numbers for comparison. This is unlikely to be repeated much more than 100 times at initialization, and even the pipeline approach takes only 0.2 seconds for that. So I suspect I can choose purely on readability. And I may just keep the .NET version, since seeing it more often, with a comment explaining why I used it, means I will remember it. :)
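For comparing the two approaches at the realistic scale of roughly 100 runs, Measure-Command around a loop is one simple harness. The generated sample ids and the iteration count are placeholders, the .NET line is the one from EDIT2, and absolute numbers will vary by machine and PowerShell version:

```powershell
# Hypothetical sample data: 128 ids, P1..P28 appear twice.
$a = [System.Collections.Generic.List[string]]@(1..128 | ForEach-Object { 'P{0}' -f ($_ % 100) })
$iterations = 100

# Pipeline version: de-duplicate with Select-Object -Unique.
$pipeline = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) {
        $b = $a | Select-Object -Unique
        $null = Compare-Object -ReferenceObject $b -DifferenceObject $a
    }
}

# .NET version from EDIT2: de-duplicate (and sort) with a SortedSet.
$dotnet = Measure-Command {
    for ($i = 0; $i -lt $iterations; $i++) {
        $b = [System.Collections.Generic.List[string]][System.Collections.Generic.SortedSet[string]]::new($a, [System.StringComparer]::InvariantCultureIgnoreCase)
        $null = Compare-Object -ReferenceObject $b -DifferenceObject $a
    }
}

'Pipeline: {0:N0} ms   .NET: {1:N0} ms' -f $pipeline.TotalMilliseconds, $dotnet.TotalMilliseconds
```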

EDIT #4: Doh. A nice reminder to myself of how important even the subtlest formatting can be, especially when tired. $b=$a | select -unique looked really weird to me, as if the result of $b=$a were being piped to select -unique. But add some spaces, and $b = $a | select -unique suddenly reads as $b being equal to the result of $a | select -unique. Glad I looked at this a few days later, and in the morning. :) I'm debating whether I should really use this form, at least in scripts:

$b = ($a | Select-Object -Unique)
Gordon
  • What is the question? "Find duplicates in an array" or "determine if there are duplicates in an array"? – iRon Nov 30 '21 at 12:08
  • For an appropriate pipeline approach you shouldn't assign it to a variable (because that will choke the pipeline). Anyway, you might use a `[HashSet]` (https://learn.microsoft.com/dotnet/api/system.collections.generic.hashset-1), whose `Add` method returns `$false` if the element is already present, and stop at the first occurrence: `$HashSet = [System.Collections.Generic.HashSet[string]]::new([System.StringComparer]::InvariantCultureIgnoreCase); $xml.definitions.Package.Package.id.where({ !$HashSet.Add($_) }, 'First')` – iRon Nov 30 '21 at 12:11
  • You need to use `$a.getenumerator()` – AdminOfThings Nov 30 '21 at 12:16
  • The `| select -unique` should be replaced with `| Get-Unique`. (I tested your issue on both versions 5 and 7.) I got the same error when using `| select -unique` but not when using `| Get-Unique`. – NeoTheNerd Dec 05 '21 at 11:48
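iRon's HashSet suggestion can be extended to collect every duplicate instead of stopping at the first. A sketch under stated assumptions: the function name Get-DuplicateId is hypothetical, the case-insensitive comparer mirrors the one in iRon's comment, and the demo XML adds a hypothetical repeated id ('p1') so there is something to find:

```powershell
function Get-DuplicateId {
    # Hypothetical helper: emits each value that occurs more than once,
    # once per value, as spelled at its first repeat, in the order found.
    param([string[]]$Value)

    $comparer = [System.StringComparer]::InvariantCultureIgnoreCase
    $seen   = [System.Collections.Generic.HashSet[string]]::new($comparer)
    $dupes  = [System.Collections.Generic.HashSet[string]]::new($comparer)
    $result = [System.Collections.Generic.List[string]]::new()

    foreach ($v in $Value) {
        # Add() returns $false when the value is already in the set; the
        # second set makes sure each duplicate is reported only once.
        if (-not $seen.Add($v) -and $dupes.Add($v)) {
            $result.Add($v)
        }
    }
    $result
}

# Hypothetical sample: 'p1' repeats 'P1' when compared case-insensitively.
[xml]$xml = "<Definitions><Package><Package id='P1' /><Package id='P2' /><Package id='p1' /></Package></Definitions>"
$ids = @(foreach ($n in $xml.SelectNodes('//Package[@id]')) { $n.GetAttribute('id') })

# @() keeps the result an array even when zero or one duplicate is found.
$found = @(Get-DuplicateId -Value $ids)
$found
```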

1 Answer


I believe it's failing because of '| select -unique'. If you try '| Get-Unique' instead, you should get better results. Example below.

Here's How To Find Duplicates In XML Array

The XML Data File I Used

<Computers>
    <Computer>
        <Name>SRV-01</Name>
        <Ip>127.0.0.1</Ip>
        <Include>true</Include>
    </Computer> 
    <Computer>
        <Name>SRV-02</Name>
        <Ip>192.168.0.102</Ip>
        <Include>false</Include>
    </Computer> 
    <Computer>
        <Name>SRV-03</Name>
        <Ip>192.168.0.103</Ip>
        <Include>true</Include>
    </Computer> 
    <Computer>
        <Name>SRV-03</Name>
        <Ip>192.168.0.103</Ip>
        <Include>False</Include>
    </Computer>
</Computers>

Example Of Commands Used To Find Duplicate IP Addresses In XML File

[xml]$xmltest = Get-Content -Path C:\temp\xmlfiles\test.xml
$xmltest.Computers.Computer

$a = $xmltest.Computers.Computer.ip
$a

$b = $xmltest.Computers.Computer.ip | Get-Unique
$b

##=> - Item present only in the difference (destination) object.
##<= - Item present only in the reference (source) object.
##== - Item present in both objects (listed only when -IncludeEqual is used).

Compare-object -referenceobject $b -differenceobject $a
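One caveat worth adding: Get-Unique compares each item only with the one immediately before it, so it removes duplicates only when they are adjacent. That holds in the sample file above (the two 192.168.0.103 entries are consecutive), but for unsorted data sorting first is safer. A sketch with a hypothetical unsorted list:

```powershell
# Hypothetical IP list where the duplicate entries are NOT adjacent.
$a = @('192.168.0.103', '127.0.0.1', '192.168.0.102', '192.168.0.103')

# Get-Unique on its own keeps both copies, because they are not neighbours.
$unsortedUnique = @($a | Get-Unique)

# Sorting first makes equal values adjacent, so Get-Unique can drop them.
$b = @($a | Sort-Object | Get-Unique)

# Same Compare-Object call as above: the surplus copy is reported with '=>'.
Compare-Object -ReferenceObject $b -DifferenceObject $a
```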


NeoTheNerd