
I have some XML with a number of <task> nodes that can contain a combination of four child nodes, in this order: <rules>, <preprocess>, <process>, <postprocess>.

<process> is mandatory, but the other three are optional. I need to validate this XML before using it to instantiate my Task object, and I can't use XSD because XSD 1.0 doesn't support some of the other things I have going on in the XML.

My thinking is this. I can convert the node names to a list $providedData, and also have a list $requiredOrder with the four node names in the required order, then duplicate that as a list $workingOrder. Loop through $requiredOrder and remove from $workingOrder any item that isn't in $providedData. Now $workingOrder has the same items as $providedData, but in the order defined by $requiredOrder, so a comparison tells me whether $providedData is correctly ordered. So...

$requiredOrder = @('rules', 'preprocess', 'process', 'postprocess')
$providedData = @('preprocess', 'process')

CLS
$workingOrder = [System.Collections.Generic.List[String]]::new()
$workingOrder.AddRange([System.Collections.Generic.List[String]]$requiredOrder)
$providedOrder = [System.Collections.Generic.List[String]]::new()
$providedOrder.AddRange([System.Collections.Generic.List[String]]$providedData)
foreach ($item in $requiredOrder) {
    if ($providedOrder -notContains $item) {
        $workingOrder.Remove($item) > $null
    }
}
if (Compare-Object -ReferenceObject $workingOrder -DifferenceObject $providedOrder) {
    Write-Host "Correct"
} else {
    Write-Host "Incorrect"
}

I know I can't use the -eq operator, but I thought Compare-Object would work here. No dice, I get Incorrect. But if I just dump $workingOrder and $providedOrder to the console, they are (visually) the same.

So, two questions:

1: What am I doing wrong in my comparison here?

2: Is there a much better way to do this?

Interesting... if (($workingOrder -join ',') -eq ($providedOrder -join ',')) { works. I would still like to know if there is a better way, or a way to get Compare-Object to work. But I can proceed with this for now.

Gordon

2 Answers


To compare whether two same-typed collections are equal, both in content and order, I like to use Enumerable.SequenceEqual():

function Test-NodeOrder 
{
  param([string[]]$Nodes)

  $requiredOrder = @('rules', 'preprocess', 'process', 'postprocess')
  $mandatory = @('process')

  $matchingNodes = $Nodes.Where({$_ -in $requiredOrder})

  if($missing = $mandatory.Where({$_ -notin $matchingNodes})){
    Write-Warning "The following mandatory nodes are missing: [$($missing -join ', ')]"
    return $false
  }

  $orderedNodes = $requiredOrder.Where({$_ -in $matchingNodes})

  if(-not [System.Linq.Enumerable]::SequenceEqual([string[]]$matchingNodes, [string[]]$orderedNodes)){
    Write-Warning "Wrong order provided - expected [$($orderedNodes -join ', ')] but got [$($matchingNodes -join ', ')]"
    return $false
  }

  return $true
}

Output:

PS C:\> $providedData = @('preprocess', 'process')
PS C:\> Test-NodeOrder $providedData
True
PS C:\> $providedData = @('preprocess')
PS C:\> Test-NodeOrder $providedData
WARNING: The following mandatory nodes are missing: [process]
False
PS C:\> $providedData = @('preprocess', 'process', 'rules')
PS C:\> Test-NodeOrder $providedData
WARNING: Wrong order provided - expected [rules, preprocess, process] but got [preprocess, process, rules]
False
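
To apply this kind of check to actual XML, the child element names of each <task> node have to be collected first. The following is only a minimal sketch under assumptions: the sample document, the id attribute, and the variable names are hypothetical, since the question doesn't show the real XML layout. It checks only the order, not whether the mandatory <process> node is present.

```powershell
# Hypothetical sample document; the real XML is not shown in the question.
[xml] $doc = @'
<tasks>
  <task id="ok"><preprocess/><process/></task>
  <task id="bad"><process/><rules/></task>
</tasks>
'@

$requiredOrder = 'rules', 'preprocess', 'process', 'postprocess'

$results = foreach ($task in $doc.SelectNodes('//task')) {
  # Names of the element children only (comments and text nodes are skipped).
  [string[]] $provided = @(
    $task.ChildNodes |
      Where-Object { $_.NodeType -eq 'Element' } |
      ForEach-Object { $_.LocalName }
  )

  # The same names, rearranged into the required order.
  [string[]] $expected = @($requiredOrder | Where-Object { $_ -in $provided })

  [pscustomobject] @{
    Task    = $task.GetAttribute('id')
    InOrder = [System.Linq.Enumerable]::SequenceEqual($provided, $expected)
  }
}

$results
```

Both arguments are typed as [string[]] so that PowerShell can resolve the generic SequenceEqual() method.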
Mathias R. Jessen
  • Aha, that's the kind of "better" I was thinking might exist. I wonder at what point my code is so much .NET that I might as well refactor to C#. :) – Gordon Nov 16 '20 at 09:12
  • That's for you to gauge and decide :) The question I'd be asking is: why not just correct the order in the XML file if all mandatory child nodes are there? :) – Mathias R. Jessen Nov 16 '20 at 09:14
  • To be honest there is a part of me that is tempted to just ignore the order in the XML since I can code to handle the nodes in the correct order, even if they aren't provided in that order. But I kind of want to require the correct order in the XML to help the folks providing the XML understand how it works. In the long run I will provide a GUI and ensure correct order. But I am so far behind schedule I can't do that now, and that's also what keeps me from adding code to fix the XML. Path of least resistance is an error that lets the people creating the XML also maintain it. For now. :) – Gordon Nov 16 '20 at 09:29
0

There are two problems with your code:

  • You need to invert your success-test logic when calling Compare-Object.

  • To test whether two arrays not only contain the same elements but also have them in the same order (sequence equality), you need to use -SyncWindow 0.

Therefore:

if (-not (Compare-Object -SyncWindow 0 $workingOrder $providedOrder)) {
  'Correct'
} else {
  'Incorrect'
}

As for the success-test logic:

Compare-Object's output doesn't indicate success of the comparison; instead, it outputs the objects that differ.

Given PowerShell's implicit to-Boolean conversion, using a Compare-Object call directly as an if conditional typically means: if there are differences, the conditional evaluates to $true, and vice versa.

Since Compare-Object with -SyncWindow 0 outputs one or more difference objects whenever the arrays differ (a pair for each array position that doesn't match), and since such output is always $true when coerced to a [bool], you can simply apply the -not operator to the result, which reports $true if the Compare-Object call had no output (implying the arrays were the same), and $false otherwise.
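
To make the effect of -SyncWindow 0 and of the Boolean coercion concrete, here is a small sketch; the array contents and variable names are made up for illustration:

```powershell
$expected = 'rules', 'preprocess', 'process'
$provided = 'preprocess', 'process', 'rules'   # same elements, different order

# Default sync window: order is ignored, so there is no output at all.
$looseDiff = @(Compare-Object $expected $provided)

# -SyncWindow 0: elements are compared position by position,
# so every mismatched position produces difference objects.
$strictDiff = @(Compare-Object -SyncWindow 0 $expected $provided)

# No output coerces to $false, any difference objects coerce to $true,
# hence -not yields "the arrays are equal".
"Default:       $($looseDiff.Count) difference object(s), equal: $(-not $looseDiff)"
"-SyncWindow 0: $($strictDiff.Count) difference object(s), equal: $(-not $strictDiff)"
```

Without -SyncWindow 0 the reordered array is reported as equal, which is why the switch matters for an order check.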


Optional reading: Performance comparison between Compare-Object -SyncWindow 0 and [System.Linq.Enumerable]::SequenceEqual():

Mathias R. Jessen's helpful answer shows a LINQ-based alternative for sequence-equality testing based on the System.Linq.Enumerable.SequenceEqual method, which generally performs much better than Compare-Object -SyncWindow 0, though for occasional invocations with smallish arrays the difference may not matter.

The following performance tests illustrate this, based on a single timed run per command (the "1-run avg." column below, matching $runs = 1 in the test code) with 1,000-element arrays.

The absolute timings, measured on a macOS 10.15.7 system with PowerShell 7.1, will vary based on many factors, but the Factor column should give a sense of relative performance.

Note that the Compare-Object -SyncWindow 0 call is fastest only on the very first invocation in a session; after [System.Linq.Enumerable]::SequenceEqual() has been called once in a session, calling it is internally optimized and becomes much faster than the Compare-Object calls.
That is, if you simply re-run the tests in a session, [System.Linq.Enumerable]::SequenceEqual() will be the fastest method by far, along the lines of the 2nd group of results below:

--- 1,000 elements: ALL-positions-different case:

Factor Secs (1-run avg.) Command                                                              TimeSpan
------ ----------------- -------                                                              --------
1.00   0.006             -not (Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1) 00:00:00.0060075
1.59   0.010             [Linq.Enumerable]::SequenceEqual($a1, $a2)                           00:00:00.0095582
3.78   0.023             -not (Compare-Object $a1 $a2 -SyncWindow 0)                          00:00:00.0227288

--- 1,000 elements: 1-position-different-only case (Note: on first run in a session, the LINQ method is now compiled and is much faster):

Factor Secs (1-run avg.) Command                                                              TimeSpan
------ ----------------- -------                                                              --------
1.00   0.000             [Linq.Enumerable]::SequenceEqual($a1, $a2)                           00:00:00.0001879
22.40  0.004             -not (Compare-Object $a1 $a2 -SyncWindow 0)                          00:00:00.0042097
24.86  0.005             -not (Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1) 00:00:00.0046707

Optimizations for Compare-Object -SyncWindow 0 shown above:

  • Because Compare-Object -SyncWindow 0 outputs difference objects, in the worst-case scenario it outputs 2 * N objects - one pair of difference objects for each mismatched array position.

    • Piping to Select-Object -First 1 so as to only output one difference object is an effective optimization in this case, but note that Compare-Object still creates all objects up front (it isn't optimized to recognize that with -SyncWindow 0 it doesn't need to collect all input first).
  • -PassThru, to avoid construction of the [pscustomobject] wrappers, can sometimes help a little, but ultimately isn't worth combining with the more important Select-Object -First 1 optimization; the reason that it doesn't help more is that the passed-through objects are still decorated with a .SideIndicator ETS property, which is expensive too.
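
As a rough illustration of the Select-Object -First 1 point, this sketch reuses the all-positions-different arrays from the tests and only counts the objects that reach the caller; it measures no timings:

```powershell
# Worst case: every array position differs.
[int[]] $a1 = 1..1000
[int[]] $a2 = @(0) + 1..999

# Without the optimization, two difference objects per mismatched position are output.
$allDiffs = @(Compare-Object $a1 $a2 -SyncWindow 0)

# With the optimization, the pipeline stops after the first difference object,
# which is all that is needed to know the arrays are not equal.
$firstDiff = @(Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -First 1)

"All differences: $($allDiffs.Count)"
"With -First 1:   $($firstDiff.Count)"
"Arrays equal:    $(-not $firstDiff)"
```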


Test code that produced the above timings, which is based on the Time-Command function available from this Gist:

  • Note: Assuming you have looked at the linked code to ensure that it is safe (which I can personally assure you of, but you should always check), you can install Time-Command directly as follows:
    irm https://gist.github.com/mklement0/9e1f13978620b09ab2d15da5535d1b27/raw/Time-Command.ps1 | iex

foreach ($i in 1..2) {

  # Array size
  [int] $n = 1000
  
  # How many runs to average:
  # If you set this to 1 and $n is at around 1,300 or below, ONLY the very first
  # test result in a session will show 
  #   Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1 
  # as the fastest method.
  # Once the LINQ method access is internally compiled, 
  # [Linq.Enumerable]::SequenceEqual() is dramatically faster, with any array size.
  $runs = 1

  # Construct the arrays to use.
  # Note: In order to be able to pass the arrays directly to [Linq.Enumerable]::SequenceEqual($a1, $a2),
  #       they must be strongly typed.
  switch ($i) {
    1 {
      Write-Host ('--- {0:N0} elements: ALL-positions-different case:' -f $n)
      # Construct the arrays so that Compare-Object will report 2 * N
      # difference objects.
      # This maximizes the Select-Object -First 1 optimization.
      [int[]] $a1 = 1..$n
      [int[]] $a2 = , 0 + 1..($n-1)
    }
    default {
      Write-Host  ('--- {0:N0} elements: 1-position-different-only case (Note: on first run in a session, the LINQ method is now compiled and is much faster):' -f $n)
      # Construct the arrays so that Compare-Object only outputs 2 difference objects.
      [int[]] $a1 = 1..$n
      [int[]] $a2 = 1..($n-1) + 42
    }
  }

  Time-Command -Count $runs {
    -not (Compare-Object $a1 $a2 -SyncWindow 0)
  },
  {
    -not (Compare-Object $a1 $a2 -SyncWindow 0 | Select-Object -first 1)
  },
  {
    [Linq.Enumerable]::SequenceEqual($a1, $a2)
  } | Out-Host

}
mklement0