85

I want a small piece of logic that compares the contents of two arrays and returns the values that are not common to both, using PowerShell.

For example, given:

$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)

$c, the output, should contain the value 6, because that is the only value the two arrays do not have in common.

Can someone help me out with this? Thanks!

Nathan Fellman
PowerShell
  • 1
    To give a name to the task at hand, at least with respect to what the `Compare-Object` answers here implement: the [_symmetric difference_](https://en.wikipedia.org/wiki/Symmetric_difference) between two sets is being determined - but only if the input arrays are truly _sets_ (as in the question), i.e. have _no duplicate elements_. – mklement0 Oct 09 '19 at 18:40
  • A related task - the [_relative complement_](https://en.wikipedia.org/wiki/Complement_(set_theory)#Relative_complement) aka _set difference_ - which elements of one set aren't also in another? - is the subject of [this related question](https://stackoverflow.com/q/58307606/45375). – mklement0 Oct 09 '19 at 18:47

6 Answers

118
PS > $c = Compare-Object -ReferenceObject (1..5) -DifferenceObject (1..6) -PassThru
PS > $c
6
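
As the comments on the question point out, `Compare-Object` only yields the symmetric difference when neither array contains duplicates. A minimal sketch of one way to guard against that, assuming it is acceptable to reduce each array to its distinct values first (the names `$set1` and `$set2` are illustrative and not part of the answer above):

```powershell
$a1 = @(1, 1, 2, 3)
$b1 = @(1, 2, 3, 3, 6)

# Reduce each array to its distinct values first,
# so that a repeated value is not reported as a difference.
$set1 = $a1 | Select-Object -Unique
$set2 = $b1 | Select-Object -Unique

# .InputObject returns the plain values without the SideIndicator decoration.
$c = (Compare-Object -ReferenceObject $set1 -DifferenceObject $set2).InputObject
$c
```

Reading `.InputObject` instead of using `-PassThru` is a design choice here: it keeps the result free of the extra `SideIndicator` note property.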
Shay Levy
  • 3
    A note for those trying to compare the Keys collections of two hashtables: I assumed Keys collections were like arrays and that I could use Compare-Object to compare them. It turns out Compare-Object sees each Keys collection as a single object so returns a result indicating _**all**_ keys in hashtable one are missing from hashtable two and vice versa. To get it to work I had to convert the Keys collections to arrays. The quickest way I've found is: `$keys = @($Null) * $ht.Keys.Count` to initialize an array of the correct size then `$ht.Keys.CopyTo($keys, 0)` to copy the Keys to the array. – Simon Elms Feb 04 '18 at 20:32
  • 1
    It looks like you can do the `KeyCollection` to `object[]` conversion by just wrapping the value in `@()` like `@($keys)`. – mdonoughe Oct 22 '18 at 13:03
  • Great solution, small caveat: While `-PassThru` also passes the input elements of interest through, it additionally _decorates them_ with a `SideIndicator` note property that may surface in scenarios such as JSON serialization. Try `(Compare-Object 1 2 -PassThru).SideIndicator`. `(Compare-Object ...).InputObject`, as in [this answer](https://stackoverflow.com/a/22310789/45375), avoids that problem. – mklement0 Oct 09 '19 at 17:20
  • @SimonTewsi: mdonoughe is correct; to illustrate: `$ht1 = @{foo=1;bar=2}; $ht2 = @{foo=1;baz=3}; Compare-Object @($ht1.Keys) @($ht2.Keys)` – mklement0 Oct 09 '19 at 17:23
115

Collection

$a = 1..5
$b = 4..8

$Yellow = $a | Where {$b -NotContains $_}

$Yellow contains all the items in $a except the ones that are in $b:

PS C:\> $Yellow
1
2
3

$Blue = $b | Where {$a -NotContains $_}

$Blue contains all the items in $b except the ones that are in $a:

PS C:\> $Blue
6
7
8

$Green = $a | Where {$b -Contains $_}

Not asked in the question, but anyway: $Green contains the items that are in both $a and $b.

PS C:\> $Green
4
5

Notes:

  • Where is an alias of Where-Object.
    Aliases can introduce problems and make scripts harder to maintain.
  • Instead of the -NotContains operator you might also use:
    • the -NotIn operator (and swap the operands):
      $Yellow = $a | Where {$_ -NotIn $b}
    • or use the common operator feature: "When the input is a collection, the operator returns the elements of the collection that match the right-hand value of the expression", e.g.:
      $Yellow = $a | Where {-not ($b -eq $_)}

Addendum 12 October 2019

As commented by @xtreampb and @mklement0: although it is not evident from the example in the question, the task that the question implies (values "not in common") is the symmetric difference between the two input sets (the union of yellow and blue).

Union

The symmetric difference between $a and $b can be literally defined as the union of $Yellow and $Blue:

$NotGreen = $Yellow + $Blue

Which is written out:

$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})

Performance

As you might notice, this syntax involves quite a few redundant loops: every item in $a is checked (using Where) against every item in $b (using -NotContains), and vice versa. Unfortunately, the redundancy is difficult to avoid because the result of each side is hard to predict. A hash table is usually a good way to improve the performance of such redundant loops. For this, I like to restate the question: get the values that appear exactly once in the concatenation of the two collections ($a + $b):

$Count = @{}
$a + $b | ForEach-Object {$Count[$_] += 1}
$Count.Keys | Where-Object {$Count[$_] -eq 1}

By using the ForEach statement instead of the ForEach-Object cmdlet, and the Where method instead of the Where-Object cmdlet, you might increase the performance by a factor of 2.5:

$Count = @{}
ForEach ($Item in $a + $b) {$Count[$Item] += 1}
$Count.Keys.Where({$Count[$_] -eq 1})
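
The counting approach assumes that each collection is a true set. If a value can occur more than once within one collection, it is counted twice and wrongly dropped from the result. A minimal sketch of one way around that, assuming it is acceptable to reduce each side to its distinct values before counting (the sample data and the final `Sort-Object` are illustrative additions):

```powershell
$a = 1, 2, 2, 3, 4, 5
$b = 4, 5, 5, 6, 7, 8

$Count = @{}
# Count each distinct value once per collection,
# so duplicates within one side cannot distort the tally.
foreach ($Item in @($a | Select-Object -Unique) + @($b | Select-Object -Unique)) {
    $Count[$Item] += 1
}

# A count of 1 means the value occurs in only one of the two collections.
# Hashtable keys have no guaranteed order, hence the explicit sort.
$NotGreen = $Count.Keys.Where({ $Count[$_] -eq 1 }) | Sort-Object
$NotGreen
```

The extra `Select-Object -Unique` passes cost some time, so this variant only makes sense when duplicates are actually possible.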

LINQ

However, Language Integrated Query (LINQ) will easily beat any native PowerShell and native .NET methods (see also High Performance PowerShell with LINQ and mklement0's answer to Can the following Nested foreach loop be simplified in PowerShell?).

To use LINQ you need to explicitly define the array types:

[Int[]]$a = 1..5
[Int[]]$b = 4..8

Then call the static methods of the [Linq.Enumerable] class:

$Yellow   = [Int[]][Linq.Enumerable]::Except($a, $b)
$Blue     = [Int[]][Linq.Enumerable]::Except($b, $a)
$Green    = [Int[]][Linq.Enumerable]::Intersect($a, $b)
$NotGreen = [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
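
The same pattern should carry over to other element types, as long as both arrays are explicitly typed. A minimal sketch with string arrays (the sample values are illustrative). Note that, unlike PowerShell's comparison operators, `Except` compares strings case-sensitively by default:

```powershell
[string[]]$a = 'apple', 'banana', 'cherry'
[string[]]$b = 'banana', 'cherry', 'date'

# Except returns a lazy enumerable; the [string[]] cast forces it to be evaluated.
$Yellow = [string[]][Linq.Enumerable]::Except($a, $b)
$Blue   = [string[]][Linq.Enumerable]::Except($b, $a)

# Both sides are now real arrays, so + concatenates them.
$NotGreen = $Yellow + $Blue
$NotGreen
```

Materializing each side into an array before concatenating keeps the intermediate results inspectable, at the cost of one extra copy.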

SymmetricExceptWith

(Added 2022-05-02)
There is actually another way to get the symmetric difference: the SymmetricExceptWith method of the HashSet class. For details, see the specific answer from mklement0 on Find what is different in two very large lists:

$a = [System.Collections.Generic.HashSet[int]](1..5)
$b = [System.Collections.Generic.HashSet[int]](4..8)

$a.SymmetricExceptWith($b)
$NotGreen = $a # note that the result will be stored back in $a
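
Because `SymmetricExceptWith` modifies the set it is called on, it can be convenient to work on a copy. A minimal sketch, assuming PowerShell 5 or later (for `::new()`) and explicitly typed input arrays; the string example and its case-insensitive comparer are illustrative additions:

```powershell
[int[]]$a = 1..5
[int[]]$b = 4..8

# Build a new set from $a so the original array stays untouched.
$NotGreen = [System.Collections.Generic.HashSet[int]]::new($a)
$NotGreen.SymmetricExceptWith($b)

# For strings, a comparer can be supplied to match case-insensitively.
[string[]]$s1 = 'Alpha', 'Beta'
[string[]]$s2 = 'beta', 'Gamma'
$StringDiff = [System.Collections.Generic.HashSet[string]]::new($s1, [StringComparer]::OrdinalIgnoreCase)
$StringDiff.SymmetricExceptWith($s2)

$NotGreen
$StringDiff
```

The order in which a HashSet enumerates its items is not guaranteed, so pipe the result through `Sort-Object` if a stable order matters.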

Benchmark

(Updated 2022-05-02, thanks @Santiago for the improved benchmark script)
Benchmark results depend heavily on the sizes of the collections and on how many items they actually share. There is also a caveat when drawing conclusions about methods that use lazy evaluation (also called deferred execution), as LINQ does: nothing is computed until the result is actually pulled (e.g. @($a)[0]), so that step might take longer than expected because until then only the definition of the work exists. (SymmetricExceptWith, by contrast, runs immediately and modifies the set in place.) See also: Fastest Way to get a uniquely index item from the property of an array
Anyway, as an "average" case, I am assuming that half of each collection is shared with the other.

Test           TotalMilliseconds
----           -----------------
Compare-Object          118.5942
Where-Object            275.6602
ForEach-Object           52.8875
foreach                  25.7626
Linq                     14.2044
SymmetricExce…            7.6329

To get a good performance comparison, caches should be cleared by e.g. starting a fresh PowerShell session.

[Int[]]$arrA = 1..1000
[Int[]]$arrB = 500..1500

Measure-Command {&{
    $a = $arrA
    $b = $arrB
    Compare-Object -ReferenceObject $a -DifferenceObject $b  -PassThru
}} |Select-Object @{N='Test';E={'Compare-Object'}}, TotalMilliseconds
Measure-Command {&{
    $a = $arrA
    $b = $arrB
    ($a | Where {$b -NotContains $_}), ($b | Where {$a -NotContains $_})
}} |Select-Object @{N='Test';E={'Where-Object'}}, TotalMilliseconds
Measure-Command {&{
    $a = $arrA
    $b = $arrB
    $Count = @{}
    $a + $b | ForEach-Object {$Count[$_] += 1}
    $Count.Keys | Where-Object {$Count[$_] -eq 1}
}} |Select-Object @{N='Test';E={'ForEach-Object'}}, TotalMilliseconds
Measure-Command {&{
    $a = $arrA
    $b = $arrB
    $Count = @{}
    ForEach ($Item in $a + $b) {$Count[$Item] += 1}
    $Count.Keys.Where({$Count[$_] -eq 1}) # => should be foreach($key in $Count.Keys) {if($Count[$key] -eq 1) { $key }} for fairness
}} |Select-Object @{N='Test';E={'foreach'}}, TotalMilliseconds
Measure-Command {&{
    $a = $arrA
    $b = $arrB
    [Int[]]([Linq.Enumerable]::Except($a, $b) + [Linq.Enumerable]::Except($b, $a))
}} |Select-Object @{N='Test';E={'Linq'}}, TotalMilliseconds
Measure-Command {&{
    $a = $arrA
    $b = $arrB
    ($r = [System.Collections.Generic.HashSet[int]]::new($a)).SymmetricExceptWith($b)
}} |Select-Object @{N='Test';E={'SymmetricExceptWith'}}, TotalMilliseconds
iRon
  • 1
    What may also prove to be useful is what is not common (!green). So what is in only yellow or blue (1,2,3,6,7,8) – xtreampb Oct 15 '18 at 20:14
  • @xtreampb, I have given your suggestion some thoughts and came to the conclusion that you might create all kind of sophisticated embedded `ForEach` loops for this, but in the end it is simply: `$NotGreen = $Yellow + $Blue`, which is written out: `$NotGreen = ($a | Where {$b -NotContains $_}) + ($b | Where {$a -NotContains $_})` – iRon Oct 16 '18 at 18:00
  • 1
    To add to @xtreampb's comment: the task that the question implies (values "not in common") is the _symmetric difference_ between the two input sets (the _union_ of yellow and blue). That is what the other answers here implement, whereas yours implements something different: the _relative complement_ / _set difference_ (either yellow or blue) and the _intersection_ - though you illustrate those very well. I suggest making that clear in the answer. – mklement0 Oct 09 '19 at 17:29
  • Clarification: The `Compare-Object` solutions here only implement the symmetric difference if the input arrays have _no duplicates_. Also worth mentioning: the `Where-Object` / `-not[contains]` solutions are conceptually simple and concise, but with larger arrays can pose a performance problem, due to performing an array lookup for every input element - [LINQ offers a much faster solution](https://stackoverflow.com/a/58309988/45375), though it's somewhat complex. – mklement0 Oct 09 '19 at 18:36
  • 1
    @mklement0, thanks for the clarifications and for pointing out the actual request for a *symmetric difference*. I missed that (partly because it isn't evident from the example in the question). I have done some performance testing and will update my answer this weekend. – iRon Oct 10 '19 at 07:20
  • Thanks, @iRon: Yes, it's unfortunate that the sample data in the question is ambiguous and could be interpreted as asking for a relative complement as well. Re performance: The linked answer in my previous comment also contains the results of performance tests; it'll be interesting to see if you reach similar conclusions. – mklement0 Oct 10 '19 at 13:17
  • 1
    Nice answer! May I suggest, since the `Measure-Command` _script block_ is dot sourced instead of executed to add an inner script block and also define the arrays inside each measurement ? This would make a more fair test. I've updated your current code in this gist in case you're willing to update the answer https://gist.github.com/santysq/442eca1f79668de39e5367a51c7f3cdb – Santiago Squarzon May 02 '22 at 00:54
  • 1
    @Santiago, thanks for the improved benchmark script and the included `SymmetricExceptWith` method suggestion. I have updated my answer accordingly. – iRon May 02 '22 at 12:26
  • Thank you iRon, and my pleasure I enjoy a lot your benchmarking answers :) – Santiago Squarzon May 02 '22 at 12:34
  • 1
    This answer is beautiful. Starting with the picture of the sets, he then can clearly talk about Yellow Green and Blue throughout. Then presenting several ways to do it (with benchmarks!). Fantastic. (I decided to use Linq, myself) – dajo Aug 05 '22 at 13:46
  • @iRon, can we use the System.Collections.Generic.HashSet with the array of strings or just integers? because I got the error: Cannot find an overload for "new" and the argument count: "1". At line:1 char:1 + [System.Collections.Generic.HashSet[String]]::new($AuditLogsUser) + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : NotSpecified: (:) [], MethodException + FullyQualifiedErrorId : MethodCountCouldNotFindBest – Senior Systems Engineer Aug 29 '23 at 15:02
  • 1
    @SeniorSystemsEngineer, Yes, but not via the `new()` constructor. You need to use the PowerShell initializer: `[System.Collections.Generic.HashSet[String]]$AuditLogsUser` – iRon Aug 29 '23 at 15:30
  • @iRon, thanks for the clarification, that greatly helps me to learn a lot. – Senior Systems Engineer Aug 31 '23 at 00:50
15

Look at Compare-Object

Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }

Or, if you would like to know which array each object belongs to, look at SideIndicator:

$a1=@(1,2,3,4,5,8)
$b1=@(1,2,3,4,5,6)
Compare-Object $a1 $b1
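
A minimal sketch of how the `SideIndicator` values could be used to separate the two sides (the names `$diff`, `$onlyInA1` and `$onlyInB1` are illustrative). `<=` marks values found only in the reference array (the first argument), `=>` marks values found only in the difference array:

```powershell
$a1 = @(1,2,3,4,5,8)
$b1 = @(1,2,3,4,5,6)

$diff = Compare-Object $a1 $b1

# '<=' : present only in the reference array ($a1)
$onlyInA1 = ($diff | Where-Object SideIndicator -eq '<=').InputObject
# '=>' : present only in the difference array ($b1)
$onlyInB1 = ($diff | Where-Object SideIndicator -eq '=>').InputObject

"Only in a1: $onlyInA1"
"Only in b1: $onlyInB1"
```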
stej
  • 9
    Adding the -PassThru option makes it output nicer. Compare-Object $a1 $b1 -PassThru – MunkeyWrench Jul 24 '13 at 19:51
  • From what I can see, `Compare-Object $a1 $b1 | ForEach-Object { $_.InputObject }` and `Compare-Object $a1 $b1 -PassThru` seem to produce identical output. Of course, the -PassThru option is more concise. – Simon Elms Feb 04 '18 at 20:21
  • 1
    @SimonTewsi: They're _almost_ the same: while `-PassThru` also passes the input elements of interest through, it additionally _decorates them_ with a `SideIndicator` note property that may surface in unexpected scenarios. Try `(Compare-Object 1 2 -PassThru).SideIndicator`. – mklement0 Oct 09 '19 at 17:16
3

Try:

$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)
(Compare-Object $a1 $b1).InputObject

Or, you can use:

(Compare-Object $b1 $a1).InputObject

The order doesn't matter.

3

Your results will not be helpful unless the arrays are first sorted. To sort an array, run it through Sort-Object.

$x = @(5,1,4,2,3)
$y = @(2,4,6,1,3,5)

Compare-Object -ReferenceObject ($x | Sort-Object) -DifferenceObject ($y | Sort-Object)
doer
  • 1
    -SyncWindow helps with "how far to look in the array for the match" – Garrett Mar 10 '17 at 00:34
  • 5
    No, sorting is _not_ required: `Compare-Object $x $y` will return the same result as above, showing that 6 is missing from the reference array. (I checked this both as of today's PS version (5.1) as well as PS version 3.) – Michael Sorens May 04 '17 at 15:57
1

This should help; it uses a simple hash table.

$a1=@(1,2,3,4,5)
$b1=@(1,2,3,4,5,6)

$hash = @{}

# store the elements of $a1 in the hash table
foreach ($i in $a1) {
    $hash.Add($i, "present")
}

# define an empty array $c
$c = @()

# add the values found only in $b1 to $c, and remove the common ones from the hash table
foreach ($j in $b1) {
    if (!$hash.ContainsKey($j)) { $c = $c + $j }
    else { $hash.Remove($j) }
}

# the hash table now holds only the values unique to $a1, so add them to $c
foreach ($k in $hash.Keys) {
    $c = $c + $k
}
Adithya Surampudi
  • 1
    Leaving out the questionable coding style, using hashtables as a substitute to -contains operator is not ok. The worst thing, this solution does not add anything to compare-object. – dmitry Aug 12 '15 at 11:48