4

I feel silly that I cannot figure this out, but say I have an array containing pscustomobjects. At an incredibly high level take the following example:

$arr = @()
$obj1 = [pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"}
$obj2 = [pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"}

$arr += $obj1

In this case $obj1 and $obj2 have the exact same properties and values. How do I test whether $arr already contains an object with the same properties as $obj1, to avoid adding $obj2 to it?

Note, the above is a crude example. The pscustomobjects are being dynamically created from a dataset and added to the array, and I want to prevent duplicates from being added.

I understand the following returns true, but I fully expect duplicates for any given single property. As such I need to compare the ENTIRE pscustomobject and all properties together for uniqueness.

$arr.prop1 -contains 'bob' #returns true

Side question: why are $obj1 and $obj2 not themselves considered equal? I assume it's because they are technically different objects that just happen to have the same values, but then I don't understand why two different variables that each contain the same string do test as equal.

$obj1 -eq $obj2  #returns false
$str1 = "test"
$str2 = "test"
$str1 -eq $str2  #returns true

3 Answers

3

The problem is the behavior of the open-ended [pscustomobject] type with respect to equality comparison and as hashtable keys:

[pscustomobject] is a .NET reference type (that doesn't define custom equality comparisons), so comparing two instances with -eq tests for reference equality, which means that only values that reference the very same instance are considered equal.[1]

Using [pscustomobject] instances as the keys of a hashtable is similarly unhelpful, because, as iRon points out, calling .GetHashCode() on a [pscustomobject] instance always yields the same value, irrespective of the instance's set of properties and values.[2] Arguably, this is a bug, as discussed in GitHub issue #15806.
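
To make the reference-equality point concrete, here is a minimal sketch; the variable names $a, $b and $sameRef are made up for illustration. It contrasts two distinct instances with two references to the same instance, and with strings, which -eq compares by content:

```powershell
# Two distinct instances that happen to have the same property and value.
$a = [pscustomobject]@{ foo = 1 }
$b = [pscustomobject]@{ foo = 1 }

# A second variable that references the very same instance as $a.
$sameRef = $a

$a -eq $b         # False: different instances
$a -eq $sameRef   # True: same instance

# Strings, by contrast, are compared by content.
$str1 = "test"
$str2 = "test"
$str1 -eq $str2   # True
```

This is also why the string comparison in the question's side question succeeds while the object comparison does not.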


Solutions:

  • If you're willing to use (PSv5+) custom classes in lieu of [pscustomobject] instances, Santiago Squarzon's helpful answer offers a solution that relies on a custom class implementing the System.IEquatable<T> interface in order to support a custom, class-specific equality test - but note that since such a custom class compares specific, hard-coded properties, it isn't a general replacement for the open-ended [pscustomobject] type, whose instances can have arbitrary property sets.

  • iRon's helpful answer provides a generic solution via a custom class that wraps a hashtable and uses the XML-serialized form of its [pscustomobject] entries as the entry keys (using the serialization format PowerShell uses for its remoting and background-job infrastructure), relying on the fact that distinct strings with the same content report the same hash code, via .GetHashCode(). This is probably the best overall solution, because it performs reasonably well while providing a generic comparison that is reasonably robust: it works robustly for value-type property values (as are typical in [pscustomobject] instances) and tests the properties of reference-type values for value equality. However, the necessary limit on serialization depth means that deeply nested objects whose property values differ only below the serialization depth can be considered the same - see this answer for more information on PowerShell's serialization and its limitations.

  • Below is an ad-hoc solution based on iRon's answer that doesn't require defining custom classes, but it doesn't perform well.

# Available in PSv5+, to allow referencing the [System.Management.Automation.PSSerializer] type
# as just [PSSerializer]; in v4-, use the full type name.
using namespace System.Management.Automation

# Define a *list* rather than an array, because it is
# efficiently extensible
$list = [System.Collections.ArrayList] (
  [pscustomobject] @{prop1="bob";  prop2="dude";   prop3="awesome"}, 
  [pscustomobject] @{prop1="alice";prop2="dudette";prop3="awesome"}
)

# Conditionally add two objects to the list:
# One of them is a duplicate and will be ignored.
[pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"},
[pscustomobject]@{prop1="ted";prop2="dude";prop3="middling"} | ForEach-Object {
  if ($list.ForEach({ [PSSerializer]::Serialize($_) }) -cnotcontains [PSSerializer]::Serialize($_)) {
     $null = $list.Add($_)
  }
}

Note the use of the .ForEach() array method so as to (relatively) efficiently serialize each element of list $list, though note that it invariably involves creating a temporary array of the same size, containing the element-specific serializations.

There are ways of optimizing the performance of this code, but if that is needed you may as well use iRon's solution.
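
As a rough sketch of one such optimization (not taken from any of the answers here): keep a set of the serialized forms next to the list, so that each candidate is serialized only once and looked up by hash instead of re-serializing the whole list every time. The names $seen, $uniqueList and $candidates are hypothetical, and the same serialization-depth caveat applies.

```powershell
# Hypothetical helper collections for this sketch.
$uniqueList = [System.Collections.Generic.List[object]]::new()
$seen       = [System.Collections.Generic.HashSet[string]]::new([StringComparer]::Ordinal)

# Sample input: the second object duplicates the first.
$candidates = @(
  [pscustomobject]@{prop1="bob";  prop2="dude";   prop3="awesome"}
  [pscustomobject]@{prop1="bob";  prop2="dude";   prop3="awesome"}
  [pscustomobject]@{prop1="alice";prop2="dudette";prop3="awesome"}
  [pscustomobject]@{prop1="ted";  prop2="dude";   prop3="middling"}
)

foreach ($obj in $candidates) {
  # Serialize each candidate exactly once and use the string as its identity.
  $key = [System.Management.Automation.PSSerializer]::Serialize($obj)
  # HashSet.Add() returns $false if the key is already present.
  if ($seen.Add($key)) {
    $uniqueList.Add($obj)
  }
}

$uniqueList.Count   # 3
```

The trade-off is extra memory for the stored serializations in exchange for avoiding a full pass over the list per candidate.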


[1] For instance, [pscustomobject] @{ foo=1 } -eq [pscustomobject] @{ foo=1 } yields $false, because two distinct instances are being compared; that they happen to have the same set of properties and values is irrelevant.

[2] For instance, the following prints the same value twice, despite providing two obviously different objects as input: [pscustomobject] @{ foo=1 }, [pscustomobject] @{ bar=2 } | % GetHashCode

[3] For instance, ([pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"}).psbase.ToString() returns verbatim @{prop1=bob; prop2=dude; prop3=awesome}

mklement0
  • 1
    I hope what I'm asking does make sense, is it possible to override the `Equals` method from the `pscustomobject` class ? – Santiago Squarzon Jul 19 '21 at 23:44
  • 1
    Good question, @SantiagoSquarzon: While you can override a `[pscustomobject]` instance's `.Equals()` method with something like `... | Add-Member Equals -Type ScriptMethod { param([object] $o) <# ... #> } -Force`, it only takes effect for explicit `.Equals()` calls, not for `-eq` / `-contains` / `-in` operations. – mklement0 Jul 19 '21 at 23:58
3

Taking this purely from MS Docs, I'm nowhere near an expert on classes.

To create comparable classes, you need to implement System.IEquatable<T> in your class.

class CustomObjectEquatable : System.IEquatable[Object] {
    [string] $Prop1
    [string] $Prop2
    [string] $Prop3

    [bool] Equals([Object]$obj) {
        return $this.Prop1 -eq $obj.Prop1 -and
               $this.Prop2 -eq $obj.Prop2 -and
               $this.Prop3 -eq $obj.Prop3
    }

    [int] GetHashCode() {
        return [Tuple]::Create(
            [string] $this.Prop1,
            [string] $this.Prop2,
            [string] $this.Prop3
        ).GetHashCode()
    }
}
  • Testing for equality:
$x = [CustomObjectEquatable]@{
    Prop1 = "bob"
    Prop2 = "dude"
    Prop3 = "awesome"
}
$y = [CustomObjectEquatable]@{
    Prop1 = "bob"
    Prop2 = "dude"
    Prop3 = "awesome"
}

$x -eq $y # => True
$x.GetHashCode() -eq $y.GetHashCode() # => True

$x = [CustomObjectEquatable]@{
    prop1 = "john"
    prop2 = "dude"
    prop3 = "awesome"
}
$y = [CustomObjectEquatable]@{
    prop1 = "johhn"
    prop2 = "dude"
}

$x -eq $y  # => False
$x.GetHashCode() -eq $y.GetHashCode() # => False
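
To tie this back to the question, here is a small sketch of conditionally adding such objects to a list. The PersonKey class and the variable names are hypothetical and exist only for this illustration; the class uses -ceq so that Equals stays in line with the case-sensitive hash code, which is a choice made for this sketch.

```powershell
# Hypothetical two-property class, used only for this illustration.
class PersonKey : System.IEquatable[object] {
    [string] $Name
    [string] $Role

    [bool] Equals([object] $other) {
        # Case-sensitive comparison, consistent with GetHashCode() below.
        return $other -is [PersonKey] -and
               $this.Name -ceq $other.Name -and
               $this.Role -ceq $other.Role
    }

    [int] GetHashCode() {
        return [Tuple]::Create([string] $this.Name, [string] $this.Role).GetHashCode()
    }
}

$people = [System.Collections.Generic.List[object]]::new()
$candidates = @(
    [PersonKey]@{ Name = 'bob'; Role = 'dude' }
    [PersonKey]@{ Name = 'bob'; Role = 'dude' }   # duplicate of the first
    [PersonKey]@{ Name = 'ted'; Role = 'dude' }
)

foreach ($candidate in $candidates) {
    # -notcontains relies on the class's Equals() override.
    if ($people -notcontains $candidate) {
        $people.Add($candidate)
    }
}

$people.Count   # 2
```

Note that -notcontains still scans the list linearly; for large inputs a hash-based collection is likely the better fit.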

- Edit 9/9/2022

I've decided to add this comparer class to this answer; the implementation is mostly inspired by this answer from mklement0. It can, theoretically, test for equality between PSCustomObject instances with any number of properties, and it allows for both case-sensitive and case-insensitive comparison. Any suggestions for improvement are welcome.

using namespace System.Collections.Generic
using namespace System.Collections

class PSCustomObjectComparer : IEqualityComparer[object] {
    [StringComparer] $Comparer = [StringComparer]::InvariantCultureIgnoreCase

    PSCustomObjectComparer() { }
    PSCustomObjectComparer([StringComparer] $Comparer) {
        $this.Comparer = $Comparer
    }

    [bool] Equals([object] $xObject, [object] $yObject) {
        $x = @($xObject.PSObject.Properties)
        $y = @($yObject.PSObject.Properties)

        if(-not $x.Count.Equals($y.Count)) {
            return $false
        }

        return ([IStructuralEquatable] $x.Name).Equals($y.Name, $this.Comparer) -and
               ([IStructuralEquatable] $x.Value).Equals($y.Value, $this.Comparer)
    }

    [int] GetHashCode([object] $xObject) {
        $x = $xObject.PSObject.Properties
        try {
            return ([IStructuralEquatable] $x.Name).GetHashCode($this.Comparer) -bxor
                   ([IStructuralEquatable] $x.Value).GetHashCode($this.Comparer)
        }
        catch {
            $values = foreach($value in $x.Value) {
                if(-not $value) { continue }
                $value
            }

            if(-not $values) {
                return ([IStructuralEquatable] $x.Name).GetHashCode($this.Comparer)
            }

            return ([IStructuralEquatable] $x.Name).GetHashCode($this.Comparer) -bxor
                   ([IStructuralEquatable] $values).GetHashCode($this.Comparer)
        }
    }
}
  • Testing for equality:
$hash = [HashSet[object]]::new([PSCustomObjectComparer]::new())
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # true
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # false
$hash.Add([pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }) # false

$hash = [HashSet[object]]::new([PSCustomObjectComparer]::new([StringComparer]::InvariantCulture))
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # true
$hash.Add([pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 }) # false
$hash.Add([pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }) # true

$insensitiveComparison = [PSCustomObjectComparer]::new()
$insensitiveComparison.Equals(
    [pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 },
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }
) # true

$sensitiveComparison = [PSCustomObjectComparer]::new([StringComparer]::InvariantCulture)
$sensitiveComparison.Equals(
    [pscustomobject]@{ foo = 'hello'; bar = 'World'; baz = 123 },
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }
) # false

$sensitiveComparison.Equals(
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 },
    [pscustomobject]@{ foo = 'HELLO'; bar = 'World'; baz = 123 }
) # true
Santiago Squarzon
  • 1
    Thank you. I was hoping there was a simple way to do this without creating custom classes and specifically factoring in an exact number of properties (3 in this case). With that said, based on what I see in all the responses, there does not seem to be a native way to do this with native pscustomobjects. Is that accurate? – Matthew McDonald Jul 20 '21 at 13:50
  • 1
    Caveat: In order for an implementation of `IEquatable\`1` to also work as expected in hashtables, `.GetHashCode()` must be overridden as well, so that `.Equals()` and `.GetHashCode()` work in sync: If `.Equals()` returns `$true` for two instances, `.GetHashCode()` must return the same value for both. See [the docs](https://learn.microsoft.com/en-US/dotnet/api/System.Object.GetHashCode#remarks). – mklement0 Jul 20 '21 at 17:46
2

To complement the answers from @Santiago Squarzon and @mklement0 and work to your final goal "to add unique pscustomobjects to a list":

Try to avoid using the addition assignment operator (+=) to build a collection, because it recreates the whole array on every addition.
Instead, it is recommended to use a hashtable (hash-based key lookup) to check for duplicate objects:

class PSHashSet {
    # Ordinal (case-sensitive) comparer, so that values differing only in case yield distinct keys
    $Dictionary = [System.Collections.Specialized.OrderedDictionary]::new([StringComparer]::Ordinal)
    [Void]Add([pscustomobject]$Item) {
        $Key = [System.Management.Automation.PSSerializer]::Serialize($Item)
        $This.Dictionary[$Key] = $Item
    }
    [pscustomobject[]]Get() {
        Return $This.Dictionary.Values
    }
}

Usage:

$Arr = [PSHashSet]::New()

$Arr.Add([pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"})
$Arr.Add([pscustomobject]@{prop1="bob";prop2="dude";prop3="awesome"})
$Arr.Add([pscustomobject]@{prop1="john";prop2="dude";prop3="awesome"})

$Arr.Get()

prop1 prop2 prop3
----- ----- -----
bob   dude  awesome
john  dude  awesome
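
The mechanism behind the deduplication is plain key assignment: assigning to a key that already exists replaces that entry rather than adding a new one. A minimal sketch, assuming an ordered dictionary with a case-sensitive (ordinal) key comparer and made-up keys in place of the serialized objects:

```powershell
# $demo and its keys are hypothetical; real keys would be serialized objects.
$demo = [System.Collections.Specialized.OrderedDictionary]::new([StringComparer]::Ordinal)

$demo['key1'] = 'first'
$demo['key1'] = 'second'   # same key: replaces the existing entry
$demo['KEY1'] = 'third'    # differs in case: a separate entry with an ordinal comparer

$demo.Count     # 2
$demo['key1']   # second
```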

Note:
Be careful when adding objects with unaligned properties, as they might not show up on the display (although they do exist in the list), see: Not all properties displayed

iRon
  • Pardon my ignorance. In your example, does the code prevent the duplicate object from being added and simply displays the unique values or is it somehow preventing the addition of the second object in the example? If the latter can you explain what mechanism is handling that? Is it the $This.HashTable[$Key]=$item simply overwriting the existing value at key=Bob? – Matthew McDonald Jul 20 '21 at 13:46
  • Nicely done, but I suggest using an _ordered_ hashtable. Re `.GetHashCode()`: it is used behind the scenes in hash-based collections; see [the docs](https://learn.microsoft.com/en-US/dotnet/api/System.Object.GetHashCode#remarks) – mklement0 Jul 20 '21 at 15:13
  • @MatthewMcDonald, I've added background info to my answer and have fixed the non-`class`-based solution there, borrowing the `[System.Management.Automation.PSSerializer]` technique, but I suggest going with iRon's approach, primarily for performance reasons but also for conceptual elegance and reusability. – mklement0 Jul 20 '21 at 15:17
  • 1
    Nice: I originally thought of `$Dictionary = [ordered] @{}`, but your implementation is better because it is case-_sensitive_. Yes, collision checks are always necessary, because hash codes _do not guarantee uniqueness_ - and for that reason you shouldn't use them as keys directly. – mklement0 Jul 20 '21 at 17:41
  • 2
    As for `[pscustomobject]`'s `.GetHashCode()` always reporting the same value: I agree that this smells like a bug, so I've created [GitHub issue #15806](https://github.com/PowerShell/PowerShell/issues/15806) – mklement0 Jul 20 '21 at 18:26