If you replace your nested loops with 2 HashSet's, you'll have two ways of calculating the exception between the two:
Using SymmetricExceptWith()
The HashSet<T>.SymmetricExceptWith()
function allows us to calculate the subset of terms that exist in either collection but not in both:
# Create hashset from one list
$userIDs = [System.Collections.Generic.HashSet[string]]::new([string[]]$csv1.identifier)
# Pass the other list to `SymmetricExceptWith`
$userIDs.SymmetricExceptWith([string[]]$csv2.identifier)
# Now we have an efficient filter!
$relevantRecords = @($csv1;$csv2) |Where-Object { $userIDs.Contains($_.identifier) } |Sort-Object -Unique identifier
Using a sets to track duplicates
Similarly we can use hash sets to keep track of which terms that have been observed at least once, and which ones has been seen more than once:
# Create sets for tracking
$seenOnce = [System.Collections.Generic.HashSet[string]]::new()
$seenTwice = [System.Collections.Generic.HashSet[string]]::new()
# Loop through whole superset of records
foreach($record in @($csv1;$csv2)){
# Always attempt to add to the $seenOnce set
if(!$seenOnce.Add($record.identifier)){
# We've already seen this identifier once, add it to $seenTwice
[void]$seenTwice.Add($record.identifier)
}
}
# Just like the previous example, we now have an efficient filter!
$relevantRecords = @($csv1;$csv2) |Where-Object { $seenOnce.Contains($_.identifier) -and -not $seenTwice.Contains($_.identifier) } |Sort-Object -Unique identifier
Using a hash table as a grouping construct
You could also use a dictionary type (like a [hashtable]
for example) to group records from both csv files based on their identifier, and then filter on number of record values in each dictionary entry:
# Groups records on their identifier value
$groupsById = @{}
foreach($record in @($csv1;$csv2)){
if(-not $groupsById.ContainsKey($record.identifier)){
$groupsById[$record.identifier] = @()
}
$groupsById[$record.identifier] += $record
}
# Filter based on number of records with a distinct identifier
$relevantRecords = $groupsById.GetEnumerator() |Where-Object { $_.Value.Count -eq 1 } |Select-Object -Expand Value