I have a large set of data, roughly 10 million items, that I need to process efficiently and quickly, removing duplicate items based on two of the six column headers.
I have tried grouping and then sorting the items, but it's horrendously slow.
$p1 = $test | Group-Object -Property ComputerSerialID, ComputerID
$p2 = foreach ($object in $p1) {
    $object.Group | Sort-Object -Property FirstObserved | Select-Object -First 1
}
The goal is to remove duplicates by assessing the two columns together, keeping the oldest record based on FirstObserved.
The data looks something like this:
LastObserved     : 2019-06-05T15:40:37
FirstObserved    : 2019-06-03T20:29:01
ComputerName     : 1
ComputerID       : 2
Virtual          : 3
ComputerSerialID : 4

LastObserved     : 2019-06-05T15:40:37
FirstObserved    : 2019-06-03T20:29:01
ComputerName     : 5
ComputerID       : 6
Virtual          : 7
ComputerSerialID : 8

LastObserved     : 2019-06-05T15:40:37
FirstObserved    : 2019-06-03T20:29:01
ComputerName     : 9
ComputerID       : 10
Virtual          : 11
ComputerSerialID : 12
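For comparison, the fastest approach I have considered so far is a single pass over the data with a hashtable keyed on the two columns, rather than Group-Object. This is only a sketch, assuming $test holds the records and that FirstObserved parses as a datetime:

# Keep the record with the earliest FirstObserved per (ComputerSerialID, ComputerID) pair.
$seen = [System.Collections.Generic.Dictionary[string, object]]::new()
foreach ($row in $test) {
    # Composite key from the two dedup columns; '|' is assumed not to appear in either value.
    $key = '{0}|{1}' -f $row.ComputerSerialID, $row.ComputerID
    if (-not $seen.ContainsKey($key) -or
        [datetime]$row.FirstObserved -lt [datetime]$seen[$key].FirstObserved) {
        $seen[$key] = $row
    }
}
$deduped = $seen.Values

This avoids building all the intermediate groups and sorting each one, so it should be roughly O(n) instead of grouping plus a sort per group, but I haven't benchmarked it at the 10 million scale.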