I've done a bit of performance testing, and by far and away the fastest approach is @zett42's suggestion of converting the customer list into a hashtable.
Test Data
First, here's a little snippet to set up some test data. It has some unrealistic properties like being sorted by id, with the first 80% being paid customers and the last 20% unpaid, but I don't think that will affect the results much.
# build some test data
$count = 25000
$allCustomers = @(
1..$count | foreach-object {
[pscustomobject] [ordered] @{
"CustomerID" = $_
}
}
);
$paidIds = [int[]] @( 1..($count * 0.8) )
Original Approach
Here's the OP's approach as a baseline:
Measure-Command {
$NewArray = @( $allCustomers );
foreach( $paidId In $paidIds )
{
$NewArray = $NewArray | where-object { $_.CustomerID -ne $paidId }
}
}
# TotalMilliseconds : 37732.532
Hashtable
And here's @zett42's suggestion of using a hashtable to do fast removal of customers by CustomerId when we find them in the paid list:
Measure-Command {
# convert customers into a hashtable for fast removal by id
$lookups = @{};
foreach( $customer in $allCustomers )
{
$lookups.Add($customer.CustomerId, $customer)
}
# remove all the paid customer ids
foreach( $paidId In $paidIds )
{
$lookups.Remove($paidId)
}
# get the remaining unpaid customers
$NewArray = @( $lookups.Values )
}
# TotalMilliseconds : 53.9363
-notin
@AbrahamZinala's suggestion of -notin
also works pretty well if you want a quick one-liner and don't mind not having the raw blazing speed of the hashtable approach.
Measure-Command {
$NewArray = $allCustomers | where-object { $_.CustomerId -notin $paidIds };
}
# TotalMilliseconds : 698.2812
The hashtable approach scales better with larger datasets, so if you don't mind the little bit of extra setup that might be the one to go for...