I have a loop of about 7000 objects and within the loop I need to get a distinct count of a list of structs. Currently I am using -
foreach (var product in productsToSearch)
{
Console.WriteLine("Time elapsed: {0} start", stopwatch.Elapsed);
var cumulativeCount = 0;
productStore.Add(product);
var orderLinesList = totalOrderLines
.Where(myRows => productStore.Contains(myRows.Sku))
.Select(myRows => new OrderLineStruct
{
OrderId = myRows.OrderId,
Sku = myRows.Sku
});
var differences = totalOrderLines.Except(orderLinesList);
cumulativeCount = totalOrderLinsCount - differences.Select(x => x.OrderId).Distinct().Count();
cumulativeStoreTable.Rows.Add(product, cumulativeCount);
Console.WriteLine("Time elapsed: {0} end", stopwatch.Elapsed);
}
public struct OrderLineStruct
{
public string OrderId { get; set; }
public string Sku { get; set; }
}
This is extremely slow when getting the distinct count. Anyone know of a more efficient way of doing this? I have tried using MoreLinq which has a DisctintBy method for Linq but it's not more efficient as I have timed it. I have played around with PLinq but I am a little unsure of where to parallelize this query.
So each iteration of the loop is timed at -
Time elapsed: 00:00:37.1142047 start
Time elapsed: 00:00:37.8310148 end
= 0.7168101 secs * 7000 = 5017.6707 (83.627845 minutes)
Its the Distinct() Count() line which is taking the most time to process (around 0.5 secs). The variable differences has a few hundred thousand OrderLineStruct's so doing any linq queries on this is slow.
UPDATE
I have modified the loop a bit and now it runs in around 10 minutes rather that over 1 hour
foreach (var product in productsToSearch)
{
var cumulativeCount = 0;
productStore.Add(product);
var orderLinesList = totalOrderLines
.Join(productStore, myRows => myRows.Sku, p => p, (myRows, p) => myRows)
.Select(myRows => new OrderLineStruct
{
OrderId = myRows.OrderId,
Sku = myRows.Sku
});
totalOrderLines = totalOrderLines.Except(orderLinesList).ToList();
cumulativeCount = totalOrderLinesCount - totalOrderLines.Select(x => x.OrderId).Distinct().Count();
cumulativeStoreTable.Rows.Add(product, cumulativeCount);
}
Having a .ToList() on the Except seems to make a difference and now I am removing the already processed orders after each iteration which is increasing performance for each iteration.