I have a List of data that combines the results of an Entity Framework database query with another IEnumerable of the same type holding in-memory data from other sources. For some of our clients this list amounts to about 200,000 entries (about half from the database), which makes the grouping operation take extremely long (up to 30 minutes on our cheap virtual Windows server).
The grouping operation reduces the list to about 10,000 objects (so about 20:1).
The data class of the List is basically just a long row of strings, ints and a few other basic types:
public class ExportData
{
    public string FirstProperty;
    public string StringProperty;
    public string String1;
    ...
    public string String27;
    public int Int1;
    ...
    public int Int15;
    public decimal Mass;
    ...
}
The grouping is done through a custom IEqualityComparer, and the logic basically amounts to this:
1. If the custom logic allows two items to be grouped, about half of their properties are equal. From that point on, those are the only properties we care about, besides ID, Mass and a special StringProperty, which can still differ between items that are allowed to be grouped.
2. Each new grouped object should have the relevant properties (the ones that were equal in step 1), plus the combined IDs of the grouped items as a string and the sum of all their Mass (decimal) properties. The special StringProperty should be set depending on whether a special string occurs in any of the grouped items.
List<ExportData> exportData;
// in-memory list of combined data from database + memory data
exportData = exportData.GroupBy(w => w, new ExportComparer(data)).Select(g =>
{
    // The group key is reused as the result object for the whole group
    ExportData group = g.Key;
    group.Mass = g.Sum(s => s.Mass);
    if (g.Count() > 1)
    {
        group.CombinedIds = string.Join("-", g.Select(a => a.Id.ToString()));
    }
    // "AB" takes precedence over "CD"; "EF" is the fallback
    if (g.Any(s => s.StringProperty.Equals("AB")))
    {
        group.StringProperty = "AB";
    }
    else if (g.Any(s => s.StringProperty.Equals("CD")))
    {
        group.StringProperty = "CD";
    }
    else
    {
        group.StringProperty = "EF";
    }
    return group;
}).ToList();
And the custom comparer for completeness:
public class ExportComparer : IEqualityComparer<ExportData>
{
    private CompareData data;

    public ExportComparer()
    {
    }

    public ExportComparer(CompareData comparedata)
    {
        // Additional data needed for comparison logic
        // prefetched from another database
        data = comparedata;
    }

    public bool Equals(ExportData x, ExportData y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (ReferenceEquals(x, null) || ReferenceEquals(y, null)) return false;
        (...) // Rest of the unit-tested and already optimized very long comparison logic
        return equality; // result from the custom comparison
    }

    public int GetHashCode(ExportData obj)
    {
        if (ReferenceEquals(obj, null)) return 0;
        int hash = 17;
        hash = hash * 23 + obj.FirstProperty.GetHashCode();
        (...) // repeated for each property used in the comparison logic
        return hash;
    }
}
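One way to check whether the time is actually spent inside the comparer is to wrap it and count the calls. The following is only a minimal sketch under assumptions: `CountingComparer<T>` and `ConstantHashComparer` are hypothetical helpers, not part of our code base, and the demo groups plain strings instead of `ExportData`. If `Equals` is called far more often than there are items, many unequal items probably share a hash code, and the grouping has to fall back on repeated `Equals` comparisons.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: wraps any comparer and counts how often it is used.
public sealed class CountingComparer<T> : IEqualityComparer<T>
{
    private readonly IEqualityComparer<T> inner;

    public long EqualsCalls { get; private set; }
    public long HashCalls { get; private set; }

    public CountingComparer(IEqualityComparer<T> inner)
    {
        this.inner = inner ?? throw new ArgumentNullException(nameof(inner));
    }

    public bool Equals(T x, T y)
    {
        EqualsCalls++;
        return inner.Equals(x, y);
    }

    public int GetHashCode(T obj)
    {
        HashCalls++;
        return inner.GetHashCode(obj);
    }
}

// Deliberately poor comparer for the demo: every key gets the same hash code.
public sealed class ConstantHashComparer : IEqualityComparer<string>
{
    public bool Equals(string x, string y) => string.Equals(x, y);
    public int GetHashCode(string obj) => 0;
}

public static class Program
{
    public static void Main()
    {
        // 2000 items that collapse into 100 groups (20:1, like the real data).
        List<string> items = Enumerable.Range(0, 2000)
            .Select(i => "key" + (i % 100))
            .ToList();

        var good = new CountingComparer<string>(StringComparer.Ordinal);
        var bad = new CountingComparer<string>(new ConstantHashComparer());

        int goodGroups = items.GroupBy(s => s, good).Count();
        int badGroups = items.GroupBy(s => s, bad).Count();

        // Compare the Equals counts: the constant-hash comparer needs many more.
        Console.WriteLine($"good: groups={goodGroups}, Equals={good.EqualsCalls}, GetHashCode={good.HashCalls}");
        Console.WriteLine($"bad:  groups={badGroups}, Equals={bad.EqualsCalls}, GetHashCode={bad.HashCalls}");
    }
}
```

In our case the wrapper would be used as `new CountingComparer<ExportData>(new ExportComparer(data))` and passed to the `GroupBy` call in place of the plain comparer.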
What can I do to make this GroupBy run faster?