The easier way to solve this problem is to use the new (.NET 6) MaxBy LINQ operator, along with the GroupBy and Select operators:
IEnumerable<Record> query = records
    .GroupBy(x => x.GroupName)
    .Select(g => g.MaxBy(x => x.MemberValue));
This is an easy but not memory-efficient solution. The reason is that it generates a full-blown Lookup<TKey, TSource> structure under the hood, which is a dictionary-like container that holds all the records associated with each key. This structure is built in full before the elements in each grouping are compared to select the maximum element.
In most cases this inefficiency is not a problem, because the records are not that many and they are already stored in memory. But if you have a truly deferred enumerable sequence that contains a huge number of elements, you might run out of memory. In this case you could use the GroupMaxBy operator below. This operator stores in memory only the current maximum element per key:
/// <summary>
/// Groups the elements of a sequence according to a specified key selector
/// function, and then returns the maximum element in each group according to
/// a specified value selector function.
/// </summary>
public static IEnumerable<TSource> GroupMaxBy<TSource, TKey, TValue>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    Func<TSource, TValue> valueSelector,
    IEqualityComparer<TKey> keyComparer = default,
    IComparer<TValue> valueComparer = default)
{
    // Arguments validation omitted
    valueComparer ??= Comparer<TValue>.Default;
    var dictionary = new Dictionary<TKey, (TSource Item, TValue Value)>(keyComparer);
    foreach (var item in source)
    {
        var key = keySelector(item);
        var value = valueSelector(item);
        if (dictionary.TryGetValue(key, out var existing) &&
            valueComparer.Compare(existing.Value, value) >= 0) continue;
        dictionary[key] = (item, value);
    }
    foreach (var entry in dictionary.Values)
        yield return entry.Item;
}
Usage example:
IEnumerable<Record> query = records
    .GroupMaxBy(x => x.GroupName, x => x.MemberValue);
The reverse GroupMinBy can be implemented similarly by replacing the >= with <=.
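As a minimal sketch of that substitution, assuming the same signature as GroupMaxBy: the class names GroupMinByExtensions and Program, and the sample tuples, are hypothetical and only there to make the example self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class GroupMinByExtensions
{
    // Hypothetical counterpart of GroupMaxBy: stores only the current
    // minimum element per key while enumerating the source once.
    public static IEnumerable<TSource> GroupMinBy<TSource, TKey, TValue>(
        this IEnumerable<TSource> source,
        Func<TSource, TKey> keySelector,
        Func<TSource, TValue> valueSelector,
        IEqualityComparer<TKey> keyComparer = default,
        IComparer<TValue> valueComparer = default)
    {
        // Arguments validation omitted
        valueComparer ??= Comparer<TValue>.Default;
        var dictionary = new Dictionary<TKey, (TSource Item, TValue Value)>(keyComparer);
        foreach (var item in source)
        {
            var key = keySelector(item);
            var value = valueSelector(item);
            // Keep the stored element unless the new one is strictly smaller.
            if (dictionary.TryGetValue(key, out var existing) &&
                valueComparer.Compare(existing.Value, value) <= 0) continue;
            dictionary[key] = (item, value);
        }
        foreach (var entry in dictionary.Values)
            yield return entry.Item;
    }
}

public static class Program
{
    public static void Main()
    {
        // Hypothetical sample data: two groups with two members each.
        var records = new[]
        {
            (GroupName: "A", MemberValue: 5),
            (GroupName: "A", MemberValue: 2),
            (GroupName: "B", MemberValue: 7),
            (GroupName: "B", MemberValue: 9),
        };
        foreach (var r in records.GroupMinBy(x => x.GroupName, x => x.MemberValue))
            Console.WriteLine($"{r.GroupName}: {r.MemberValue}");
    }
}
```

For this sample the call yields the record with value 2 for group A and the record with value 7 for group B. As with GroupMaxBy, when several elements tie for the minimum, the first one encountered is kept.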
Below is a demonstration of the difference in memory efficiency between the two approaches:
var source = Enumerable.Range(1, 1_000_000);
{
    var mem0 = GC.GetTotalAllocatedBytes(true);
    source.GroupBy(x => x % 1000).Select(g => g.MaxBy(x => x % 3333)).Count();
    var mem1 = GC.GetTotalAllocatedBytes(true);
    Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");
}
{
    var mem0 = GC.GetTotalAllocatedBytes(true);
    source.GroupMaxBy(x => x % 1000, x => x % 3333).Count();
    var mem1 = GC.GetTotalAllocatedBytes(true);
    Console.WriteLine($"Allocated: {mem1 - mem0:#,0} bytes");
}
Output:
Allocated: 8,571,168 bytes
Allocated: 104,144 bytes
Try it on Fiddle.