You've asked for a straightforward solution to the problem, and the `GroupBy`+`Where`+`Select` solutions satisfy this requirement perfectly, but you might also be interested in a highly performant and memory-efficient solution. Below is an implementation that uses all the tools currently available (.NET 6+) for maximum efficiency:
```csharp
/// <summary>
/// Returns a sequence of elements that appear exactly once in the source sequence,
/// according to a specified key selector function.
/// </summary>
public static IEnumerable<TSource> UniqueBy<TSource, TKey>(
    this IEnumerable<TSource> source,
    Func<TSource, TKey> keySelector,
    IEqualityComparer<TKey> comparer = default)
{
    ArgumentNullException.ThrowIfNull(source);
    ArgumentNullException.ThrowIfNull(keySelector);
    Dictionary<TKey, (TSource Item, bool Unique)> dictionary = new(comparer);
    if (source.TryGetNonEnumeratedCount(out int count))
        dictionary.EnsureCapacity(count); // Assume that most items are unique
    foreach (TSource item in source)
        CollectionsMarshal.GetValueRefOrAddDefault(dictionary, keySelector(item),
            out bool exists) = exists ? default : (item, true);
    foreach ((TSource item, bool unique) in dictionary.Values)
        if (unique)
            yield return item;
}
```
The `TryGetNonEnumeratedCount`+`EnsureCapacity` combination can significantly reduce the amount of memory allocated during the enumeration of the source, in case the source is a type with a well-known size, like a `List<T>`.
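To see when the count is actually available, here is a small standalone sketch (the variable names are illustrative): a `List<T>` reports its count without enumerating, while a deferred `Where` query does not.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class Demo
{
    static void Main()
    {
        // A List<T> knows its count up front, so TryGetNonEnumeratedCount succeeds
        List<int> list = new() { 1, 2, 3 };
        Console.WriteLine(list.TryGetNonEnumeratedCount(out int count)); // True
        Console.WriteLine(count); // 3

        // A deferred Where query would have to enumerate to count, so it reports false
        IEnumerable<int> filtered = list.Where(x => x > 1);
        Console.WriteLine(filtered.TryGetNonEnumeratedCount(out _)); // False
    }
}
```

In the `False` case the `UniqueBy` implementation above simply skips the `EnsureCapacity` call and lets the dictionary grow on demand.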
The `CollectionsMarshal.GetValueRefOrAddDefault` ensures that each key is hashed only once, which can be impactful in case the keys have expensive `GetHashCode` implementations.
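To illustrate the ref-returning pattern on its own (a standalone sketch, separate from the `UniqueBy` method above), here is the classic occurrence-counting use: the returned `ref` points directly at the dictionary's internal slot, so there is no separate `ContainsKey`/indexer pair and each key is hashed once per iteration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

class Demo
{
    static void Main()
    {
        var counts = new Dictionary<string, int>();
        foreach (string word in new[] { "a", "b", "a" })
        {
            // When the key is new, the slot is added holding default(int) == 0;
            // either way, 'count' aliases the dictionary's internal storage.
            ref int count = ref CollectionsMarshal.GetValueRefOrAddDefault(counts, word, out _);
            count++;
        }
        Console.WriteLine($"{counts["a"]} {counts["b"]}"); // 2 1
    }
}
```

The `UniqueBy` above uses the same mechanism, but assigns a tuple through the `ref` instead of incrementing a counter.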
Usage example:
```csharp
List<MyClass> unique = myClassObject.UniqueBy(x => x.BillId).ToList();
```
The difference of the above `UniqueBy` from the built-in `DistinctBy` LINQ operator is that the former eliminates the duplicates completely, while the latter preserves the first occurrence of each duplicate element.