I believe this should outperform the GroupBy
suggestion (see brief explanation below):
IEnumerable<Event> DistinctEvents(IEnumerable<Event> events)
{
var dict = new Dictionary<int, Event>();
foreach (Event e in events)
{
Event existing;
if (!dict.TryGetValue(e.Id, out existing) || e.ExpirationTime < existing.ExpirationTime)
{
dict[e.Id] = e;
}
}
foreach (Event e in dict.Values)
{
yield return e;
}
}
Explanation: While this and the GroupBy
method proposed by Ani have the same algorithmic complexity (as far as I can tell, anyway), the above approach is more efficient in practice for two reasons.
GroupBy
internally uses a Lookup<TKey, TValue>
(very similar to a Dictionary<TKey, List<TValue>>
) which actually populates internal collections with the contents of the input sequence. This requires more memory and also has a performance impact, particularly due to the fact that while the sub-collections will have amortized O(1) insertion time, they will occasionally need to resize themselves, which will be O(N) (where N is the size of the sub-collection). This is not a big deal, but it's still a lot more work you really need to be doing.
- A consequence of point #1 is that this in turn requires iterating over each element in the input sequence before
GroupBy
can provide an enumerator (so it's deferred execution, but then the entire input sequence needs to be iterated before iterating over the result of GroupBy
). Then you're iterating over each group again in the call to Aggregate
; so in all, you're iterating over the elements in the input sequence twice, which is more times than necessary to accomplish the task at hand.
As I said, the algorithmic complexity is the same, which means the two approaches should be equally scalable; this one is simply faster. I took the liberty of testing both approaches (out of curiosity, mostly) and found the above to execute in roughly half the time and cause fewer GC collections (a rough approximation of memory usage) than the GroupBy
approach.
These are minute concerns, which it would normally be a waste of time to think too much about. The only reason I mention them is that you asked for an efficient solution (and even bolded the term); so I figured you would want to take these kinds of factors into consideration.