Reading this question (and answer), I found out that there are at least two ways to get distinct items off an IQueryable while still getting to choose what to filter by. Those two methods are:

table.GroupBy(x => x.field).Select(x => x.FirstOrDefault());

or using MoreLINQ's DistinctBy:

table.DistinctBy(x => x.field);

But that thread doesn't explain the performance difference between the two. So when do I want to use one over the other?

Gilad Green
  • [race your horses](https://ericlippert.com/2012/12/17/performance-rant/) Note that MoreLINQ's `DistinctBy` works **in memory**; it's not translated to SQL and not executed by the db. `GroupBy` might be implemented by your query provider and executed on the db. – René Vogt Sep 05 '17 at 09:53

1 Answer

There is a very big difference in what they do, so a performance difference is expected. GroupBy will create a collection for each key in the original collection before passing the groups to the Select. DistinctBy only needs to keep a hash set recording whether it has encountered a key before, so it can be much faster.

If DistinctBy is enough for you, always use it; only use GroupBy if you need the elements in each group.

Also, with LINQ to Entities (EF), for example, the DistinctBy operator will not work.
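To make the in-memory difference concrete, here is a minimal sketch of the two approaches side by side. It is not MoreLINQ's actual source: `DistinctByKey` and `DistinctViaGroupBy` are hypothetical names I picked for illustration (also to avoid clashing with any existing `DistinctBy`), and it only applies to LINQ to Objects, not to a query provider such as EF.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctSketch
{
    // Hypothetical helper: a hash-set based "distinct by key".
    // It streams the source and only remembers the keys seen so far.
    public static IEnumerable<T> DistinctByKey<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        var seen = new HashSet<TKey>();
        foreach (var item in source)
        {
            // HashSet.Add returns false when the key is already present,
            // so only the first item for each key is yielded.
            if (seen.Add(keySelector(item)))
                yield return item;
        }
    }

    // Hypothetical helper: the GroupBy form from the question.
    // In memory, GroupBy buffers every element into a group per key
    // before the Select can pick the first element of each group.
    public static IEnumerable<T> DistinctViaGroupBy<T, TKey>(
        this IEnumerable<T> source, Func<T, TKey> keySelector)
    {
        // Groups are never empty, so First() is safe here.
        return source.GroupBy(keySelector).Select(g => g.First());
    }
}
```

Both helpers should give the first element per key, in order of first appearance, for an in-memory sequence; the difference is in how much is kept around along the way (keys only versus all elements grouped by key).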

mjwills
Titian Cernicova-Dragomir
  • So if I have an `IQueryable` of 1000 items, out of which only two are duplicates, `GroupBy` will create 999 collections? – Hannes Kindströmmer Sep 05 '17 at 11:17
  • Yes, it will create an `IGrouping` for each key. But you should take care: `IQueryable` will probably not have the `DistinctBy` operator implemented for Entity Framework, so it will fail at runtime. This discussion applies to in-memory LINQ. For EF, the `DistinctBy` operator implementations I've seen actually use `GroupBy` under the hood and will not have any performance difference (e.g. https://stackoverflow.com/questions/32619338/ef-distinctby-on-an-iqueryable) – Titian Cernicova-Dragomir Sep 05 '17 at 11:24
  • DistinctBy is a 3rd party lib – hanzolo May 20 '21 at 17:12