21

I'm making a video game where performance is critical.

I'm using the .Distinct() extension method to get unique value from a List. Is there a faster way to do so? (even if it means having many more lines of code)

Daniel Mann
  • 57,011
  • 13
  • 100
  • 120
Vittorio Romeo
  • 90,666
  • 33
  • 258
  • 416
  • It would help to have a lot more background... – soandos May 11 '11 at 21:44
  • 1
    You need more details - do you have control over how the list is generated? You could do better if you never inserted duplicate elements into the list in the first place... – Ana Betts May 11 '11 at 21:44
  • 1
    `.Distinct` _is_ faster. Have you profiled? – SLaks May 11 '11 at 21:44
  • 10
    There are many aspects to game performance. For example, Distinct is pretty fast but it also makes garbage. And on the XBOX version of the framework, garbage makes for garbage collections, which can pause the thread updating the UI. My concern with Distinct would not be its small cost in time, but rather its large cost in collection pressure. Can you describe in much more detail what your performance budget is, in terms of both memory and time? – Eric Lippert May 11 '11 at 22:00

3 Answers3

40

.Distinct is an O(n) call.
You can't get any faster than that.

However, you should make sure that your GetHashCode (and, to a lesser extent, Equals) is as fast as possible.

Depending on your scenario, you may be able to replace the List<T> with a HashSet<T>, which will prevent duplicates from being inserted in the first place. (yet has O(1) insertion)

However, Always profile your code before jumping to conclusions about what needs to be faster.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
7

Does it have to be a List?

Would it be possible to switch from List, to HashSet? HashSet prevents objects from being inserted into the list more than once in the first place, so the Distinct is already done.

David Yaw
  • 27,383
  • 4
  • 60
  • 93
1

If you can do the distinct in place, you can do it very quickly and with zero allocations by first using Array.Sort and then:

 TSource oldV = source[0];
 int pos = 1;
 for (int i = 1; i < source.Count; i++)
 {
     var newV = source[i];
     source[pos] = newV;
     if (!eqComparer.Equals(newV, oldV))
     {
        pos++;
     }                
     oldV = newV;
 }
 //pos now == the new size of the array

You will then have to keep track of the now smaller size of the array, or use Array.resize (But that will allocate a new array)

Alternatively if you do this same approach with a List<T> you can call RemoveRange at the end to resize it without allocating. This ends up being significantly quicker.

Other posters are probably correct though that you can achieve this goal some other way, such as using a hashset in the first place, or keeping parallel collections where one contains only the distinct elements all the time. Offsetting small costs on insert/remove so that no time at all is required to get the distinct set.

jackmott
  • 1,112
  • 8
  • 16