-3

I am looking for a comparison/performance considerations between a list of integers against a hash set of integers. This is what What is the difference between HashSet<T> and List<T>? talks about for T as integer.

I will have up to several thousand integers, and I want to find out, for individual integers, whether they are contained in this set.

Now of course this screams for a hash set, but I wonder whether hashing is beneficial here, since they are just integers to start with. Would hashing them first not add unnecessary overhead here?

Or in other words: Is using a hash set beneficial, even for sets of integers?

Marcel
  • 15,039
  • 20
  • 92
  • 150
  • 7
    [Race your horses](https://ericlippert.com/2012/12/17/performance-rant/) – René Vogt Apr 27 '20 at 08:37
  • https://learn.microsoft.com/en-us/dotnet/api/system.collections.generic.hashset-1?view=netcore-3.1#remarks – ProgrammingLlama Apr 27 '20 at 08:38
  • 3
    _"Now of course this screams for a hash set, but I wonder whether hashing is beneficial here, since they are just integers to start with"_ suggests a lack of understanding how lookups differ between the two. [This answer](https://stackoverflow.com/a/6391896/3181933) or perhaps [this one](https://stackoverflow.com/a/6391813/3181933) from the question you linked seems to have the answer you're looking for. – ProgrammingLlama Apr 27 '20 at 08:43
  • @RenéVogt I have, in the meantime, for a specific case, but I was looking for more comprehensive information. I have a set of 38317 integers and I am doing 9 lookups. This gave an improvement of about 22% in my case. (Measured with Visual Studio 2017 Profiler). – Marcel Apr 27 '20 at 08:58
  • 2
    You might want to check the integer implementation of [`GetHashCode`](https://referencesource.microsoft.com/#mscorlib/system/int32.cs,86) – Damien_The_Unbeliever Apr 27 '20 at 09:02
  • @Marcel Why wouldn't a HashSet be beneficial for integers? If you add strings to a `HashSet`, they are partitioned using the `.GetHashCode()` method, and then when you do `.Contains()`, etc. with a string, it first looks it up with the hashcode and checks for equality. With integers the difference is that the hashCode == the integer value, although that's an implementation detail. Either way, it works in the same way. – ProgrammingLlama Apr 27 '20 at 09:02
  • @Damien_The_Unbeliever This actually answers my question. So, no cpu cycles are lost for hashing, because there is no hashing. :-o – Marcel Apr 27 '20 at 10:12
  • People really dislike performance questions for some reason. – Theodor Zoulias Apr 28 '20 at 05:03
  • 1
    @TheodorZouliasYeah. And I guess it's a bit symptomatic that the question is actually only bad worded, (my bad) and not really about the actual performance of code. – Marcel Apr 28 '20 at 08:33

2 Answers2

5

Hashing an integer is very cheap, as you can see in the source code of the Int32.GetHashCode method:

// The absolute value of the int contained.
public override int GetHashCode()
{
    return m_value;
}

The hash of the number is the number itself. It can't get any cheaper than that. So there is no reason to be concerned about the overhead. Put your numbers in a HashSet, and enjoy searching with O(1) computational complexity.

Theodor Zoulias
  • 34,835
  • 7
  • 69
  • 104
-1

What ever T is there is a simple but efficient rule of thumb:

  • The collection is mainly used for adding and iterating with very few search => Use List

  • The collection is heavely used for research => Use HashSet

Khaled
  • 317
  • 2
  • 7
  • Yes I know, but my question comes out of the fact that the hash actually is an integer anyway. – Marcel Apr 27 '20 at 12:06