131
HashSet<T> t = new HashSet<T>();
// add 10 million items


Dictionary<K, V> t = new Dictionary<K, V>();
// add 10 million items.

Whose .Contains method will return quicker?

Just to clarify, my requirement is I have 10 million objects (well, strings really) that I need to check if they exist in the data structure. I will NEVER iterate.

nawfal
  • 70,104
  • 56
  • 326
  • 368
halivingston
  • 3,727
  • 4
  • 29
  • 42
  • 2
    **Step 1:** See if both do the same thing (in this case, the two collections are for different purposes) **Step 2:** Refer documentation and see if you feel good about their asymptotic complexity. **Step 3:** If you feel you need to worry more, measure yourself and then ask the question posting the benchmark along with it. *In your case the question becomes pointless in the first step.* – nawfal May 29 '14 at 19:37

5 Answers5

190

HashSet vs List vs Dictionary performance test, taken from here.

Add 1000000 objects (without checking duplicates)

Contains check for half the objects of a collection of 10000

Remove half the objects of a collection of 10000

Bernhard Barker
  • 54,589
  • 14
  • 104
  • 138
had
  • 2,068
  • 1
  • 13
  • 10
  • 10
    Great analysis! It looks like the .Contains for Dictionary is so fast that there is no benefit from using HashSet at all, in the OP's case. – EtherDragon Jul 27 '12 at 23:03
  • 2
    yeah, i had the same question as the OP. I already have a dictionary i'm using for other reasons, and wanted to know if i benefit from changing to a Hashset instead of using ContainsKey. Looks like the answer is no since both are so fast. – FistOfFury Sep 12 '12 at 19:15
  • 6
    Contrary to what the previous comments seem to imply, yes, you should switch to HashSet because it gives you what you want: storing a set of values (as opposed to maintaining some kind of mapping). This answer indicates that there will be no negative impact on performance compared to Dictionary. – Francois Beaussier Apr 07 '17 at 06:38
  • 6
    This answer does NOT tell you how perfromance of HashSet and Dictionary compare ... all it tells you is that they're both faster than a List .. well ... yeah! Obviously! HashSet could be 3 times faster and you wouldn't know because the relevant test has collapsed both down to "they're instantaneous ... ***compared to a List***". – Brondahl Jul 26 '20 at 20:14
  • What about getting the value at an index/key? – M. Azyoksul Nov 12 '21 at 16:32
  • I know it's not necessarily important in this context, but I am curious what the units are for these tests. I am assuming it's milliseconds? – Sal Alturaigi Oct 09 '22 at 11:52
80

I assume you mean Dictionary<TKey, TValue> in the second case? HashTable is a non-generic class.

You should choose the right collection for the job based on your actual requirements. Do you actually want to map each key to a value? If so, use Dictionary<,>. If you only care about it as a set, use HashSet<>.

I would expect HashSet<T>.Contains and Dictionary<TKey, TValue>.ContainsKey (which are the comparable operations, assuming you're using your dictionary sensibly) to basically perform the same - they're using the same algorithm, fundamentally. I guess with the entries in Dictionary<,> being larger you end up with a greater likelihood of blowing the cache with Dictionary<,> than with HashSet<>, but I'd expect that to be insignificant compared with the pain of choosing the wrong data type simply in terms of what you're trying to achieve.

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • Yes, I meant Dictionary. I'm only concerned about searching for item's existence in a data structure, that is *all*. – halivingston Apr 28 '10 at 10:21
  • 6
    @halivingston In that case use HashSet. It makes it obvious that that *is* all you need. – Jon Skeet Apr 28 '10 at 10:30
  • 2
    Ok, thanks. I actually have a HashSet right now, and a duplicate copy of Dictionary also in memory. I first .Contains on the HashSet, then retrive the value in Dictionary. I have infinite memory right now, but soon I fear my memory will be constrained and our team will ask me to remove this duplicate stuff in memory, at which point I'll be forced to use Dictionary. – halivingston Apr 28 '10 at 10:35
  • 4
    You do know Dictionary has a ContainsKey function too right? Why are you duplicating data? – Blindy Apr 28 '10 at 10:45
  • 60% of the time there will be a miss, so I'm thinking I can double my memory cost. Maybe I should reconsider and just use Dictionary. – halivingston Apr 28 '10 at 11:22
  • 8
    If you already have the data in the dictionary, then your first comment is clearly incorrect - you need to associate keys with values as well. Maybe not for *this* particular bit of code, but that's irrelevant. If you've already got a `Dictionary` for other reasons, you should use that. – Jon Skeet Apr 28 '10 at 12:10
  • Surely the first time I see a post from @JonSkeet that actually doesn't answer the question. I am not saying your answer isn't useful because you saw that OP was misunderstanding something, but still, his question "Whose .Contains method will return quicker?" is not addressed here. I think the answer is : HashSet.Contains() is faster : see [Dictionary](https://msdn.microsoft.com/en-us/library/kw5aaea4%28v=vs.110%29.aspx) and [HashSet](https://msdn.microsoft.com/en-us/library/bb356440%28v=vs.110%29.aspx) remarks section, where it's stated that HashSet is O(1) and Dictionary only approaches O(1) – Luke Marlin Mar 23 '15 at 15:09
  • @LukeMarlin: That's a documentation issue, very easily provably - try creating a HashSet using non-equal elements which have the same hash code (easily done) and you'll find it's O(n), just like Dictionary. Basically the difference between Dictionary and HashSet is whether each entry has a value or not - that's it. – Jon Skeet Mar 23 '15 at 15:13
  • @JonSkeet Then the answer to his question is that it depends on the nature of the item set that is considered since hashset might go up to O(n) in worst cases (didn't think of that but it makes sense). I still think it's useful information regarding his original question, and I'm probably not alone since the benchmark answer is the top-voted one (not by much, though). – Luke Marlin Mar 23 '15 at 15:29
  • @LukeMarlin: No, it doesn't depend on the nature of the item set, because I'd expect both Dictionary and HashSet to behave the same way for the same set of keys/items. The benchmark answer doesn't really answer the "which takes longer for contains" though - because the values are 0 in each case. (Dictionary.ContainsValue is red herring, IMO.) I'm also somewhat nervous of just accepting the numbers when there's no source code that I can see... – Jon Skeet Mar 23 '15 at 15:32
  • @LukeMarlin: Will clarify that the performance should be the same thouhg. – Jon Skeet Mar 23 '15 at 15:33
11

From MSDN documentation for Dictionary<TKey,TValue>

"Retrieving a value by using its key is very fast, close to O(1), because the Dictionary class is implemented as a hash table."

With a note:

"The speed of retrieval depends on the quality of the hashing algorithm of the type specified for TKey"

I know your question/post is old - but while looking for an answer to a similar question I stumbled across this.

Hope this helps. Scroll down to the Remarks section for more details. https://msdn.microsoft.com/en-us/library/xfhwa508(v=vs.110).aspx

ripvlan
  • 460
  • 6
  • 14
5

The accepted answer to this question does NOT validly answer the question! It happens to give the correct answer, but that answer isn't shown by the evidence they provided.

What that answer shows is that Key lookups on a Dictionary or HashSet are vastly quicker than looking up in a List. Which is true, but not interesting, nor surprising, nor proof that they have the same speed.

I've run the code below to compare the lookup times, and my conclusion is that they ARE in fact the same speed. (Or at least, if there is any difference, then the difference is well within the Standard Deviation of that speed)

Specifically, 100,000,000 lookups was taking between 10 and 11.5 seconds for both, for me, in this test.

Test Code:

private const int TestReps = 100_000_000;
[Test]
public void CompareHashSetContainsVersusDictionaryContainsKey()
{
    for (int j = 0; j < 10; j++)
    {
        var rand = new Random();
        var dict = new Dictionary<int, int>();
        var hash = new HashSet<int>();

        for (int i = 0; i < TestReps; i++)
        {
            var key = rand.Next();
            var value = rand.Next();
            hash.Add(key);
            dict.TryAdd(key, value);
        }

        var testPoints = Enumerable.Repeat(1, TestReps).Select(_ => rand.Next()).ToArray();
        var timer = new Stopwatch();
        var total = 0;
        
        timer.Restart();
            for (int i = 0; i < TestReps; i++)
            {
                var newKey = testPoints[i];
                if (hash.Contains(newKey))
                {
                    total++;
                }
            }
        Console.WriteLine(timer.Elapsed);
        
        var target = total;
        Assert.That(total == target);
        

        timer.Restart();
            for (int i = 0; i < TestReps; i++)
            {
                var newKey = testPoints[i];
                if (dict.ContainsKey(newKey))
                {
                    total++;
                }
            }
        Console.WriteLine(timer.Elapsed);

        Assert.That(total == target * 2);
        Console.WriteLine("Set");
    }
}
Brondahl
  • 7,402
  • 5
  • 45
  • 74
4

These are different data structures. Also there is no generic version of HashTable.

HashSet contains values of type T which HashTable (or Dictionary) contains key-value pairs. So you should choose collection on what data you need to be stored.

Andrew Bezzub
  • 15,744
  • 7
  • 51
  • 73