36

I'm writing a 7 card poker hand evaluator as one of my pet projects. While trying to optimize its speed (I like the challenge), I was shocked to find that the performance of Dictionary key lookups was quite slow compared to array index lookups.

For example, I ran this sample code that enumerates over all 52 choose 7 = 133,784,560 possible 7 card hands:

var intDict = new Dictionary<int, int>();
var intList = new List<int>();
for (int i = 0; i < 100000; i++)
{
    intDict.Add(i, i);
    intList.Add(i);
}

int result;

var sw = new Stopwatch();
sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intDict[32131]; // perform C(52,7) dictionary key lookups
sw.Stop();
Console.WriteLine("time for dictionary lookups: {0} ms", sw.ElapsedMilliseconds);

sw.Reset();

sw.Start();
for (int card1 = 0; card1 < 46; card1++)
  for (int card2 = card1 + 1; card2 < 47; card2++)
    for (int card3 = card2 + 1; card3 < 48; card3++)
      for (int card4 = card3 + 1; card4 < 49; card4++)
        for (int card5 = card4 + 1; card5 < 50; card5++)
          for (int card6 = card5 + 1; card6 < 51; card6++)
            for (int card7 = card6 + 1; card7 < 52; card7++)
              result = intList[32131]; // perform C(52,7) array index lookups
sw.Stop();
Console.WriteLine("time for array index lookups: {0} ms", sw.ElapsedMilliseconds);

which outputs:

time for dictionary lookups: 2532 ms
time for array index lookups: 313 ms

Is this type of behavior expected (performance decrease by a factor of 8)? IIRC, a Dictionary has, on average, O(1) lookups, while an array has worst-case O(1) lookups, so I do expect the array lookups to be faster, but not by this much!

I am currently storing poker hand rankings in a Dictionary. I suppose if this is as fast as the dictionary lookups can be, I have to rethink my approach and use arrays instead, although indexing the rankings will get a little tricky and I'll probably have to ask another question about it.

snazzer
  • 449
  • 1
  • 5
  • 10
  • 23
    If the dictionary lookup always takes 1 hour to complete for a single item, and *always* one hour, it's still O(1), even if another type of lookup, like the array lookup, takes 1 ms and is also O(1). You can only use big-O notation to compare complexity; do not substitute it for actually measuring the runtime performance characteristics of your code. – Lasse V. Karlsen May 25 '09 at 21:19
  • I don't think complexity is the correct word. Big-O notation tells you how something scales with the number of items. O(1) tells you that lookups take constant time, i.e. the lookup doesn't take longer as the collection grows. However, a lookup can involve a lot of work and still be O(1), as with a dictionary, or involve very little work and be O(1), as with an array lookup. – trampster Oct 22 '18 at 20:36
  • 1
    I'd be curious to see how your approach compares to my OneJoker library. It's optimized for 5-card hands, but can do 7, and my lookup tables are only about 1MB, while yours probably approach a GB. – Lee Daniel Crocker Jan 24 '19 at 02:02

7 Answers

68

Don't forget that Big-O notation only says how the complexity grows with respect to the size (etc.) - it doesn't give any indication of the constant factors involved. That's why sometimes even a linear search for keys is faster than a dictionary lookup, when there are sufficiently few keys. In this case you're not even doing a search with the array though - just a straight indexing operation.

For straight index lookups, arrays are basically ideal - it's just a case of

pointer_into_array = base_pointer + offset * size

(And then a pointer dereference.)

Performing a dictionary lookup is relatively complicated - very fast compared with (say) a linear lookup by key when there are lots of keys, but much more complicated than a straight array lookup. It has to calculate the hash of the key, then work out which bucket that should be in, possibly deal with duplicate hashes (or duplicate buckets) and then check for equality.

As always, choose the right data structure for the job - and if you really can get away with just indexing into an array (or List<T>) then yes, that will be blindingly fast.
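
For illustration, a minimal sketch of what that could look like for the question's setup, assuming the keys are (or can be made into) dense integers in a known range; the rankByIndex name and the 100,000-key range are just made up for the example:

// Hedged sketch: if the keys are dense integers, the same data can live in a
// flat array and each lookup becomes pure indexing instead of hashing.
var rankByDict = new Dictionary<int, int>();   // key -> rank, via hashing
int[] rankByIndex = new int[100000];           // index -> rank, via plain indexing

for (int i = 0; i < 100000; i++)
{
    rankByDict[i] = i;     // later lookups pay for hash + bucket + equality check
    rankByIndex[i] = i;    // later lookups are just base + i * sizeof(int)
}

int viaDict = rankByDict[32131];    // O(1), but with hashing overhead
int viaArray = rankByIndex[32131];  // O(1), a bounds check and one indexed read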

Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • I wanted to ask a question about this, but didn't feel it warranted its own question. If strings are immutable, is the hashcode calculated once and stored internally, or is it recalculated every time GetHashCode() is called? – Chris Doggett May 27 '09 at 01:50
  • Chris: In .NET, it's recalculated every time. In Java it's cached. That's on a per instance basis, but of course a hashtable *does* remember the hash for each key. – Jon Skeet May 27 '09 at 05:17
  • Are dictionary lookups faster than Linq to datasets lookups? – LB. Oct 08 '09 at 20:15
  • Is there any keys limit (in python) to prefer a linear search over a key-value search in a dictionary? – Jdeep Jul 16 '20 at 04:44
  • @NoahJ.Standerson: I don't know enough Python to give an estimate, but it may well depend on the cost of hashing the key as well as anything else. As ever for performance things, measure it with realistic data. – Jon Skeet Jul 16 '20 at 07:52
8

Is this type of behavior expected (performance decrease by a factor of 8)?

Why not? Each array lookup is almost instantaneous/negligible, whereas a dictionary lookup may need at least an extra subroutine call.

The point of their both being O(1) is that even if you have 50 times more items in each collection, the performance decrease is still only a factor of whatever it is (8).
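
A rough way to check that, sketched along the lines of the question's code (the exact times and the exact ratio will of course depend on the machine and runtime):

// Hedged sketch: grow both collections, keep the number of lookups fixed.
// If both lookups really are O(1), each timing should stay roughly flat as the
// size grows, so the dictionary-vs-array ratio stays roughly constant too.
int[] sizes = { 10000, 100000, 1000000 };
foreach (int size in sizes)
{
    var dict = new Dictionary<int, int>();
    var list = new List<int>();
    for (int i = 0; i < size; i++) { dict.Add(i, i); list.Add(i); }

    int result;
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 50000000; i++)
        result = dict[size / 2];
    Console.WriteLine("size {0}: dictionary {1} ms", size, sw.ElapsedMilliseconds);

    sw.Reset();
    sw.Start();
    for (int i = 0; i < 50000000; i++)
        result = list[size / 2];
    Console.WriteLine("size {0}: list {1} ms", size, sw.ElapsedMilliseconds);
}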

ChrisW
  • 54,973
  • 13
  • 116
  • 224
  • 1
    Not quite. There shouldn't be a performance decrease at all! And that is the **real point** of being O(1). Of course, the dictionary does have a little footnote which says: if the hashing function produces too many duplicates, or if there are too few buckets, then performance will degrade. – Disillusioned Sep 24 '13 at 23:22
  • @CraigYoung I meant, performance difference between one implementation (array) and the other (dictionary) should remain constant (a factor of 8) independent of size. – ChrisW Sep 25 '13 at 08:24
  • I understood your point. I believe you misunderstood mine. To claim any kind of performance decrease for an O(1) algorithm is simply **wrong**! Furthermore, the property you attribute to two different algorithms that are O(1) is true for **any** two algorithms of the **same Order class**. (I.e. 2 O(n^2) algorithms also differ by a constant factor independent of size; same for 2 O(n!) etc.). So you have totally missed the **real** point of anything being O(1). – Disillusioned Sep 25 '13 at 21:54
  • @CraigYoung I'm not trying to say that performance decreases with size. If you have two implementations (array and dictionary), and if they're both O(1), and if the dictionary is 8 times slower than the array when they both contain e.g. 10 items, then I predicted that the dictionary will still be 8 times slower than the array when they both contain 100 or 1000 items. – ChrisW Sep 25 '13 at 23:11
  • You might not be trying to say that performance decreases, but that is exactly what your answer says. Quote: "if you have 50 times more items in each collection, the performance **decrease** is". Suppose performance of dictionary is 8ms and array is 1ms for 100 items, then performance will be 8ms and 1ms respectively for 1000 items. Or 8ms and 1ms for 1000000 items! (It's somewhat blatantly obvious that if performance doesn't change at all, then the dictionary "still" be 8 times slower.) – Disillusioned Sep 26 '13 at 06:18
  • 4
    @CraigYoung The word "decrease" is in the OP and quoted in my answer. The 'decrease' in question is the difference in speed between an array and a dictionary: not a difference in speed when more items are stored. – ChrisW Sep 26 '13 at 09:12
7

Something could take a millennium, and still be O(1).

If you single-step through this code in the disassembly window, you will quickly come to understand what the difference is.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
  • 4
    Yes. An easy way to write an O(1) algorithm is to precalculate the worst-case time (i.e. for the maximum n), and then run for that same time even for the easiest element (e.g. the first element) of the collection. The lookup remains O(1). You might also be fired from the company in O(1) :) – nawfal Jun 10 '14 at 07:51
4

Dictionary structures are most useful when the key space is very large and cannot be mapped into a stable, sequenced order. If you can convert your keys into a simple integer in a relatively small range, you will be hard-pressed to find a data structure that will perform better than an array.

On an implementation note: in .NET, dictionaries are essentially hashtables. You can somewhat improve their key-lookup performance by ensuring that your keys hash into a large space of unique values. It looks like in your case you are using a simple integer as a key (which I believe hashes to its own value), so that may be the best you can do.
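
As a hedged illustration of "convert your keys into a simple integer in a relatively small range" (the rank/suit encoding and the valueOfCard table are made-up names for this sketch, not anything from the question's code):

// Illustrative only: encode a card as rank * 4 + suit, giving a dense key in
// 0..51, and use that key directly as an array index instead of hashing it.
int[] valueOfCard = new int[52];       // hypothetical per-card lookup table

int rank = 12, suit = 3;               // assumed encoding: rank 0..12, suit 0..3
int cardIndex = rank * 4 + suit;       // dense key in 0..51

valueOfCard[cardIndex] = 14;           // store
int v = valueOfCard[cardIndex];        // retrieve: plain indexing, no hash, no buckets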

LBushkin
  • 129,300
  • 32
  • 216
  • 265
3

An array lookup is about the fastest thing you can do - essentially it's just a single bit of pointer arithmetic to get from the start of the array to the element you want to find. On the other hand, a dictionary lookup is likely to be somewhat slower, since it needs to do hashing and concern itself with finding the correct bucket. Although the expected runtime is also O(1), the constant factors are greater, so it will be slower.
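
For a feel of where those extra constants come from, here is a deliberately simplified sketch of a chained hash-table lookup; it is not the actual Dictionary<TKey, TValue> source, just the general shape of the work involved:

// Simplified, illustrative shape of a chained hash-table lookup.
class Entry { public int Hash; public int Key; public int Value; public Entry Next; }

static int HashLookup(int key, Entry[] buckets)
{
    int hash = key.GetHashCode();                           // 1. hash the key
    int bucket = (hash & 0x7FFFFFFF) % buckets.Length;      // 2. pick a bucket
    for (Entry e = buckets[bucket]; e != null; e = e.Next)  // 3. walk any collisions
        if (e.Hash == hash && e.Key == key)                 // 4. equality check
            return e.Value;
    throw new KeyNotFoundException();
}

// An array lookup, by contrast, is just: value = array[key];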

1800 INFORMATION
  • 131,367
  • 29
  • 160
  • 239
2

Welcome to Big-O notation. You always have to consider that there is a constant factor involved.

Doing one dictionary lookup is of course much more expensive than an array lookup.

Big-O only tells you how algorithms scale. Double the number of lookups and see how the numbers change: both should take around twice the time.
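
For example, reusing the question's intDict and result variables (a sketch, not a rigorous benchmark):

// Hedged sketch: double the number of lookups against the same collection;
// the time should roughly double, because each individual lookup costs
// (amortised) constant time regardless of how many lookups you do.
var timer = new Stopwatch();
foreach (int lookups in new[] { 50000000, 100000000 })
{
    timer.Reset();
    timer.Start();
    for (int i = 0; i < lookups; i++)
        result = intDict[32131];
    timer.Stop();
    Console.WriteLine("{0} dictionary lookups: {1} ms", lookups, timer.ElapsedMilliseconds);
}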

ebo
  • 8,985
  • 3
  • 31
  • 37
  • 3
    Not quite. Doubling the number of lookups for an O(n) search algorithm would also cause it to take twice the time. Big-O notation does tell you how the algorithm scales, but it's not about the number of lookups, it's about the amount of data you search against.... So: if you double the amount of **data**, both O(1) lookups should take about the same time as before, but an O(n) lookup should take about twice as long as before. – Disillusioned Sep 24 '13 at 23:28
1

The cost of retrieving an element from a Dictionary is O(1), but that's because a dictionary is implemented as a hashtable - so you first have to calculate the hash value to know which element to return. Hashtables are often not that efficient - but they are good for large datasets, or datasets with a lot of unique hash values.

The List (apart from being a rubbish word used to describe an array rather than a linked list!) will be faster, as it returns the value by directly calculating the location of the element you want returned.
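
Conceptually, List<T> is just a wrapper around a plain array, so its indexer boils down to a bounds check plus a direct array access. A simplified sketch of that idea (not the real BCL source):

// Illustrative only: the general shape of a growable list backed by an array.
class MyList<T>
{
    private T[] _items = new T[4];
    private int _count;

    public void Add(T item)
    {
        if (_count == _items.Length)
            Array.Resize(ref _items, _items.Length * 2);   // grow the backing array
        _items[_count++] = item;
    }

    public T this[int index]
    {
        get
        {
            if ((uint)index >= (uint)_count)
                throw new ArgumentOutOfRangeException("index");
            return _items[index];   // a straight base + index * size read
        }
    }
}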

gbjbaanb
  • 51,617
  • 12
  • 104
  • 148