
Imagine the code:

public class obj
{
    // elided
}

public static Dictionary<string, obj> dict = new Dictionary<string, obj>();

Method 1

public static obj FromDict1(string name)
{
    if (dict.ContainsKey(name))
    {
        return dict[name];
    }
    return null;
}

Method 2

public static obj FromDict2(string name)
{
    try
    {
        return dict[name];
    }
    catch (KeyNotFoundException)
    {
        return null;
    }
}

I was curious whether there is a difference in performance between these two functions. I expected the first one to be slower than the second, since it has to access the dictionary twice (once to check whether the key exists and once to read the value), while the second accesses it only once. But, to my surprise, it's actually the opposite:

Loop over 1 000 000 lookups (100 000 existing keys and 900 000 non-existent ones):

first function: 306 milliseconds

second function: 20483 milliseconds

Why is that?

EDIT: As you can see in the comments below this question, the second function is actually slightly faster than the first when there are no non-existent keys. But as soon as there is at least one non-existent key, the performance of the second one drops rapidly.
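
For reference, a benchmark of this kind could look like the sketch below. The original test code was not posted, so the key distribution, the Stopwatch timing, and the class and method names here are assumptions, not the actual test.

// Self-contained sketch of the benchmark (assumed setup: keys "0".."99999"
// exist in the dictionary, keys "100000".."999999" do not).
using System;
using System.Collections.Generic;
using System.Diagnostics;

public class obj
{
    // elided
}

public static class LookupBenchmark
{
    static readonly Dictionary<string, obj> dict = new Dictionary<string, obj>();

    static obj FromDict1(string name)
    {
        if (dict.ContainsKey(name))
            return dict[name];
        return null;
    }

    static obj FromDict2(string name)
    {
        try { return dict[name]; }
        catch (KeyNotFoundException) { return null; }
    }

    public static void Main()
    {
        for (int i = 0; i < 100000; i++)
            dict[i.ToString()] = new obj();

        Measure("first function", FromDict1);
        Measure("second function", FromDict2);
    }

    static void Measure(string label, Func<string, obj> lookup)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 1000000; i++)   // 90% of these keys are missing
            lookup(i.ToString());
        sw.Stop();
        Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
    }
}

(Running a release build without a debugger attached matters here; as noted in the comments, an attached debugger makes the exception path dramatically slower.)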

Petr
  • Why *should* the first one be slower? Actually, at first glance, I'd say it should be faster; `ContainsKey` is expected to be `O(1)`... – Patryk Ćwiek Apr 19 '13 at 09:49
  • http://msdn.microsoft.com/en-us/library/vstudio/ms229009(v=vs.100).aspx – Habib Apr 19 '13 at 09:51
  • See http://stackoverflow.com/a/52390/759019 – Anirudh Ramanathan Apr 19 '13 at 09:52
  • @Trustme-I'maDoctor because even if it's O(1) it cost some instructions, while the second function doesn't call ContainsKey at all (I thought that accessing dict once is faster than accessing it twice) – Petr Apr 19 '13 at 10:02
  • @Petr There are a lot more instructions involved in the exception throwing than `O(1)` lookup in the dictionary... Especially since doing two `O(1)` operations is still asymptotically `O(1)`. – Patryk Ćwiek Apr 19 '13 at 10:18
  • Is it still as slow if you don't have a debugger attached? I've previously found that just attaching a debugger can slow down exception code dramatically. – Jonathan Apr 19 '13 at 12:29
  • As has been noted in the good answer below, throwing exceptions is expensive. Their name suggests this: they are meant to be reserved for **exception**-al circumstances. If you're running a loop where you query a dictionary a million times for keys that don't exist, then it sort of ceases to be an exceptional circumstance. If you're querying a dictionary for keys, and it's a relatively common case that the key won't be present, then it makes sense to check first. – Jason R Apr 19 '13 at 14:35
  • @Jonathan this test was done with debugger turned off – Petr Apr 20 '13 at 07:22
  • Don't forget that you've only compared the cost of checking for a million absent values vs. throwing a million exceptions. But the two methods also differ in the cost of accessing an **existing** value. If missing keys are rare enough, the exception method will be faster overall, despite its higher cost _when_ a key is absent. – alexis Apr 20 '13 at 11:39
  • +1 for supporting your question with code and metrics! – Dan Solovay Apr 23 '13 at 20:19

2 Answers


On the one hand, throwing exceptions is inherently expensive, because the stack has to be unwound etc.
On the other hand, accessing a value in a dictionary by its key is cheap, because it's a fast, O(1) operation.

BTW: The correct way to do this is to use TryGetValue:

obj item;
if(!dict.TryGetValue(name, out item))
    return null;
return item;

This accesses the dictionary only once instead of twice.
If you really want to just return null if the key doesn't exist, the above code can be simplified further:

obj item;
dict.TryGetValue(name, out item);
return item;

This works because TryGetValue sets item to its default value, which is null for a reference type like obj, if no key with that name exists.
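
On current .NET versions (this didn't exist when the question was asked), the same one-lookup behaviour is also available as a built-in extension method. A small sketch, assuming obj is a reference type so the default value is null; FromDict3 is just a hypothetical name:

// GetValueOrDefault is an extension on IReadOnlyDictionary<TKey, TValue>
// (System.Collections.Generic), available since .NET Core 2.0.
public static obj FromDict3(string name)
{
    // Returns the stored value, or null (default of obj) if the key is absent.
    return dict.GetValueOrDefault(name);
}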

Daniel Hilgarth
  • I updated my test according to the answer, and even though the suggested function IS faster, the difference is actually not very significant: 264 ms original, 258 ms suggested – Petr Apr 19 '13 at 09:58
  • @Petr: Yes, it's not significant, because accessing the dictionary is very fast; it doesn't really matter if you do it once or twice. Most of those 250 ms is most likely spent in the test loop itself. – Daniel Hilgarth Apr 19 '13 at 10:00
  • This is good to know, because sometimes one gets the impression that exception-throwing is a better or cleaner way to handle a situation like a nonexistent file or a null pointer, regardless of whether those situations are common, and without considering the performance cost. – LarsH Apr 19 '13 at 18:37
  • @LarsH it also depends on what you're doing. While simple microbenchmarks like this show really large penalties for exceptions, once your loops start including file or database activities, throwing an exception on every iteration matters very little for performance. Compare the 1st and 2nd tables: http://www.codeproject.com/Articles/11265/Performance-implications-of-Exceptions-in-NET – Dan Is Fiddling By Firelight Apr 19 '13 at 21:12
  • To be honest, I wonder why you'd do `!xxx.TryGetValue... return null` instead of `xxx.TryGetValue... return item`, seems to me that returning null should be the last action in a method, returning only item if it found something. Of course this'd be different if you had a longer method and you were (trying to) reduce the indentation. – Destrictor Apr 20 '13 at 00:16
  • @Destrictor: I don't understand what you are saying here. Why should `return null` be the last action in a method? This leads to heavy indentation. It is better to do it the other way around: return null as soon as one of the conditions doesn't hold. But in this concrete example, assuming that the method is only what you see here, it really is only a matter of preference. It makes no difference which way you do it. – Daniel Hilgarth Apr 21 '13 at 09:50
  • That the lookup is O(1) says absolutely nothing about how long it takes, only that the time is not affected by the number of elements – Rune FS Apr 23 '13 at 18:33
  • @RuneFS: That's correct. That's why my answer says *it's a fast, O(1) operation*, i.e. it is fast and it stays that way, no matter how many elements there are. – Daniel Hilgarth Apr 23 '13 at 18:40
  • @LarsH Do also note that when trying to access a file (or some other external resource), it may change state between the check and actual access attempt. In these cases, using exceptions is the correct way to go. See [Stephen C's answer to this question](http://stackoverflow.com/questions/6092992/why-is-it-easier-to-ask-forgiveness-than-permission-in-python-but-not-in-java) for additional insight. – yoniLavi Apr 23 '13 at 21:21
  • To be honest, I misread it as "fast because it's O(1)", and I agree that in _most_ cases it's correct that it's fast, but to nitpick, it's not guaranteed, since we don't know the implementation of `GetHashCode` – Rune FS Apr 24 '13 at 06:10
  • @RuneFS: Yes, that's technically correct and yes, that's nitpicking :-) According to the contract, an implementation of `GetHashCode` needs to be quick. – Daniel Hilgarth Apr 24 '13 at 08:15

Dictionaries are specifically designed for super fast key lookups. They are implemented as hashtables, and the more entries there are, the faster they are relative to other methods. The exception machinery is only supposed to be used when your method has failed to do what it was designed to do; exceptions are a large set of objects that give you a lot of functionality for handling errors. I once built an entire library class with everything surrounded by try/catch blocks and was appalled to see the debug output, which contained a separate line for every single one of over 600 exceptions!
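
As a small illustration of that guideline (hypothetical names, not code from the question): an expected miss is handled with TryGetValue, and an exception is reserved for a case that really is a failure of what the method was designed to do.

// Hypothetical example: a missing optional entry is expected and handled cheaply,
// while a missing *required* entry is a genuine failure and throws.
public static obj GetOptional(string name)
{
    obj item;
    dict.TryGetValue(name, out item);   // expected case: the key may simply be absent
    return item;                        // null when not found
}

public static obj GetRequired(string name)
{
    obj item;
    if (!dict.TryGetValue(name, out item))
        throw new KeyNotFoundException("Required entry '" + name + "' is missing.");
    return item;
}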

  • When language implementors are deciding where to expend efforts at optimization, hash tables will get priority because they're used frequently, often in inner loops that may be bottlenecks. Exceptions are expected to be used much less frequently, in unusual ("exceptional", so to speak) cases, so they're not usually considered as important for performance. – Barmar Apr 24 '13 at 17:45
  • "They are implemented as hashtables and the more entries the faster they are relative to other methods." surely that is not true if the buckets fill up?!?! – AnthonyLambert Jul 12 '13 at 13:23
  • @AnthonyLambert What he's trying to say is that searching a hashtable has O(1) time complexity, while a binary search tree search would have O(log(n)); the tree slows down as the number of elements increases asymptotically, while the hashtable doesn't. Therefore, the speed advantage of the hashtable increases with the number of elements, although it does so slowly. – Doval Oct 14 '13 at 17:15
  • @AnthonyLambert Under normal use, there are extremely few collisions in a Dictionary's hashtable. If you're using a hashtable and your buckets fill up, you have waaaaay too many entries (or too few buckets). In that case, it's time to use a custom hashtable. – AndrewS Mar 09 '14 at 03:39