8

I have a few dictionaries containing similar data.

Most queries would be resolved in a single search of one dictionary.

So is it better performance wise to leave out preliminary checks for existence of a key in a dict and have a try at the next dict in an except clause catching key error?

Or maybe something like

# d1, d2, d3 = bunch of dictionaries

value = d1.get(key, d2.get(key, d3.get(key, 0)))

?

user1561108
  • 2,666
  • 9
  • 44
  • 69
  • 8
    Try it. Python has the great `timeit` module that allows you to asses what is the best, performance wise, comparatively. Do note that performance is far less important than readability, unless it's a proven bottleneck. Write it in the clearest way, then optimise if you need to. It's also worth noting that your first method (checking if the key exists), introduces a potential race condition. – Gareth Latty Dec 13 '12 at 15:03
  • it really depends on your code as well. if the absence of a key in your dictionary is bad, then try:except is more logical, if the absence of a key is trivial then use get with a default `None`, checking if a value is in a dict is not something you really 'need' to do, you should always know what is in your dict, checking would mean you are receiving an unknown dict. – Inbar Rose Dec 13 '12 at 15:05
  • 1
    You're string of `dict.get` will look to see if the key is in `d3`, `d2` and `d1` since to resolve the method call for `d1`, you need to know the arguments that are passed to `get` – mgilson Dec 13 '12 at 15:10
  • 4
    You could use the [Chained map recipe](http://code.activestate.com/recipes/305268/), or, on python 3.3, the new [`ChainMap` type](http://docs.python.org/3/library/collections.html#chainmap-objects). – Martijn Pieters Dec 13 '12 at 15:10
  • FWIW in my simple tests using new found wonder timeit the chained map was faster than the try:except blocks. – user1561108 Dec 13 '12 at 18:18
  • Possible duplicate of [Python: if key in dict vs. try/except](http://stackoverflow.com/questions/4512557/python-if-key-in-dict-vs-try-except) – Michael Ohlrogge Aug 27 '16 at 19:10

6 Answers6

6

It seems in almost all cases, using get would be faster. Here is my test run using try..except and get

>>> def foo1(n):
    spam = dict(zip(range(-99,100,n),[1]*200))
    s = 0
    for e in range(1,100):
        try:
            s += spam[e]
        except KeyError:
            try:
                s += spam[-e]
            except KeyError:
                s += 0
    return s

>>> def foo2(n):
    spam = dict(zip(range(-99,100,n),[1]*200))
    s = 0
    for e in range(1,100):
        s += spam.get(e, spam.get(-e,0))
    return s


>>> for i in range(1,201,10):
    res1 =  timeit.timeit('foo1({})'.format(i), setup = "from __main__ import foo1", number=1000)
    res2 =  timeit.timeit('foo2({})'.format(i), setup = "from __main__ import foo2", number=1000)
    print "{:^5}{:10.5}{:10.5}{:^10}{:^10}".format(i,res1,res2,foo1(i),foo2(i))


  1    0.075102  0.082862    99        99    
 11     0.25096  0.054272    9         9     
 21      0.2885  0.051398    10        10    
 31     0.26211  0.060171    7         7     
 41     0.26653  0.053595    5         5     
 51      0.2609  0.052511    4         4     
 61      0.2686  0.052792    4         4     
 71     0.26645  0.049901    3         3     
 81     0.26351  0.051275    3         3     
 91     0.26939  0.051192    3         3     
 101      0.264  0.049924    2         2     
 111     0.2648  0.049875    2         2     
 121    0.26644  0.049151    2         2     
 131    0.26417  0.048806    2         2     
 141    0.26418  0.050543    2         2     
 151    0.26585  0.049787    2         2     
 161    0.26663  0.051136    2         2     
 171    0.26549  0.048601    2         2     
 181    0.26425  0.050964    2         2     
 191     0.2648  0.048734    2         2     
>>>
Abhijit
  • 62,056
  • 18
  • 131
  • 204
4

Depends on the keys in the dictionaries.

If you predict with confidence that it is more common for keys to be missing use then use get.

If you predict with confidence that it is more common for keys to be there use try except.

Rich Tier
  • 9,021
  • 10
  • 48
  • 71
  • 2
    Please provide supporting evidence for the changing behavior of a ```dict``` throwing a `KeyError` based upon the presence of a key or not. – Cory Dolphin Dec 13 '12 at 15:26
  • Please see my answer, the actual result looks different from your assumption – Abhijit Dec 13 '12 at 15:44
1

Since you say that most of the queries will be resolved by looking at the first dict, your fastest solution would be to do something like:

try:
    item = d1[key]
except KeyError:
    try:
        item = d2[key]
    except KeyError:
        ...

However, that is certainly not the most maintainable of solutions and I don't recommend using it. You could create a function:

def get_from(item,dicts):
    for d in dicts:
        try:
           return d[item]
        except KeyError:
           pass
    else:
        raise KeyError("No item in dicts")

which you would call like:

get_from(key,(d1,d2,d3))

(this is a simplified, slightly less clean, version of the already very simple Chained map recipe suggested by @MartijnPieters in the comments on the original question -- I would advocate using that over this code posted here. This code is only to demonstrate the concept in a more simplified way.)

Finally, perhaps a hybrid solution would work best in practice. Factor the first try out of the loop -- This is a little ugly, but it avoids the overhead of the loop most of the time. Only if the first try raises a KeyError do you enter the loop type solution I suggested above on the remaining dicts. e.g.:

try:
   item = d1[key]
except KeyError:
   item = get_from(key,(d2,d3))

again, only do this if you can reliably demonstrate (think timeit) that it makes a measureable difference


The important thing to know is that in python, try is cheap, but except costs a decent amount of time. If your code is expected to succeed, use try-except. If it isn't expected to succeed, often it's best to use try-except anyway, but in that case, you should evaluate whether performance is really an issue and only if you can demonstrate that it is an issue should you resort to "looking before you leap".

One final note, If the dictionaries are relatively static, it might be worth combining them into 1 dict:

d1.update(d2)
d1.update(d3)

Now you can just use d1 -- It has all the information from d2 and d3. (of course, the order of the updates matters if the dicts have keys that are the same but have different values).

mgilson
  • 300,191
  • 65
  • 633
  • 696
1

try...except usually takes longer than using get but it depends on a few things...

Try making use of the timeit module to test performance in your particular situation like so:

def do_stuff():
    blah

timeit.timeit('testfunc()', 'from __main__ import do_stuff as testfunc')
Sheena
  • 15,590
  • 14
  • 75
  • 113
0

You could as well do

sentinel = object()
values = (d.get(key, sentinel) for d in (d1, d2, d3))
value = next(v for v in values if v is not sentinel)

If none of the dicts contain the key, this raises a StopIteration rather than a KeyError.

glglgl
  • 89,107
  • 13
  • 149
  • 217
  • I thought about this one too -- but the problem is that creating small generator expressions can be a bit expensive. Then again, so can the for loop in my solution ... I'm not sure if there is any optimized way to really do this sort of thing. – mgilson Dec 13 '12 at 15:31
0

The difference between checking the conditional

if 'key' in a_dict or similarly, if a_dct.get('key') == None

and handling the KeyError thrown when not 'key' in a_dict is generally considered trivial, and likely to depend upon the implementation of python you are using.

Using the conditional form is undoubtedly more pythonic, and is generally considered more expressive than catching the exception, often leading to cleaner code. However, if your dictionary may contain arbitrary data, and you cannot know that a value of None, or some other magic value indicates that your key is not found, using the conditional form will require two lookups, as you first check if the key is in the dictionary, and then retrieve the value. I.E.:

if 'key': in a_dict:
   val = a_dcit['key']

Given the situation you describe, the code you have provided is the slowest possible option, as the key will be looked up in each of the dictionaries. A faster option is to guess the dictionary it will be in, and sequentially search through the other dictionaries:

my_val = d1.get(key,None)

if my_val == None:
    my_val = d2.get(key,None)
    if my_val == None:
        my_val = d3.get(key,None)
        if my_val == None:
            return False #handle not found in any case

However, your particular use case sounds interesting and strange. Why are there multiple dictionaries with similar data? How are these dictionaries stored? If you already have a list or some other data structure holding these dictionaries, it would be even more expressive to loop through the dictionaries.

dict_list = [{},{},{}] #pretend you have three dicts in a list

for d in dict_list:
   val = d.get('key',None)
   if val == None:
      break
#val is now either None, or found.
Cory Dolphin
  • 2,650
  • 1
  • 20
  • 30
  • 1
    I don't think that conditional is more pythonic than catching exceptions. You have a race condition here, and using exceptions is EAFP rather than LBYL. – glglgl Dec 13 '12 at 15:22
  • Fair comment with respect to EAFP v.s. LBYL, given http://docs.python.org/2/glossary.html. All of the solutions presented here suffer from race conditions. If threadsafety is an issue, using a lock on the list would suffice to prevent race conditions. – Cory Dolphin Dec 13 '12 at 15:36