9

This isn't premature optimization. My use case has the double-checking of dict's right in the inner-most of inner loops, running all the time. Also, it's intellectually irksome (see results).

Which of these approaches is faster?

mydict = { 'hello': 'yes', 'goodbye': 'no' }
key = 'hello'

# (A)
if key in mydict:
    a = mydict[key]
    do_things(a)
else:
    handle_an_error()

# vs (B)
a = mydict.get(key,None)
if a is not None:
    do_things(a)
else:
    handle_an_error()

Edit: these are the same speed. Common sense tells me that (B) should be noticeably faster since it's only one dict lookup vs. 2, but the results are different. I'm scratching my head.

Results of a benchmark averaged over 12 runs, 1/2 of which are hits, the other half are misses:

doing in
switching to get
total time for IN:  0.532250006994
total time for GET:  0.480916659037
times found: 12000000
times not found: 12000000

And when a similar one is run (*10 more loops) without ever finding the key,

doing in
switching to get
total time for IN:  2.35899998744
total time for GET:  4.13858334223

Why!?

(correct) code

import time
smalldict = {}
for i in range(10):
    smalldict[str(i*4)] = str(i*18)

smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["hello"] = "yes!"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found= 0
notfound = 0

ddo = bigdict # change to smalldict for small dict gets
print 'doing in'


for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            if str(key) in ddo:
                found = found + 1
                foo = ddo[str(key)]
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            if 'yo' in ddo:
                found = found + 1
                foo = ddo['yo']
            else:
                notfound = notfound + 1
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)

totalin = timetotal / 12.0 

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(1000000):
        if a == 0:
            foo = ddo.get(key,None)
            if foo is not None:
                found = found + 1
            else:
                notfound = notfound + 1
                foo = "nooo"
        else:
            foo = ddo.get('yo',None)
            if foo is not None:
                found = found + 1
                notfound = notfound + 1
            else:
                notfound = notfound + 1
                foo = "oooo"
    timetotal = timetotal + (time.time() - start)

totalget = timetotal / 12

print "total time for IN: ", totalin
print 'total time for GET: ', totalget
print 'times found:', found
print 'times not found:', notfound

(original) code import time smalldict = {} for i in range(10): smalldict[str(i*4)] = str(i*18)

smalldict["8"] = "hello"

bigdict = {}
for i in range(10000):
    bigdict[str(i*100)] = str(i*4123)
bigdict["8000"] = "hello"

timetotal = 0
totalin = 0
totalget = 0
key = "hello"
found= 0
notfound = 0

ddo = bigdict # change to smalldict for small dict gets
print 'doing in'
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(10000000):
        if a == 0:
            if key in ddo:
                foo = ddo[key]
            else:
                foo = "nooo"
        else:
            if 'yo' in ddo:
                foo = ddo['yo']
            else:
                foo = "nooo"
    timetotal = timetotal + (time.time() - start)

totalin = timetotal / 12.0 

print 'switching to get'
timetotal = 0
for r in range(12):
    start = time.time()
    a = r % 2
    for i in range(10000000):
        if a == 0:
            foo = ddo.get(key,None)
            if foo is not None:
                # yaaay
                pass
            else:
                foo = "nooo"
        else:
            foo = ddo.get('yo',None)
            if foo is not None:
                #yaaay
                pass
            else:
                foo = "oooo"
    timetotal = timetotal + (time.time() - start)

totalget = timetotal / 12

print "total time for IN: ", totalin
print 'total time for GET: ', totalget
std''OrgnlDave
  • 3,912
  • 1
  • 25
  • 34
  • 1
    It doesn't look like you actually timed the case where the key is in the dict, so the `in` code never actually does a double lookup. – user2357112 Sep 19 '16 at 21:10
  • @user2357112: good observation. I suppose that member function lookup does the rest for `get`. I reduced time by 10% by pre-looking `get`: `g = ddo.get` then call `foo = g('yo',None)` – Jean-François Fabre Sep 19 '16 at 21:12
  • 1
    Looking up an attribute and calling a method that has to return an object *or* a default is never going to be as fast as a single opcode only checking for the presence of a key. Use the `timeit` module to do proper time trials though. – Martijn Pieters Sep 19 '16 at 21:13
  • Your test seems flawed in a different way. Why are you using `if` when `dict.get()` can handle both the true and false cases? Just use `foo = ddo.get(key, 'nooo')` and don't use an `if / else` test *at all*. This is what `dict.get()` designed for and where doing an `in` test plus an `if var = dict[key] / else var = 'default' ` branch will lose. – Martijn Pieters Sep 19 '16 at 21:15
  • 1
    Also note that you might do better switching from `range` to `xrange` on your inner loops ... You're likely spending lots of time just constructing lists with 10M elements ... – mgilson Sep 19 '16 at 21:20
  • @MartijnPieters because foo = 'nooo' is not the actual work done on failure. In fact it might include terminating a method that has about 100 of these with an exception. – std''OrgnlDave Sep 19 '16 at 21:26

1 Answers1

14

We can do some better timings:

import timeit

d = dict.fromkeys(range(10000))

def d_get_has(d):
    return d.get(1)

def d_get_not_has(d):
    return d.get(-1)

def d_in_has(d):
    if 1 in d:
        return d[1]

def d_in_not_has(d):
    if -1 in d:
        return d[-1]


print timeit.timeit('d_get_has(d)', 'from __main__ import d, d_get_has')
print timeit.timeit('d_get_not_has(d)', 'from __main__ import d, d_get_not_has')
print timeit.timeit('d_in_has(d)', 'from __main__ import d, d_in_has')
print timeit.timeit('d_in_not_has(d)', 'from __main__ import d, d_in_not_has')

On my computer, the "in" variants are faster than the .get variants. This is probably because .get is an attribute lookup on the dict and an attribute lookup is likely to be as expensive as a membership test on the dict. Note that in and item lookup using dict[x] can be done directly in bytecode so the normal method lookups can be bypassed...

It also might be worth pointing out that I get a HUGE optimization if I just use pypy :-):

$ python ~/sandbox/test.py
0.169840812683
0.1732609272
0.122044086456
0.0991759300232

$ pypy ~/sandbox/test.py
0.00974893569946
0.00752687454224
0.00812077522278
0.00686597824097
mgilson
  • 300,191
  • 65
  • 633
  • 696
  • If my production target was stable I would use pypy in a heartbeat. – std''OrgnlDave Sep 19 '16 at 21:24
  • I like the 'in' syntax more than 'get', so thanks for the better benchmarks! – std''OrgnlDave Sep 19 '16 at 21:27
  • This also drives home just how much overhead a method lookup can cause... – std''OrgnlDave Sep 19 '16 at 21:28
  • Interestingly, with jython (standalone version), it plays a big role, which functions are called first while python2 and python3 do not make much difference. The first two calls are like the last two lines (in), in the middle there is the (get). 0.56299996376 0.427000045776 0.524999856949 0.242000102997 0.301000118256 0.260999917984 – Ernie Mur Sep 02 '20 at 16:11
  • And if I access the dict.get directly instead using a get_... wrapper function it becomes 50 % faster in python(2.7) and in jython. As expected from this, the time required of a function only returning a variable is in the same order of magnitude. – Ernie Mur Sep 02 '20 at 17:55