
I have a (potentially quite big) dictionary and a list of 'possible' keys. I want to quickly find which of those keys actually have matching entries in the dictionary. I've found lots of discussion of single dictionary lookups elsewhere, but no discussion of speed or of checking multiple entries at once.

I've come up with four ways, and for the three that work best I compare their speed on different sample sizes below - are there better methods? If people can suggest sensible contenders I'll subject them to the analysis below as well.
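
To make the goal concrete, here's a toy illustration of the behaviour I'm after (made-up data; the real test inputs are generated below):

smallDict = {1: 'a', 5: 'b', 9: 'c'}
possibleKeys = [1, 2, 5, 7]
# Keys 1 and 5 are present in the dictionary, so I want to pick those out
# (and ultimately their values, 'a' and 'b') as quickly as possible.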

Sample lists and dictionaries are created as follows:

import cProfile
from random import randint

length = 100000

# (Python 2.7: "/" here is integer division; on 3.x use "//" so the bound stays an int.)
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}

 

Method 1: the 'in' keyword:

def way1(theList,theDict):
    resultsList = []
    for listItem in theList:
        if listItem in theDict:
            resultsList.append(theDict[listItem])
    return resultsList

cProfile.run('way1(listOfRandomInts,dictionaryOfRandomInts)')

32 function calls in 0.018 seconds

 

Method 2: error handling:

def way2(theList,theDict):
    resultsList = []
    for listItem in theList:
        try:
            resultsList.append(theDict[listItem])
        except KeyError:
            pass
    return resultsList

cProfile.run('way2(listOfRandomInts,dictionaryOfRandomInts)')

32 function calls in 0.087 seconds

 

Method 3: set intersection:

def way3(theList,theDict):
    return list(set(theList).intersection(set(theDict.keys())))

cProfile.run('way3(listOfRandomInts,dictionaryOfRandomInts)')

26 function calls in 0.046 seconds

 

Method 4: Naive use of dict.keys():

This is a cautionary tale - it was my first attempt and BY FAR the slowest!

def way4(theList,theDict):
    resultsList = []
    keys = theDict.keys()
    for listItem in theList:
        if listItem in keys:
            resultsList.append(theDict[listItem])
    return resultsList

cProfile.run('way4(listOfRandomInts,dictionaryOfRandomInts)')

12 function calls in 248.552 seconds
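
For what it's worth, the slowness is because in Python 2 dict.keys() returns a plain list, so each 'listItem in keys' test is a linear scan over up to 100,000 entries. A minimal tweak that restores hash-based lookup (a sketch only; way4b is not one of the benchmarked methods) is to copy the keys into a set once:

def way4b(theList, theDict):            # hypothetical name, not benchmarked above
    resultsList = []
    keys = set(theDict.keys())          # one-off copy into a hash-based set
    for listItem in theList:
        if listItem in keys:            # O(1) membership test instead of a list scan
            resultsList.append(theDict[listItem])
    return resultsList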

 

EDIT: I've brought the suggestions from the answers and comments into the same framework I used above, for consistency. Several people have noted that further performance gains are available in Python 3.x, particularly for the list-comprehension-based methods. Many thanks for all of the help!

Method 5: Better way of performing intersection (thanks jonrsharpe):

def way5(theList, theDict):
    return list(set(theList).intersection(theDict))

25 function calls in 0.037 seconds

 

Method 6: List comprehension (thanks jonrsharpe):

def way6(theList, theDict):
    return [item for item in theList if item in theDict]

24 function calls in 0.020 seconds

 

Method 7: Using the & operator (thanks jonrsharpe):

def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)

25 function calls in 0.026 seconds

For methods 1-3 and 5-7 I timed them as above with lists/dictionaries of length 1000, 10000, 100000, 1000000, 10000000 and 100000000, and show a log-log plot of the time taken below. Across all lengths the intersection-based and in-based methods perform best. The gradients are all about 1 (maybe a bit higher), indicating O(n) or perhaps slightly super-linear scaling.
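
For anyone who wants to reproduce those numbers, here is a sketch of the kind of harness involved (not my exact script, and, as pointed out in the comments, timeit is better suited to this than cProfile; the function name time_method and the sizes shown are just illustrative):

import timeit

def time_method(funcName, length, number=10):
    # Rebuild fresh inputs of the given size, then report the average time of `number` calls.
    setup = (
        "from __main__ import {f}\n"
        "from random import randint\n"
        "length = {n}\n"
        "listOfRandomInts = [randint(0, length*length/10-1) for x in range(length)]\n"
        "dictionaryOfRandomInts = {{randint(0, length*length/10-1): \"It's here\" for x in range(length)}}\n"
    ).format(f=funcName, n=length)
    stmt = "{f}(listOfRandomInts, dictionaryOfRandomInts)".format(f=funcName)
    return timeit.timeit(stmt, setup=setup, number=number) / number

for length in [1000, 10000, 100000]:   # the bigger sizes take a while
    for name in ["way1", "way2", "way3", "way5", "way6", "way7"]:
        print("%-9d %s %.4f s" % (length, name, time_method(name, length)))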

[Figure: log-log plot comparing the time scaling of the 6 sensible methods with list/dict length]

StackG
  • Which version(s) of Python? In 2.x, where `dict.keys` returns a list, I'd expect `set(theDict.keys())` to be slower than `set(theDict)`, for example, and `set(theList).intersection(theDict)` to be faster than `set(theList).intersection(set(theDict.keys()))` (and [`timeit`](https://docs.python.org/2/library/timeit.html) agrees...) – jonrsharpe Apr 30 '15 at 10:35
  • Additionally, a [list comprehension](https://docs.python.org/2/tutorial/datastructures.html#list-comprehensions) is generally faster than looping and `append`ing. – jonrsharpe Apr 30 '15 at 10:42
  • If 2.x means 2.7, I'd use `dict.viewkeys` instead of `set(dict)` or `set(dict.keys)`; that gives you a set-like object backed by the existing dict's hash table, instead of copying the all the keys into another hash table. – abarnert Apr 30 '15 at 10:42
  • As a side note, why are you using `randint` and explicitly correcting the fenceposts instead of just using `randrange`? – abarnert Apr 30 '15 at 10:46
  • Meanwhile, these functions aren't actually equivalent—some return the keys that match, others return the values corresponding to the keys that match. Presumably the ones that do the latter are wasting at least a little time if that's not what you want. – abarnert Apr 30 '15 at 10:55
  • Also, when you're micro-optimizing: local lookups are much, much faster than method lookups and global lookups, so it's usually worth doing, e.g., `rlappend = resultList.append` and using the local name in the loop. That may or may not be true for `theDict.__contains__` and `theDict.__getitem__`, but you should test it both ways. – abarnert Apr 30 '15 at 10:57
  • For future reference, `[c]profile` isn't appropriate for doing benchmarks, especially micro-benchmarks. In fact, there's a Note right at the top of [the docs](https://docs.python.org/3/library/profile.html#introduction-to-the-profilers) that explains that, and recommends using `timeit` (as all of the answers do). In this particular case, on top of all the other things it doesn't do that `timeit` does, it's also adding overhead to every Python function call, but not to builtins, which skews some of the answers worse than others. – abarnert Apr 30 '15 at 23:13
  • What you use `profile` for is to determine which part of your code is making it slow, not whether it's slow. – abarnert Apr 30 '15 at 23:13
  • Also, why did you mention Python 3.x in the edit, but not PyPy or IronPython? If an 8% improvement is worth switching language versions for, surely a 700000000% improvement is worth switching implementations within the same language version, right? :) – abarnert Apr 30 '15 at 23:15

3 Answers


Of a couple of additional methods I've tried, the fastest was a simple list comprehension:

def way6(theList, theDict):
    return [item for item in theList if item in theDict]

This runs the same process as your fastest approach, way1, but more quickly. For comparison, the quickest set-based way was

def way5(theList, theDict):
    return list(set(theList).intersection(theDict))

timeit results:

>>> import timeit
>>> setup = """from __main__ import way1, way5, way6
from random import randint
length = 100000
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}
"""
>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.550477756582723
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
19.597916393388232
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.652289059326904

Having added @abarnert's suggestion:

def way7(theList, theDict):
    return list(theDict.viewkeys() & theList)

and re-run the timing I now get:

>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.110055883138497
>>> timeit.timeit('way5(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.292466681101036
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
14.351759544463917
>>> timeit.timeit('way7(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
17.206370930653392

way1 and way6 have switched places, so I re-ran again:

>>> timeit.timeit('way1(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.648176054011941
>>> timeit.timeit('way6(listOfRandomInts,dictionaryOfRandomInts)', setup=setup, number=1000)
13.847062579316628

So it looks like the set-based approaches are slower than the list-based ones, but the difference between the explicit loop and the list comprehension is (surprisingly, to me at least) a bit variable. I'd say just pick one, and not worry about it unless it becomes a real bottleneck later.

jonrsharpe
  • If this is 2.7, try `list(theDict.viewkeys() & theList)`. On my machine, it's about 10% faster than your `way5`. (For 3.x, you can do the same thing with `keys()`. And yes, unlike `set`, whose `&` operator only takes sets but there's an `intersection` method for any iterable, set-like views have an `&` operator that takes any iterable but no `intersection` method…) – abarnert Apr 30 '15 at 10:53
  • Your `way6` returns the matching keys; his `way1` returns the values corresponding to the matching keys. But then the same is true of his `way3`, so… I'm not sure what he actually wants. – abarnert Apr 30 '15 at 10:55
  • @abarnert I hadn't spotted that... once you have the keys, I suppose, it's easy enough to get the associated values! I've updated my answer with your suggestion. – jonrsharpe Apr 30 '15 at 11:01
  • That "10% faster" measurement was on 3.4 and `keys`. Repeating it on 2.7 and `viewkeys`, it's within 1% either way, just as you found. It looks like it's also no gain in 3.2. I'm not sure which of the recent dict changes would affect this (split hash tables, hash randomization, … none of it seems relevant…). – abarnert Apr 30 '15 at 11:12

First, I think you're on 2.7, so I'll do most of this with 2.7. But it's worth noting that if you're really interested in optimizing your code, the 3.x branch continues to get performance improvements, and the 2.x branch never will. And why are you using CPython instead of PyPy?


Anyway, some further micro-optimizations to try (in addition to the ones in jonrsharpe's answer):


Caching attribute and/or global lookups in local variables (it's called LOAD_FAST for a reason). For example:

def way1a(theList, theDict):
    resultsList = []
    rlappend = resultsList.append
    for listItem in theList:
        if listItem in theDict:
            rlappend(theDict[listItem])
    return resultsList

In [10]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.2 ms per loop
In [11]: %timeit way1a(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 12.4 ms per loop

But for some operator special methods, like __contains__ and __getitem__, that may not be worth doing. Of course you won't know until you try:

def way1b(theList, theDict):
    resultsList = []
    rlappend = resultsList.append
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    for listItem in theList:
        if tdin(listItem):
            rlappend(tdgi(listItem))
    return resultsList

In [14]: %timeit way1b(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 12.8 ms per loop

Meanwhile, Jon's way6 answer already optimizes out the resultsList.append entirely by using a listcomp, and we just saw that optimizing out the lookups he does have probably won't help. That's especially true in 3.x, where the comprehension is compiled into a function of its own, but even in 2.7 I wouldn't expect any benefit, for the same reasons as in the explicit loop. But let's try just to be sure:

def way6(theList, theDict):
    return [theDict[item] for item in theList if item in theDict]
def way6a(theList, theDict):
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return [tdgi(item) for item in theList if tdin(item)]

In [31]: %timeit way6(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 14.7 ms per loop
In [32]: %timeit way6a(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.9 ms per loop

Surprisingly (at least to me), this time it actually helped. Not sure why.

But what I was really setting up for was this: another advantage of turning both the filter expression and the value expression into function calls is that we can use filter and map:

from itertools import ifilter   # needed for way6c (Python 2's lazy filter)

def way6b(theList, theDict):
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return map(tdgi, filter(tdin, theList))
def way6c(theList, theDict):
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return map(tdgi, ifilter(tdin, theList))

In [34]: %timeit way6b(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 10.7 ms per loop
In [35]: %timeit way6c(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13 ms per loop

But that gain is largely 2.x-specific; 3.x has faster comprehensions, while its list(map(filter(…))) is slower than 2.x's map(filter(…)) or map(ifilter(…)).
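
For completeness, the 3.x spelling of the same idea would presumably be something along these lines (a sketch; way6d is just a label here, not one of the timed functions above):

def way6d(theList, theDict):
    # Python 3: map and filter are lazy, so list() is needed to materialize the result.
    tdin = theDict.__contains__
    tdgi = theDict.__getitem__
    return list(map(tdgi, filter(tdin, theList)))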


You don't need to convert both sides of a set intersection to a set, just the left side; the right side can be any iterable, and a dict is already an iterable of its keys.

But, even better, a dict's key view (dict.keys in 3.x, dict.viewkeys in 2.7) is already a set-like object, one backed by the dict's hash table, so you don't need to transform anything. (It doesn't have quite the same interface: it has no intersection method, but its & operator takes any iterable, unlike set, whose intersection method takes any iterable but whose & only takes sets. That's annoying, but we only care about performance here, right?)

def way3(theList,theDict):
    return list(set(theList).intersection(set(theDict.keys())))
def way3a(theList,theDict):
    return list(set(theList).intersection(theDict))
def way3b(theList,theDict):
    return list(theDict.viewkeys() & theList)

In [20]: %timeit way3(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 23.7 ms per loop
In [20]: %timeit way3a(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 15.5 ms per loop
In [20]: %timeit way3b(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 15.7 ms per loop

That last one didn't help (although using Python 3.4 instead of 2.7, it was 10% faster…), but the first one definitely did.

In real life, you may also want to compare the sizes of the two collections to decide which one gets setified, but here that information is static, so there's no point writing the code to test it.
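
For the record, such a size check might look something like the following (a sketch with a purely hypothetical heuristic and an invented name, way3c; the cutoff would need measuring on real data):

def way3c(theList, theDict):
    # Hypothetical heuristic: if the candidate list is the smaller collection,
    # just probe the dict directly; otherwise build one set from the list and
    # intersect it with the dict's keys.
    if len(theList) <= len(theDict):
        return [item for item in theList if item in theDict]
    return list(set(theList).intersection(theDict))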


Anyway, my fastest result was the map(filter(…)) on 2.7, by a pretty good margin. On 3.4 (which I didn't show here), Jon's listcomp was fastest (even fixed to return the values rather than the keys), and faster than any of the 2.7 methods. Also, 3.4's fastest set operation (using the key view as a set and the list as an iterable) was a lot closer to the iterative methods than in 2.7.

abarnert
  • The "list comprehensions run in their own scope" thing is true only for Python 3.x; in Python 2.7 it is still not the case. – Ashwini Chaudhary Apr 30 '15 at 20:39
  • Note that with `dict.viewkeys()` we are not transforming anything explicitly, but internally it is equivalent to `res = set(dict); intersection_update(res, iterable)`, where `intersection_update` then calls `set_intersection` on `res` and `iterable`. Hence almost the same performance as `way3a`. – Ashwini Chaudhary Apr 30 '15 at 21:02
  • @AshwiniChaudhary: Are you sure `viewkeys` or its `__and__` method does something equivalent to `res = set(dict)`? That isn't true with 3.4's `keys` objects, so that would explain why 3.4 gets a significant speedup but 2.7 doesn't… – abarnert Apr 30 '15 at 23:07
  • @AshwiniChaudhary: Never mind, [I found it](https://hg.python.org/cpython/file/2.7/Objects/dictobject.c#l2981), and you're right. Good catch. – abarnert Apr 30 '15 at 23:07
  • @AshwiniChaudhary: Also, good catch on the 2.7 listcomp thing. I didn't write what I meant to say, so it was completely wrong… fixed. – abarnert Apr 30 '15 at 23:09
$ ipython2 # Apple CPython 2.7.6
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.8 ms per loop

$ python27x -m ipython # custom-built 2.7.9
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 13.7 ms per loop

$ ipython3 # python.org CPython 3.4.1
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
100 loops, best of 3: 12.8 ms per loop    

So, that's an 8% speedup just by using a later Python. (And the speedup was closer to 20% on the listcomp and dict-key-view versions.) And it's not because Apple's 2.7 is bad or anything, it's just that 3.x has continued to get optimizations over the past 5+ years, while 2.7 has not (and never will again).

And meanwhile:

$ ipython_pypy # PyPy 2.5.0 Python 2.7.8
[snip]
In [3]: %timeit way1(listOfRandomInts, dictionaryOfRandomInts)
1000000000 loops, best of 3: 1.97 ns per loop

That's a 7000000x speedup just by typing 5 extra characters. :)

I'm sure it's cheating here. Either the JIT implicitly memoized the result, or it noticed that I didn't even look at the result, pushed that up the chain, and realized it didn't need to do any of the steps, or something. But this actually happens in real life sometimes; I've had a huge mess of code that I spent 3 days debugging and trying to optimize before realizing that everything it did was unnecessary…
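
One quick sanity check (not a rigorous benchmark) is to force the result to actually be consumed and see whether the numbers stay implausibly low, e.g.:

In [4]: %timeit sum(1 for _ in way1(listOfRandomInts, dictionaryOfRandomInts))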

At any rate, speedups on the order of 10x are pretty typical from PyPy even when it can't cheat. And it's a lot easier than tweaking attribute lookups or reversing the order of who gets turned into a set for 5%.

Jython is more unpredictable—sometimes almost as fast as PyPy, sometimes much slower than CPython. Unfortunately, timeit is broken in Jython 2.5.3, and I just broke my Jython 2.7 completely by upgrading from rc2 to rc3, so… no tests today. Similarly, IronPython is basically Jython redone on a different VM; it's usually faster, but again unpredictable. But my current version of Mono and my current version of IronPython aren't playing nice together, so no tests there either.

abarnert