I have a (potentially quite big) dictionary and a list of 'possible' keys. I want to quickly find which of the keys have matching values in the dictionary. I've found lots of discussion of single dictionary values here and here, but no discussion of speed or multiple entries.
I've come up with four ways, and for the three that work best I compare their speed on different sample sizes below - are there better methods? If people can suggest sensible contenders I'll subject them to the analysis below as well.
Sample lists and dictionaries are created as follows:
import cProfile
from random import randint
length = 100000
listOfRandomInts = [randint(0,length*length/10-1) for x in range(length)]
dictionaryOfRandomInts = {randint(0,length*length/10-1): "It's here" for x in range(length)}
Method 1: the 'in'
keyword:
def way1(theList,theDict):
resultsList = []
for listItem in theList:
if listItem in theDict:
resultsList.append(theDict[listItem])
return resultsList
cProfile.run('way1(listOfRandomInts,dictionaryOfRandomInts)')
32 function calls in 0.018 seconds
Method 2: error handling:
def way2(theList,theDict):
resultsList = []
for listItem in theList:
try:
resultsList.append(theDict[listItem])
except:
;
return resultsList
cProfile.run('way2(listOfRandomInts,dictionaryOfRandomInts)')
32 function calls in 0.087 seconds
Method 3: set intersection:
def way3(theList,theDict):
return list(set(theList).intersection(set(theDict.keys())))
cProfile.run('way3(listOfRandomInts,dictionaryOfRandomInts)')
26 function calls in 0.046 seconds
Method 4: Naive use of dict.keys()
:
This is a cautionary tale - it was my first attempt and BY FAR the slowest!
def way4(theList,theDict):
resultsList = []
keys = theDict.keys()
for listItem in theList:
if listItem in keys:
resultsList.append(theDict[listItem])
return resultsList
cProfile.run('way4(listOfRandomInts,dictionaryOfRandomInts)')
12 function calls in 248.552 seconds
EDIT: Bringing the suggestions given in the answers into the same framework that I've used for consistency. Many have noted that more performance gains can be achieved in Python 3.x, particularly list comprehension-based methods. Many thanks for all of the help!
Method 5: Better way of performing intersection (thanks jonrsharpe):
def way5(theList, theDict):
return = list(set(theList).intersection(theDict))
25 function calls in 0.037 seconds
Method 6: List comprehension (thanks jonrsharpe):
def way6(theList, theDict):
return [item for item in theList if item in theDict]
24 function calls in 0.020 seconds
Method 7: Using the &
keyword (thanks jonrsharpe):
def way7(theList, theDict):
return list(theDict.viewkeys() & theList)
25 function calls in 0.026 seconds
For methods 1-3 and 5-7 I timed them as above with list/dictionaries of length 1000, 10000, 100000, 1000000, 10000000 and 100000000 and show a log-log plot of time taken. Across all lengths the intersection and in-statement method perform better. The gradients are all about 1 (maybe a bit higher), indicating O(n) or perhaps slightly super-linear scaling.