Python: How to iterate lists of dictionaries Fast

Question

I'm using a following code to iterate through the lists of dictionaries to locate a corresponding key ['5'] and when found to compare the values. While it works fine I believe it could be improved to get a fast performance. What other ways could be used to achieve the same result?

listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]

result=[]
for dictA in listA:
    if not 5 in dictA.keys(): continue
    for dictB in listB:
        if 5 in dictB.keys() and dictB[5]==dictA[5]:
            result.append(dictB[5])

I was writing an answer, but I can't figure out what on earth you are trying to do here. That code is messed up. — anon582847382, Mar 30 '14 at 17:04
Also this should probably be on codereview as it is about reviewing and optimising working code. — anon582847382, Mar 30 '14 at 17:11
You need to give us some idea of the sizes involved here. If you're really dealing with both lists having 2 dictionaries with 3 entries each, stop wasting time optimizing this. Otherwise, solutions for "check list of 1 million dicts against list of 1 million dicts" and "check list of 1 million dicts against list of 2 dicts" are likely to be very different. — user9876, Mar 30 '14 at 17:13
Also, I assume "if not 4 in dictA.keys()" should be "if not 5 in dictA.keys()" ? — user9876, Mar 30 '14 at 17:14
Yeah, it should be 5 instead of 4. Already fixed. Sorry for the mess — alphanumeric, Mar 30 '14 at 17:18
`not 5 in dictA.keys()` is better spelt `5 not in dictA.keys()` — Eric, Mar 30 '14 at 18:39
`X not in Y` is one operation, since `not in` is a single operator. `not X in Y` is 2 operations - `in` then `not`. The optimizer may or may not optimize them to the same, I haven't tested. It's also more "Pythonic" to use `not in`. — user9876, Mar 30 '14 at 19:23
Hopefully Python optimizes it. But still it's pretty cool to know this. Thanks! — alphanumeric, Mar 31 '14 at 05:20

score 3 · Answer 1 · answered Mar 30 '14 at 17:08

3

A quick check also shows that 4 in dictA is faster than 4 in dictA.keys().

answered Mar 30 '14 at 17:08

Jarod

61
2

I've learned that "if 5 in dictA.keys()" is more reliable than "if 5 in dictA". There were situations where the code was returning False even while a key was in a dict. I still don't know what caused it. .keys() method never fails. Any ideas why that would happen? – alphanumeric Mar 30 '14 at 17:20
If you're using custom classes that don't implement hash() properly. But those classes shouldn't be used as dict keys anyway – user9876 Mar 30 '14 at 17:26
In threaded code .keys() (in python2) returns a new list, it's more thread-safe. – x3al Mar 30 '14 at 17:32

StoryTeller - Unslander Monica · Answer 2 · 2014-07-05T13:34:46.083

2

You'd have to profile the code to see if there is an improvement, but it's generally a step in the right direction to use built-ins for the filtering, rather than scripting it yourself, since it will skip the interpreting of your filter boilerplate code.

for dictA in filter(lambda x : 4 in x, listA):
    for dictB in filter(lambda x : 5 in x, listB):
        if dictB[5]==dictA[5]:
            result.append(dictB[5])

Also, it makes it shorter and a bit more readable, which is with the Zen of Python. You shuold become acquainted with how a Python program looks, since you quite clearly appear to be trying to write C/Java like code in Python.

edited Jul 05 '14 at 13:34

answered Mar 30 '14 at 17:06

StoryTeller - Unslander Monica

165,132
21
377
458

`filer`? `filter`!, :p – CT Zhu Mar 30 '14 at 17:24
@CTZhu, what!? where!? :P – StoryTeller - Unslander Monica Mar 30 '14 at 18:37

score 2 · Accepted Answer · answered Mar 30 '14 at 17:22

First: You're not using most of listA; all you care about is the values from dictA[5]. So lets just extract the bits you care about, in a data structure that will allow fast access:

interesting_vals = frozenset([dictA[5] for dictA in listA if 5 in dictA])

Now we just need to check listB. Two approaches. The obvious one first:

result = [dictB[5] for dictB in listB
          if 5 in dictB and dictB[5] in interesting_vals]

or if you expect most dictBs to have a [5] element then this may be faster since it combines the access and the existance check (profile it with real data!):

NA = object()  # Will compare different to everything in interesting_vals
result = [dictB[5] for dictB in listB if dictB.get(5, NA) in interesting_vals]

This solution should be O(len(listA) + len(listB)), which is much better than your original O(len(listA) * len(listB)) if the lists are large.

Note that I am assuming that values of dictA[5] are hashable and have a hash that's consistent with equals - most built-in classes are, but some custom classes might not implement hash properly.

Inconsistent `__hash__` functions are a great way to have [fun](http://stackoverflow.com/q/15399024/383793). — Chris Wesseling, Mar 30 '14 at 17:53

score 1 · Answer 4 · answered Mar 30 '14 at 17:19

Oneliner:

%timeit filter(None, {item.get(5) for item in listA}.intersection(item.get(5) for item in listB))
100000 loops, best of 3: 8.59 us per loop

%%timeit
    ...: listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
    ...: listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]
    ...: 
    ...: result=[]
    ...: for dictA in listA:
    ...:     if not 4 in dictA.keys(): continue
    ...:     for dictB in listB:
    ...:         if 5 in dictB.keys() and dictB[5]==dictA[5]:
    ...:             result.append(dictB[5])
    ...:             
100000 loops, best of 3: 11.9 us per loop

x3al · Answer 5 · 2014-03-30T17:26:57.193

0

result = [
    y[5]
    for x in listA
    if 5 in x
    for y in listB
    if 5 in y and x[5] == y[5]
]

edited Mar 30 '14 at 17:26

answered Mar 30 '14 at 17:11

x3al

586
1
8
24

3

I hate it when people use list comprehensions to compete with the unreadability of perl. – StoryTeller - Unslander Monica Mar 30 '14 at 17:15
Agree. It looks ugly. But it good to know it's possible to put an entire page on a single line... and again, some say the expressions are fast. – alphanumeric Mar 30 '14 at 17:22

score 0 · Answer 6 · edited Jun 27 '14 at 14:26

You should always test speed, but generator expressions (or list comprehension), should be faster than iterating:

result= []
A = (a for a in listA if 5 in a) 
for a in A:
    result.extend(b[5] for b in listB if (5 in b and a[5] == b[5]))

Or try to go functional:

import functools
fives = partial(filter, lambda x: 5 in x)
for a in fives(listA):
    result.extend(b[5] for fives(listB) if a[5] == b[5])

Python: How to iterate lists of dictionaries Fast

6 Answers6