2

I'm using a following code to iterate through the lists of dictionaries to locate a corresponding key ['5'] and when found to compare the values. While it works fine I believe it could be improved to get a fast performance. What other ways could be used to achieve the same result?

listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]

result=[]
for dictA in listA:
    if not 5 in dictA.keys(): continue
    for dictB in listB:
        if 5 in dictB.keys() and dictB[5]==dictA[5]:
            result.append(dictB[5])
alphanumeric
  • 17,967
  • 64
  • 244
  • 392
  • 2
    I was writing an answer, but I can't figure out what on earth you are trying to do here. That code is messed up. – anon582847382 Mar 30 '14 at 17:04
  • 1
    Also this should probably be on codereview as it is about reviewing and optimising working code. – anon582847382 Mar 30 '14 at 17:11
  • 1
    You need to give us some idea of the sizes involved here. If you're really dealing with both lists having 2 dictionaries with 3 entries each, stop wasting time optimizing this. Otherwise, solutions for "check list of 1 million dicts against list of 1 million dicts" and "check list of 1 million dicts against list of 2 dicts" are likely to be very different. – user9876 Mar 30 '14 at 17:13
  • Also, I assume "if not 4 in dictA.keys()" should be "if not 5 in dictA.keys()" ? – user9876 Mar 30 '14 at 17:14
  • Yeah, it should be 5 instead of 4. Already fixed. Sorry for the mess – alphanumeric Mar 30 '14 at 17:18
  • `not 5 in dictA.keys()` is better spelt `5 not in dictA.keys()` – Eric Mar 30 '14 at 18:39
  • What would be a reason? – alphanumeric Mar 30 '14 at 18:42
  • `X not in Y` is one operation, since `not in` is a single operator. `not X in Y` is 2 operations - `in` then `not`. The optimizer may or may not optimize them to the same, I haven't tested. It's also more "Pythonic" to use `not in`. – user9876 Mar 30 '14 at 19:23
  • Hopefully Python optimizes it. But still it's pretty cool to know this. Thanks! – alphanumeric Mar 31 '14 at 05:20

6 Answers6

3

A quick check also shows that 4 in dictA is faster than 4 in dictA.keys().

Jarod
  • 61
  • 2
  • I've learned that "if 5 in dictA.keys()" is more reliable than "if 5 in dictA". There were situations where the code was returning False even while a key was in a dict. I still don't know what caused it. .keys() method never fails. Any ideas why that would happen? – alphanumeric Mar 30 '14 at 17:20
  • If you're using custom classes that don't implement hash() properly. But those classes shouldn't be used as dict keys anyway – user9876 Mar 30 '14 at 17:26
  • In threaded code .keys() (in python2) returns a new list, it's more thread-safe. – x3al Mar 30 '14 at 17:32
2

You'd have to profile the code to see if there is an improvement, but it's generally a step in the right direction to use built-ins for the filtering, rather than scripting it yourself, since it will skip the interpreting of your filter boilerplate code.

for dictA in filter(lambda x : 4 in x, listA):
    for dictB in filter(lambda x : 5 in x, listB):
        if dictB[5]==dictA[5]:
            result.append(dictB[5])

Also, it makes it shorter and a bit more readable, which is with the Zen of Python. You shuold become acquainted with how a Python program looks, since you quite clearly appear to be trying to write C/Java like code in Python.

StoryTeller - Unslander Monica
  • 165,132
  • 21
  • 377
  • 458
2

First: You're not using most of listA; all you care about is the values from dictA[5]. So lets just extract the bits you care about, in a data structure that will allow fast access:

interesting_vals = frozenset([dictA[5] for dictA in listA if 5 in dictA])

Now we just need to check listB. Two approaches. The obvious one first:

result = [dictB[5] for dictB in listB
          if 5 in dictB and dictB[5] in interesting_vals]

or if you expect most dictBs to have a [5] element then this may be faster since it combines the access and the existance check (profile it with real data!):

NA = object()  # Will compare different to everything in interesting_vals
result = [dictB[5] for dictB in listB if dictB.get(5, NA) in interesting_vals]

This solution should be O(len(listA) + len(listB)), which is much better than your original O(len(listA) * len(listB)) if the lists are large.

Note that I am assuming that values of dictA[5] are hashable and have a hash that's consistent with equals - most built-in classes are, but some custom classes might not implement hash properly.

user9876
  • 10,954
  • 6
  • 44
  • 66
1

Oneliner:

%timeit filter(None, {item.get(5) for item in listA}.intersection(item.get(5) for item in listB))
100000 loops, best of 3: 8.59 us per loop

%%timeit
    ...: listA = [{1:'One', 2:'Two', 3:'Three'}, {4:'Four', 5:'Five', 6:'Six'}]
    ...: listB = [{4:'Four', 5:'Five', 6:'Six'}, {7:'Seven', 8:'Eight', 9:'Nine'}]
    ...: 
    ...: result=[]
    ...: for dictA in listA:
    ...:     if not 4 in dictA.keys(): continue
    ...:     for dictB in listB:
    ...:         if 5 in dictB.keys() and dictB[5]==dictA[5]:
    ...:             result.append(dictB[5])
    ...:             
100000 loops, best of 3: 11.9 us per loop
CT Zhu
  • 52,648
  • 17
  • 120
  • 133
0
result = [
    y[5]
    for x in listA
    if 5 in x
    for y in listB
    if 5 in y and x[5] == y[5]
]
x3al
  • 586
  • 1
  • 8
  • 24
0

You should always test speed, but generator expressions (or list comprehension), should be faster than iterating:

result= []
A = (a for a in listA if 5 in a) 
for a in A:
    result.extend(b[5] for b in listB if (5 in b and a[5] == b[5]))

Or try to go functional:

import functools
fives = partial(filter, lambda x: 5 in x)
for a in fives(listA):
    result.extend(b[5] for fives(listB) if a[5] == b[5])
BenMorel
  • 34,448
  • 50
  • 182
  • 322
m.wasowski
  • 6,329
  • 1
  • 23
  • 30