What's the most efficient way to perform a multiple match lookup in a python dictionary?

Question

I'm looking to maximally optimize the runtime for this chunk of code:

aDictionary= {"key":["value", "value2", ...

rests = \
         list(map((lambda key: Resp(key=key)),
                     [key for key, values in
                      aDictionary.items() if (test1 in values or test2 in values)]))

using python3. willing to throw as much memory at it as possible.

considering throwing the two dictionary lookups on separate processes for speedup (does that make sense?). any other optimization ideas welcome

values can definitely be sorted and turned into a set; it is precomputed, very very large.
always len(values) >>>> len(tests), though they're both growing over time
len(tests) grows very very slowly, and has new values for each execution
currently looking at strings (considering doing a string->integer mapping)

Will there be more than the two `testx` eventually? That would have some influence on whether to make sets out of lists or not. Also, do the `testx` values occur very often in `values` or very rarely? Are `values` gonna be sorted? — user2390182, Nov 16 '16 at 22:19
Are `values` gonna be sorted? Are there, on average, more `testx` variables than `values` per ker key? — user2390182, Nov 16 '16 at 22:33
How many other methods have you eliminated as not satisfactory - so we can avoid duplication of effort. — wwii, Nov 16 '16 at 22:56
Magnitude of ```values```? How much time is your current solution taking? — wwii, Nov 16 '16 at 23:44

score 2 · Answer 1 · answered Nov 16 '16 at 22:07

For starters, there is no reason to use map when you are already using a list comprehension, so you can remove that, as well as the outer list call:

rests = [Resp(key=key) for key, values in aDictionary.items()
         if (test1 in values or test2 in values)]

A second possible optimization might be to turn each list of values into a set. That would take up time initially, but it would change your lookups (in uses) from linear time to constant time. You might need to create a separate helper function for that. Something like:

def anyIn(checking, checkingAgainst):
    checkingAgainst = set(checkingAgainst)
    for val in checking:
        if val in checkingAgainst:
            return True
    return False

Then you could change the end of your list comprehension to read

...if anyIn([test1, test2], values)]

But again, this would probably only be worth it if you had more than two values you were checking, or if the list of values in values is very long.

score 2 · Answer 2 · edited May 23 '17 at 10:32

2

If tests are sufficiently numerous, it will surely pay off to switch to set operations:

tests = set([test1, test2, ...])
resps = map(Resp, (k for k, values in dic.items() if not tests.isdisjoint(values)))  
# resps this is a lazy iterable, not a list, and it uses a 
# generator inside, thus saving the overhead of building 
# the inner list.

Turning the dict values into sets would not gain anything as the conversion would be O(N) with N being the added size of all values-lists, while the above disjoint operation will only iterate each values until it encounters a testx with O(1) lookup.

map is possibly more performant compared to a comprehension if you do not have to use lambda, e.g. if key can be used as the first positional argument in Resp's __init__, but certainly not with the lambda! (Python List Comprehension Vs. Map). Otherwise, a generator or comprehension will be better:

resps = (Resp(key=k) for k, values in dic.items() if not tests.isdisjoint(values))
#resps = [Resp(key=k) for k, values in dic.items() if not tests.isdisjoint(values)]

edited May 23 '17 at 10:32

Community

1
1

answered Nov 16 '16 at 22:56

user2390182

72,016
6
67
89

How about ```if not tests.isdisjoint(set(values))``` for the filter - or just ```if not tests.isdisjoint(values)```?? – wwii Nov 16 '16 at 23:01
I can definitely transform the values into a set in advance. the key to set conversion might not be worth it since it's likely at runtime – blueberryfields Nov 16 '16 at 23:05
@wwii That would improve performance, but only if you can make `values` sets from the get-go. Otherwise, the conversion `list`-`set` would outweigh the gain. – user2390182 Nov 16 '16 at 23:05
*the test to set conversion might not be worth it since it's at runtime and the len(tests) are relatively low – blueberryfields Nov 16 '16 at 23:07
But you only have one set of tests, no. So, unless you have less then 3 tests, it is definitely worth it, as you will reuse this set for every value for every key in your dict, and the set's contains-check is much faster than the list's. – user2390182 Nov 16 '16 at 23:11
I just did some timing and ```...isdisjoint(values)``` with values as a list is faster than the comprehension and scales better than converting ```values``` to a set beforehand. – wwii Nov 16 '16 at 23:15
That makes sense. As you need to iterate one of the two,disjoint for a set and a list should be as performant as for two sets. I will update... – user2390182 Nov 16 '16 at 23:21
maybe i've misnamed - length of tests doesn't change, but the contents are new each time. length of values changes more frequently; the contents are precomputed and relatively static – blueberryfields Nov 17 '16 at 01:05

What's the most efficient way to perform a multiple match lookup in a python dictionary?

2 Answers2