
I'm writing a function to handle multiple queries in a boolean AND search. I have a dict, query_dict, that maps each query to the list of docs in which it occurs.

I want the intersection of all the lists in query_dict.values():

query_dict = {'foo': ['doc_one.txt', 'doc_two.txt', 'doc_three.txt'],
              'bar': ['doc_one.txt', 'doc_two.txt'],
              'foobar': ['doc_two.txt']}

intersect(query_dict)

>> doc_two.txt

I've been reading about intersection but I'm finding it hard to apply it to a dict.

Thanks for your help!

2 Answers

In [36]: query_dict = {'foo': ['doc_one.txt', 'doc_two.txt', 'doc_three.txt'],
              'bar': ['doc_one.txt', 'doc_two.txt'],
              'foobar': ['doc_two.txt']}

In [37]: reduce(set.intersection, (set(val) for val in query_dict.values()))
Out[37]: set(['doc_two.txt'])

In [41]: query_dict = {'foo': ['doc_one.txt', 'doc_two.txt', 'doc_three.txt'], 'bar': ['doc_one.txt', 'doc_two.txt'], 'foobar': ['doc_two.txt']}

set.intersection(*(set(val) for val in query_dict.values())) is also a valid solution, though it was slightly slower in this quick timing on the three-entry dict:

In [42]: %timeit reduce(set.intersection, (set(val) for val in query_dict.values()))
100000 loops, best of 3: 2.78 us per loop

In [43]: %timeit set.intersection(*(set(val) for val in query_dict.values()))
100000 loops, best of 3: 3.28 us per loop
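
On Python 3, where `reduce` lives in `functools` and sets print as `{...}`, both approaches above can be wrapped in a small helper. This is only a minimal sketch: the name `intersect` comes from the question, `intersect_reduce` is a hypothetical name, and returning an empty set for an empty dict is an assumption rather than something the question specifies.

```python
from functools import reduce


def intersect(query_dict):
    """Return the set of docs that appear under every query in query_dict."""
    doc_sets = [set(docs) for docs in query_dict.values()]
    if not doc_sets:
        # Assumption: no queries means no matching documents.
        return set()
    # set.intersection accepts any number of sets, so unpack them all at once.
    return set.intersection(*doc_sets)


def intersect_reduce(query_dict):
    """Same result, folding the sets pairwise with functools.reduce."""
    doc_sets = [set(docs) for docs in query_dict.values()]
    return reduce(set.intersection, doc_sets) if doc_sets else set()


query_dict = {'foo': ['doc_one.txt', 'doc_two.txt', 'doc_three.txt'],
              'bar': ['doc_one.txt', 'doc_two.txt'],
              'foobar': ['doc_two.txt']}

print(intersect(query_dict))         # {'doc_two.txt'}
print(intersect_reduce(query_dict))  # {'doc_two.txt'}
```

The empty-dict guard matters because `set.intersection()` with no arguments and `reduce` over an empty sequence without an initial value both raise `TypeError`.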
inspectorG4dget
  • 2
    Instead of `reduce`, `set.intersection(*(set(val).. etc))` should work too. – DSM Dec 10 '12 at 22:15
  • 1
    @DSM: that would work too, but mine's faster. See my updated answer – inspectorG4dget Dec 10 '12 at 22:34
  • 2
    not if you compare the time it takes to type the extra characters, depending on how many times you perform the op.. – DSM Dec 10 '12 at 22:38
  • 1
    `reduce` is no longer a built-in in Python 3; use `functools.reduce` instead, https://stackoverflow.com/questions/8689184/nameerror-name-reduce-is-not-defined-in-python – Jia Gao Sep 13 '19 at 19:41

Another way

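# Materialize the values once; in Python 3, dict.values() is a view and cannot be indexed.
values = list(query_dict.values())
first, rest = values[0], values[1:]
# Keep each doc from the first list that also appears in every other list.
print([t for t in set(first) if all(t in q for q in rest)])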
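
The list-comprehension approach above tests membership against lists, which is a linear scan per lookup. A possible variation, sketched here under the assumption that a list result in the order of the first query's docs is wanted, converts the remaining lists to sets first. The function name `intersect_docs` is hypothetical.

```python
def intersect_docs(query_dict):
    """Return docs common to every query, in the order of the first query's list."""
    doc_lists = list(query_dict.values())
    if not doc_lists:
        # Assumption: no queries means no matching documents.
        return []
    first = doc_lists[0]
    # Sets give constant-time membership tests on average.
    rest = [set(docs) for docs in doc_lists[1:]]
    seen = set()
    result = []
    for doc in first:
        # Skip repeats so each doc is reported once.
        if doc not in seen and all(doc in other for other in rest):
            seen.add(doc)
            result.append(doc)
    return result


query_dict = {'foo': ['doc_one.txt', 'doc_two.txt', 'doc_three.txt'],
              'bar': ['doc_one.txt', 'doc_two.txt'],
              'foobar': ['doc_two.txt']}

print(intersect_docs(query_dict))  # ['doc_two.txt']
```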
Stuart