2

I have a number of lists of lists stored in a dictionary. I want to find the intersection of the sub-lists (i.e. intersection of dict[i][j]) for all keys of the dictionary. )

For example, if the dictionary stored sets of tuples instead, I could use the code:

set.intersection(*[index[key] for key in all_keys]) 

What is an efficient way to do this? One way I tried was to first convert each list of lists into a set of tuples and then taking the intersection of those, but this is rather clunky.

Example:

Suppose the dictionary of lists of lists is

dict = {} 
dict['A'] = [[1, 'charlie'], [2, 'frankie']] 
dict['B'] = [[1, 'charlie'], [2, 'chuck']]
dict['C'] = [[2, 'chuck'], [1, 'charlie']]

then I want to return

[1, 'charlie']

(maybe as a tuple, doesn't have to be list)

EDIT: I just found an okay way to do it, but it's not very 'pythonic'

def search(index, search_words): 
    rv = {tuple(t) for t in index[search_words[0]]}
    for word in search_words: 
        rv = rv.intersection({tuple(t) for t in index[word]})
    return rv 
thefourtheye
  • 233,700
  • 52
  • 457
  • 497
Craig
  • 835
  • 1
  • 7
  • 21

4 Answers4

2

Let's call your dictionary of lists of lists d:

>>> d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}

I am calling it d because dict is a builtin and we would prefer not to overwrite it.

Now, find the intersection:

>>> set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
set([(1, 'charlie')])

How it works

  • set(tuple(x) for x in d[k])

    For key k, this forms a set out of tuples of the elements in d[k]. Taking k='A' for example:

    >>> k='A'; set(tuple(x) for x in d[k])
    set([(2, 'frankie'), (1, 'charlie')])
    
  • [ set(tuple(x) for x in d[k]) for k in d ]

    This makes a list of the sets from the step above. Thus:

    >>> [ set(tuple(x) for x in d[k]) for k in d ]
    [set([(2, 'frankie'), (1, 'charlie')]),
     set([(2, 'chuck'), (1, 'charlie')]),
     set([(2, 'chuck'), (1, 'charlie')])]
    
  • set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )

    This gathers the intersection of the three sets as above:

    >>>set.intersection( *[ set(tuple(x) for x in d[k]) for k in d ] )
    set([(1, 'charlie')])
    
John1024
  • 109,961
  • 14
  • 137
  • 171
  • Brilliant, exactly what I needed. I tried doing something similar for awhile but my notation must have been off. Thanks! – Craig Jan 26 '15 at 06:37
1
[item for item in A if item in B+C]
Adam Hughes
  • 14,601
  • 12
  • 83
  • 122
  • There will be a varying number of list of lists, and they are stored in a dictionary. – Craig Jan 26 '15 at 06:07
  • Can each list only have the item once, or can there be redundancy in a single list? – Adam Hughes Jan 26 '15 at 06:08
  • That's a good question. I'm pretty sure that each list will only have the item once... but it's scraped data though, so I'm not sure I would put my trust into that assumption. – Craig Jan 26 '15 at 06:11
  • Cuz if that was the case, you could just add all three lists, and us collections.count to see which has three. – Adam Hughes Jan 26 '15 at 06:21
1

Your usecase demands the use of reduce function. But, quoting the BDFL's take on reduce,

So now reduce(). This is actually the one I've always hated most, because, apart from a few examples involving + or *, almost every time I see a reduce() call with a non-trivial function argument, I need to grab pen and paper to diagram what's actually being fed into that function before I understand what the reduce() is supposed to do. So in my mind, the applicability of reduce() is pretty much limited to associative operators, and in all other cases it's better to write out the accumulation loop explicitly.

So, what you have got is already 'pythonic'.

I would write the program like this

>>> d = {'A': [[1, 'charlie'], [2, 'frankie']],
... 'B': [[1, 'charlie'], [2, 'chuck']],
... 'C': [[2, 'chuck'], [1, 'charlie']]}
>>> values = (value for value in d.values())
>>> result = {tuple(item) for item in next(values)}
>>> for item in values:
...    result &= frozenset(tuple(items) for items in item)
>>> result
set([(1, 'charlie')])
thefourtheye
  • 233,700
  • 52
  • 457
  • 497
1
from collections import defaultdict    
d = {'A': [[1, 'charlie'], [2, 'frankie']], 'B': [[1, 'charlie'], [2, 'chuck']], 'C': [[2, 'chuck'], [1, 'charlie']]}

frequency = defaultdict(set)
for key, items in d.iteritems():
    for pair in items:
        frequency[tuple(pair)].add(key)
output = [
    pair for pair, occurrances in frequency.iteritems() if len(d) == len(occurrances)
]
print output
Aravinda
  • 126
  • 1
  • 5