Efficient way of comparing multiple lists in python

Question

I have 5 long lists of word pairs, as in the example below. Note that an element can be a list holding a single word pair, like [['Salad', 'Fat']], or a list holding several word pairs, like [['Bread', 'Oil'], ['Bread', ' Salt']].

list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar']] ]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt']] ]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread'] ]

Now I want to calculate the frequency of word pairs.

For example, for the 5 lists above I should get the following output, which shows each word pair and its frequency.

output_list = [{['Salad', 'Fat']: 3}, {['Bread', 'Oil']: 2}, {['Salt', 'Sugar']: 2},
{['Sugar', 'Salt']: 1}, and so on]

What is the most efficient way of doing this in Python?

Are there different nesting levels of pairs within the outer list? — jlandercy, Sep 12 '17 at 14:33
The most efficient way of doing it is probably to do a better job of building the original lists, so they don't have a mish-mash of various levels of nesting lists. Can you show us how they were built? — aghast, Sep 12 '17 at 14:34
Any reason `list_5` is only 2 levels deep, but every other list is 3 levels? — AChampion, Sep 12 '17 at 14:37
@AChampion The length of the lists varies according to my problem :) — , Sep 12 '17 at 23:47

marcusshep · Answer 1 · 2017-09-12T14:55:12.877

You could flatten all the lists, then use `collections.Counter` to count how often each item occurs.

>>> import itertools
>>> from collections import Counter
>>> l = [[1,2,3],[3,4,1,5]]
>>> counts = Counter(list(itertools.chain(*l)))
>>> counts
Counter({1: 2, 3: 2, 2: 1, 4: 1, 5: 1})

NOTE: this flattening technique only works on a list of lists, i.e. exactly one level of nesting. Deeper or unevenly nested input needs a different flattening technique.

EDIT: Thanks to AChampion, `counts = Counter(list(itertools.chain(*l)))` can be written as `counts = Counter(list(itertools.chain.from_iterable(l)))`.
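As a rough illustration of the note and the edit above, here is a minimal sketch. The `nested` sample is made up for the illustration and is not taken from the question. It checks that both `chain` spellings produce the same counts (`Counter` accepts any iterable, so the `list(...)` wrapper is optional) and shows that only one level of nesting is removed.

```python
import itertools
from collections import Counter

l = [[1, 2, 3], [3, 4, 1, 5]]

# Counter consumes any iterable, so no intermediate list is needed.
counts_unpacked = Counter(itertools.chain(*l))
counts_lazy = Counter(itertools.chain.from_iterable(l))
assert counts_unpacked == counts_lazy

# Hypothetical input with an extra level of nesting:
# only the outer level is flattened, the inner list [2, 3] survives.
nested = [[1, [2, 3]], [4]]
one_level = list(itertools.chain.from_iterable(nested))
print(one_level)  # [1, [2, 3], 4]
```

Passing `one_level` to `Counter` would raise a `TypeError`, because the surviving inner list is not hashable.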

edited Sep 12 '17 at 14:55

answered Sep 12 '17 at 14:34


`itertools.chain.from_iterable(l)` would be better than arg unpacking. – AChampion Sep 12 '17 at 14:48
Do you say this because it would be more explicit? @AChampion – marcusshep Sep 12 '17 at 14:52
It's also quicker, and built for the job. – AChampion Sep 12 '17 at 14:53

AChampion · Accepted Answer · 2017-09-13T00:52:43.547

Because your lists are unevenly nested, the code ends up ugly, so I would look at fixing the input lists first.

collections.Counter() is built for this kind of thing, but lists are not hashable, so you need to turn them into tuples (and strip off the spurious spaces):

In []:
import itertools as it
from collections import Counter

list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar'] ]]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt'] ]]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread']] 

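# Normalise one pair: strip stray spaces and turn it into a hashable tuple
t = lambda x: tuple(map(str.strip, x))
# list_1..list_4 are three levels deep: flatten one level to reach the pairs
c = Counter(map(t, it.chain.from_iterable(it.chain(list_1, list_2, list_3, list_4))))
# list_5 is already a flat list of pairs, so count it directly
c += Counter(map(t, list_5))
c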

Out[]:
Counter({('Bread', 'Oil'): 2,
         ('Bread', 'Salt'): 1,
         ('Protein', 'Soup'): 1,
         ('Salad', 'Chicken'): 1,
         ('Salad', 'Fat'): 3,
         ('Salad', 'Protein'): 1,
         ('Salt', 'Sugar'): 2,
         ('Sugar', 'Bread'): 1,
         ('Sugar', 'Protein'): 1,
         ('Sugar', 'Salt'): 1})

Thank you for the answer. However what is `it` in your code? Do we have to import it? — , Sep 12 '17 at 23:54
Sorry, yes it is, thanks @Volka. I use it so often it is almost subconscious, `itertools as it`, `functools as ft`, `operator as op` and the other well established `numpy as np`, `pandas as pd`. — AChampion, Sep 13 '17 at 00:46
