I have 5 long lists of word pairs, as in the example below. Note that an element can be a list holding a single word pair, like [['Salad', 'Fat']], or a list holding several word pairs, like [['Bread', 'Oil'], ['Bread', ' Salt']].

list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar']] ]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt']] ]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread'] ]

Now I want to calculate the frequency of word pairs.

For example, for the 5 lists above, I should get the following output, which shows each word pair and its frequency.

output_list = [{['Salad', 'Fat']: 3}, {['Bread', 'Oil']: 2}, {['Salt', 'Sugar']: 2}, {['Sugar', 'Salt']: 1}, and so on]

What is the most efficient way of doing this in Python?

2 Answers

You could flatten all the lists. Then use Counter to count the word frequencies.

>>> import itertools
>>> from collections import Counter
>>> l = [[1,2,3],[3,4,1,5]]
>>> counts = Counter(list(itertools.chain(*l)))
>>> counts
Counter({1: 2, 3: 2, 2: 1, 4: 1, 5: 1})

NOTE: this flattening technique will work only with lists of lists. For other flattening techniques see the link provided above.

EDIT: Thanks to AChampion: counts = Counter(list(itertools.chain(*l))) can also be written as counts = Counter(list(itertools.chain.from_iterable(l)))
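
The note about lists of lists can be illustrated with a small sketch. The names two_levels and three_levels are made up for this example. It assumes only that one pass of chain.from_iterable removes exactly one level of nesting, so a more deeply nested input still yields lists, which Counter cannot hash.

```python
from collections import Counter
from itertools import chain

# Hypothetical sample data, not taken from the question.
two_levels = [[1, 2, 3], [3, 4, 1, 5]]
three_levels = [[[1, 2], [3]], [[3, 4]]]

# One pass of chain.from_iterable removes exactly one level of nesting.
flat = list(chain.from_iterable(two_levels))
still_nested = list(chain.from_iterable(three_levels))

# The fully flattened list contains hashable ints, so counting works.
counts = Counter(flat)

# The deeper input still contains lists, which are unhashable,
# so Counter raises TypeError when it tries to use them as keys.
try:
    Counter(still_nested)
    countable = True
except TypeError:
    countable = False
```

For the word pairs in the question, this means the elements being counted have to be made hashable (for example, converted to tuples) after flattening.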

marcusshep

Because your lists are nested unevenly, the code ends up ugly, so I would look to fix the input lists first.

collections.Counter() is built for this kind of thing, but lists are not hashable, so you need to turn the pairs into tuples (and strip off the spurious spaces):

In []:
import itertools as it
from collections import Counter

list_1 = [ [['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar'] ]]
list_2 = [ [['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']] ]
list_3 = [ [['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt'] ]]
list_4 = [ [['Salad', ' Fat'], ['Salad', 'Chicken']] ]
list_5 = [ ['Sugar', 'Protein'], ['Sugar', 'Bread']] 

t = lambda x: tuple(map(str.strip, x))
c = Counter(map(t, it.chain.from_iterable(it.chain(list_1, list_2, list_3, list_4))))
c += Counter(map(t, list_5))
c

Out[]:
Counter({('Bread', 'Oil'): 2,
         ('Bread', 'Salt'): 1,
         ('Protein', 'Soup'): 1,
         ('Salad', 'Chicken'): 1,
         ('Salad', 'Fat'): 3,
         ('Salad', 'Protein'): 1,
         ('Salt', 'Sugar'): 2,
         ('Sugar', 'Bread'): 1,
         ('Sugar', 'Protein'): 1,
         ('Sugar', 'Salt'): 1})
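
If fixing the input lists is not an option, one possible alternative is a small recursive generator that walks the nesting to any depth. This is only a sketch: iter_pairs is a hypothetical helper, and it assumes that a word pair is any list whose items are all strings and that no list mixes strings with sub-lists.

```python
from collections import Counter

def iter_pairs(nested):
    """Yield every word pair found at any depth as a tuple of stripped strings."""
    # Assumption: a non-empty list made up only of strings is a word pair.
    if nested and all(isinstance(item, str) for item in nested):
        yield tuple(word.strip() for word in nested)
    else:
        # Otherwise treat it as a container and descend into each child.
        for child in nested:
            yield from iter_pairs(child)

list_1 = [[['Salad', 'Fat']], [['Bread', 'Oil'], ['Bread', 'Salt']], [['Salt', 'Sugar']]]
list_2 = [[['Salad', 'Fat'], ['Salt', 'Sugar']], [['Protein', 'Soup']]]
list_3 = [[['Salad', ' Protein']], [['Bread', ' Oil']], [['Sugar', 'Salt']]]
list_4 = [[['Salad', ' Fat'], ['Salad', 'Chicken']]]
list_5 = [['Sugar', 'Protein'], ['Sugar', 'Bread']]

# All five lists go through the same code path, whatever their depth.
counts = Counter(iter_pairs([list_1, list_2, list_3, list_4, list_5]))

# most_common(n) returns the n most frequent pairs with their counts.
top = counts.most_common(1)
```

With this helper, list_5 no longer needs separate handling, and the resulting counts should agree with the Out[] shown above.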
AChampion
  • Thank you for the answer. However, what is `it` in your code? Do we have to import it? –  Sep 12 '17 at 23:54
  • I think it is `import itertools as it` –  Sep 13 '17 at 00:09
  • Sorry, yes it is, thanks @Volka. I use it so often it is almost subconscious: `itertools as it`, `functools as ft`, `operator as op`, and the other well-established `numpy as np`, `pandas as pd`. – AChampion Sep 13 '17 at 00:46