How can I find the most frequently occurring combinations in a list of lists? The combinations can be of any length.

So, sample data:

l = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]

Expected output:

'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times

With the help of this post, I was able to find the most frequently occurring pairs (combinations of 2), but how can I extend it to find combinations of any length?

EDIT: As per @CrazyChucky's comment:

Sample input:

l = [['action','mystery','horror','thriller'],
     ['drama','romance'],
     ['comedy','drama','romance'],
     ['scifi','mystery','horror','thriller'],
     ['horror','mystery','thriller'],
     ['mystery','horror']]

Expected output:

'mystery','horror' - 4 times
'mystery','horror','thriller' - 3 times
'drama','romance' - 2 times
Mayank Porwal

2 Answers

You can adapt the code from that question to iterate over all the possible combinations of each possible size from each sublist:

from collections import Counter
from itertools import combinations

l = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]
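d = Counter()
for sub in l:
    # Sort a copy so the same genres always form the same tuple,
    # whatever their order in the sublist (and without mutating l).
    items = sorted(sub)
    # Sublists shorter than 2 produce an empty range, so they are skipped.
    for sz in range(2, len(items) + 1):
        for comb in combinations(items, sz):
            d[comb] += 1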

print(d.most_common())

Output:

[
 (('horror', 'mystery'), 3),
 (('horror', 'thriller'), 3),
 (('mystery', 'thriller'), 3),
 (('horror', 'mystery', 'thriller'), 3),
 (('drama', 'romance'), 2),
 (('action', 'horror'), 1),
 (('action', 'mystery'), 1),
 (('action', 'thriller'), 1),
 (('action', 'horror', 'mystery'), 1),
 (('action', 'horror', 'thriller'), 1),
 (('action', 'mystery', 'thriller'), 1),
 (('action', 'horror', 'mystery', 'thriller'), 1),
 (('comedy', 'drama'), 1),
 (('comedy', 'romance'), 1),
 (('comedy', 'drama', 'romance'), 1),
 (('horror', 'scifi'), 1),
 (('mystery', 'scifi'), 1),
 (('scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi'), 1),
 (('horror', 'scifi', 'thriller'), 1),
 (('mystery', 'scifi', 'thriller'), 1),
 (('horror', 'mystery', 'scifi', 'thriller'), 1)
]

To get just the genre combinations with the highest count, you can iterate over the counter:

max_count = max(d.values())  # compute the maximum once, not once per item
most_frequent = [g for g, cnt in d.items() if cnt == max_count]
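
If the goal is the exact output shown in the question's edit, one possible reading is: keep only combinations that occur at least twice, and drop any combination that is contained in a larger one with the same count. The sketch below works under that assumption. The function name `closed_frequent_combinations` and the `min_count` parameter are illustrative and not part of the original answer.

```python
from collections import Counter
from itertools import combinations

l = [['action', 'mystery', 'horror', 'thriller'],
     ['drama', 'romance'],
     ['comedy', 'drama', 'romance'],
     ['scifi', 'mystery', 'horror', 'thriller'],
     ['horror', 'mystery', 'thriller'],
     ['mystery', 'horror']]


def closed_frequent_combinations(lists, min_count=2):
    # Hypothetical helper: count every combination of size >= 2 per sublist.
    counts = Counter()
    for sub in lists:
        items = sorted(set(sub))
        for size in range(2, len(items) + 1):
            counts.update(combinations(items, size))

    # Keep only combinations seen at least min_count times.
    frequent = {c: n for c, n in counts.items() if n >= min_count}

    result = []
    for comb, n in frequent.items():
        comb_set = set(comb)
        # A combination is dropped when a strictly larger combination
        # contains it and occurs exactly as often.
        absorbed = any(
            m == n and len(other) > len(comb) and comb_set < set(other)
            for other, m in frequent.items()
        )
        if not absorbed:
            result.append((comb, n))
    # Highest count first; ties broken alphabetically.
    return sorted(result, key=lambda pair: (-pair[1], pair[0]))


for comb, n in closed_frequent_combinations(l):
    print(comb, '-', n, 'times')
# ('horror', 'mystery') - 4 times
# ('horror', 'mystery', 'thriller') - 3 times
# ('drama', 'romance') - 2 times
```

Genres inside each tuple come out in alphabetical order because each sublist is sorted first. The number of combinations grows exponentially with sublist length, so this is only practical for short sublists such as genre lists.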
Nick
  • Thanks for the answer. Your answer gives me all the combinations. In this case, I'll have to loop over it one more time to find the combinations with the maximum count, right? – Mayank Porwal Dec 22 '20 at 05:55
  • Neat solution. Have you seen CrazyChucky's comment above? (and OP's reply) – Pynchia Dec 22 '20 at 06:02
  • @MayankPorwal you can get the genres with the maximum occurrence using `most_frequent = [g for g, cnt in d.items() if cnt == d.most_common(1)[0][1]]` – Nick Dec 22 '20 at 06:49
  • @Pynchia been away, just saw the comments. I think my comment above resolves that for OP – Nick Dec 22 '20 at 06:50

I wrote some simple code that does not import any packages:

lst = [['action','mystery','horror','thriller'],
 ['drama','romance'],
 ['comedy','drama','romance'],
 ['scifi','mystery','horror','thriller'],
 ['horror','mystery','thriller']]


def print_it_all_by_num(arr: list):
    dic = dict()
    for i in arr:
        for j in i:
            if j in dic:
                dic[j] += 1
            else:
                dic[j] = 1
    dic_out = dict()
    for i in dic:
        if dic[i] in dic_out:
            dic_out[dic[i]].append(i)
        else:
            dic_out[dic[i]] = [i]
    print(dic_out)  # out is {1: ['action', 'comedy', 'scifi'], 3: ['mystery', 'horror', 'thriller'], 2: ['drama', 'romance']}


print_it_all_by_num(lst)  
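
Note that the function above counts individual genres across all sublists, not genres that appear together. A dependency-free variant that counts combinations within each sublist might look like the sketch below. It uses bit masks to enumerate subsets; the name `count_combinations` and the `min_size` parameter are hypothetical and chosen only for illustration.

```python
lst = [['action', 'mystery', 'horror', 'thriller'],
       ['drama', 'romance'],
       ['comedy', 'drama', 'romance'],
       ['scifi', 'mystery', 'horror', 'thriller'],
       ['horror', 'mystery', 'thriller']]


def count_combinations(lists, min_size=2):
    counts = {}
    for sub in lists:
        items = sorted(set(sub))
        n = len(items)
        # Each mask from 1 to 2**n - 1 selects one non-empty subset of items.
        for mask in range(1, 2 ** n):
            comb = tuple(items[i] for i in range(n) if mask & (1 << i))
            if len(comb) >= min_size:
                counts[comb] = counts.get(comb, 0) + 1
    return counts


counts = count_combinations(lst)
print(counts[('horror', 'mystery', 'thriller')])  # 3
print(counts[('drama', 'romance')])               # 2

# Every item occurs twice here, yet no pair occurs together more than once,
# which is why per-item counts cannot stand in for combination counts.
tricky = [['a', 'b'], ['a', 'c'], ['b', 'c']]
print(max(count_combinations(tricky).values()))   # 1
```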
Dharman
crackanddie
  • Why use indexes in python? What is the value of not using modules from the standard library? Why reinvent the wheel? – Pynchia Dec 22 '20 at 06:00
  • This code does not cover the issue correctly. Though it works for this specific case, it counts items that are not necessarily in the same sublist. – adir abargil Dec 22 '20 at 06:03