-6

I want to find the most common words in a list of lists in Python. For example, I have list = [['fruit','apple'], ['fruit','peach'], ['fruit','banana'], ['animal', 'cat'], ['animal', 'sheep']]

The print result should be ['apple', 'peach','banana'] and length = 3

How can I write an algorithm to do this without using built-in functions?

  • Hi @Xuanming Shi, can you post what you have tried so far? – toti08 Aug 09 '18 at 07:06
  • "I want to find the most common words in the list of list in python" & "The print result should be ['apple', 'peach','banana']" - your question doesn't match the answer that you want. Finding the most common word through the entire list of list is 'fruit, animal,' then all the rest. – PeptideWitch Aug 09 '18 at 07:08
  • So do you want to find the most common word, or do you want to find the most common fruit? – PeptideWitch Aug 09 '18 at 07:09
  • i think fruit is most common, so you want all fruit? – Nihal Aug 09 '18 at 07:09
  • Yes, and I want to use an algorithm to do that rather than python build-in function – Xuanming Shi Aug 09 '18 at 07:50

4 Answers

0

The result you want, ['apple', 'peach', 'banana'], is not the most common element; it is the list of values that share the most common first element. You can build a dictionary that groups the values by their class:

groups = {}  # not named `dict`, which would shadow the built-in type
for item_list in list_of_lists:
    key, value = item_list
    if key not in groups:
        groups[key] = []  # first time this key is seen
    groups[key].append(value)

This will give you

groups = {'fruit': ['apple', 'peach', 'banana'], 'animal': ['cat', 'sheep']}
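Once the values are grouped like this, the largest group can be picked with a plain loop, which stays close to the question's request to avoid helpers such as max. This is only a sketch, assuming the input is a list of [category, value] pairs; the names largest_group, best_key and best_values are made up for illustration, and len and print are still used.

```python
def largest_group(pairs):
    """Return (key, values) for the key that has the most values."""
    # Step 1: group the second element of each pair under the first.
    groups = {}
    for key, value in pairs:
        if key not in groups:
            groups[key] = []
        groups[key].append(value)

    # Step 2: scan the groups and remember the longest one.
    best_key, best_values = None, []
    for key in groups:
        # Strict '>' keeps the first key seen when sizes tie.
        if len(groups[key]) > len(best_values):
            best_key, best_values = key, groups[key]
    return best_key, best_values


data = [['fruit', 'apple'], ['fruit', 'peach'], ['fruit', 'banana'],
        ['animal', 'cat'], ['animal', 'sheep']]
key, values = largest_group(data)
print(values, len(values))  # ['apple', 'peach', 'banana'] 3
```

For an empty input the sketch returns (None, []), which may or may not be the behaviour you want.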
Harshita
  • Dictionaries are a good choice, and while this doesn't strictly give the answer OP wants, this is a more useful way to work with the data – PeptideWitch Aug 09 '18 at 07:22
  • If you use defaultdict instead of dict, the test is not necessary (see my answer) – Gelineau Aug 09 '18 at 07:38
0

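Use itertools.groupby to group the items by key, then use max with a key function to pick the largest group. Note that groupby only merges consecutive items, so equal keys must sit next to each other in the input.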

Ex:

from itertools import groupby

l = [['fruit','apple'], ['fruit','peach'], ['fruit','banana'], ['animal', 'cat'], ['animal', 'sheep']]
first = lambda x: x[0]
# groupby only merges adjacent items, so sort by key first.
s = dict((k, [i[1] for i in v]) for k, v in groupby(sorted(l, key=first), first))    # Group by key.
print(max(s.items(), key=lambda x: len(x[1]))[1])    # Largest group, measured with len.

Output:

['apple', 'peach', 'banana']
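One caveat with this approach: itertools.groupby starts a new group every time the key changes, so input whose keys are interleaved gives partial groups, and building a dict from them keeps only the last run for each key. The sketch below uses a reordered copy of the sample data (the variable names are illustrative) to show the difference that sorting makes.

```python
from itertools import groupby
from operator import itemgetter

# Same items as the example, but with the keys interleaved.
unsorted_pairs = [['fruit', 'apple'], ['animal', 'cat'], ['fruit', 'peach'],
                  ['fruit', 'banana'], ['animal', 'sheep']]
first = itemgetter(0)

# Without sorting, each run of equal keys becomes its own group,
# and later runs overwrite earlier ones in the dict.
naive = {k: [pair[1] for pair in run]
         for k, run in groupby(unsorted_pairs, key=first)}
print(naive)
# {'fruit': ['peach', 'banana'], 'animal': ['sheep']}

# Sorting by the same key first makes equal keys adjacent.
# sorted() is stable, so values keep their original relative order.
grouped = {k: [pair[1] for pair in run]
           for k, run in groupby(sorted(unsorted_pairs, key=first), key=first)}
print(grouped)
# {'animal': ['cat', 'sheep'], 'fruit': ['apple', 'peach', 'banana']}
```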
Rakesh
0

You can use defaultdict and max:

from collections import defaultdict

data = [['fruit','apple'],['fruit','peach'],['fruit','banana'], ['animal', 'cat'],['animal', 'sheep']]

grouped_data = defaultdict(list)

for key, value in data:
    grouped_data[key].append(value)

print(grouped_data)
# defaultdict(<class 'list'>, {'fruit': ['apple', 'peach', 'banana'], 'animal': ['cat', 'sheep']})

most_frequent = max(grouped_data.items(), key=lambda i: len(i[1]))
print(most_frequent)
# ('fruit', ['apple', 'peach', 'banana'])

print(len(most_frequent[1]))
# 3
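When two keys have the same number of values, max returns the first one it encounters. If you would rather see every tied group, something along these lines could work; it is a sketch only, and largest_groups and the tied sample data are invented for illustration.

```python
from collections import defaultdict


def largest_groups(pairs):
    """Return a dict of every key whose group has the maximum size."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    if not grouped:
        return {}  # max() would raise on an empty sequence
    top = max(len(values) for values in grouped.values())
    return {key: values for key, values in grouped.items() if len(values) == top}


tied = [['fruit', 'apple'], ['animal', 'cat'],
        ['fruit', 'peach'], ['animal', 'sheep']]
print(largest_groups(tied))
# {'fruit': ['apple', 'peach'], 'animal': ['cat', 'sheep']}
```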
Gelineau
0

How can I write an algorithm without the built-in function to do this?

You can't really do this without any built-in functions. I assume you just want to avoid the ready-made collections tools and write the algorithm from scratch. I don't think that is the best approach, but you can use this:

def most_common(somelist: list) -> dict:
    somelist_dict = {}
    somelist_list = []
    result={'mostcommon':'','length':0,'content':[]}
    maxi=0
    word=''
    for x in somelist:
        try:
            somelist_dict[x[0]].append(x[1])
        except KeyError:
            # First time this key is seen: create its list and record the key order.
            somelist_dict[x[0]] = []
            somelist_list.append(x[0])
            somelist_dict[x[0]].append(x[1])

    for i, j in enumerate(somelist_list):
        if i == 0:
            word = j
            maxi = len(somelist_dict[j])
        else:
            tmp = len(somelist_dict[j])
            if maxi < tmp:
                word = j
                maxi = tmp
    result["mostcommon"]=word
    result['length']=maxi
    for k in somelist_dict[word]:
        result['content'].append(k)
    return result


somelist=[['fruit','apple'],['fruit','peach'],['fruit','banana'],['animal','cat'],['animal','sheep']]

print(most_common(somelist)) 

The output is (the key order may differ between Python versions):

{'content': ['apple', 'peach', 'banana'], 'length': 3, 'mostcommon': 'fruit'}
Elementary