2

How can I create a list which contains the number of times an element appears in a number of lists? For example, I have these lists:

list1 = ['apples','oranges','grape']
list2 = ['oranges', 'oranges', 'pear']
list3 = ['strawberries','bananas','apples']
list4 = [list1,list2,list3]

I want to count the number of documents that contain each element and put it in a dictionary, so for apples and oranges I get this:

term['apples'] = 2
term['oranges'] = 2   #not 3
Micha Wiedenmann
user2578185

3 Answers

0

Use collections.Counter

from collections import Counter
terms = Counter( x for lst in list4 for x in lst )
terms
=> Counter({'oranges': 3, 'apples': 2, 'grape': 1, 'bananas': 1, 'pear': 1, 'strawberries': 1})
terms['apples']
=> 2

As @Stuart pointed out, you can also use itertools.chain.from_iterable to avoid the awkward-looking double loop in the generator expression (i.e. the for lst in list4 for x in lst).
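
A minimal sketch of that variant, assuming the lists from the question (with the missing quote in list2 fixed). Note that it still counts every occurrence, so 'oranges' comes out as 3 rather than the 2 the question asks for:

```python
from collections import Counter
from itertools import chain

list1 = ['apples', 'oranges', 'grape']
list2 = ['oranges', 'oranges', 'pear']
list3 = ['strawberries', 'bananas', 'apples']
list4 = [list1, list2, list3]

# chain.from_iterable lazily flattens list4 by one level,
# so Counter sees every element of every inner list.
terms = Counter(chain.from_iterable(list4))

print(terms['apples'])   # total occurrences across all lists
print(terms['oranges'])  # counts both 'oranges' in list2
```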

EDIT: another cool trick is to take the sum of the Counters (inspired by this famous answer), like:

sum(( Counter(lst) for lst in list4 ), Counter())

shx2
  • I don't think list4 is meant to be included. – hetepeperfan Jul 13 '13 at 20:21
  • Thank you, but the problem is that I want the number of lists this term appears in. For example, if a list has it 5 times, it should still count as one. terms['apple'] gives the number of occurrences of this term across all documents, not the number of documents that contain apple. – user2578185 Jul 13 '13 at 20:45
  • ah, in that case, use `set`s instead of lists, to remove duplicates. e.g. `list4 = [ set(list1), set(list2), set(list3) ]`. With that, the answer is still valid. – shx2 Jul 13 '13 at 21:25
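
The set suggestion in the comment above can be sketched roughly as follows. This is one possible reading of it, not necessarily what the commenter had in mind; the name doc_freq is made up for illustration. Each inner list is turned into a set before counting, so a term contributes at most 1 per list:

```python
from collections import Counter

list1 = ['apples', 'oranges', 'grape']
list2 = ['oranges', 'oranges', 'pear']
list3 = ['strawberries', 'bananas', 'apples']
list4 = [list1, list2, list3]

# set(lst) removes duplicates within a single list, so the Counter
# ends up holding the number of lists that contain each term.
doc_freq = Counter(term for lst in list4 for term in set(lst))

print(doc_freq['apples'])
print(doc_freq['oranges'])  # list2 contributes only once
```

Because the original lists are left untouched, the same list4 can still be used for total-occurrence counts elsewhere.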
0
print((list1 + list2 + list3).count('apples'))

or if you have all the lists already compiled in list4, you could use itertools.chain as a quick way to link them:

from itertools import chain
print(list(chain.from_iterable(list4)).count('apples'))

EDIT: or you can do this without itertools:

print(sum(list4, []).count('apples'))

and could easily replicate collections.Counter if for some reason you wanted to...

all_lists = sum(list4, [])
print(dict((k, all_lists.count(k)) for k in set(all_lists)))
Stuart
0
>>> [el for lst in [set(L) for L in list4] for el in lst].count('apples')
2
>>> [el for lst in [set(L) for L in list4] for el in lst].count('oranges')
2

If you want the final structure as a dictionary, a dict comprehension can be used to create a histogram from the flattened list of sets:

>>> list4sets = [set(L) for L in list4]
>>> list4flat = [el for lst in list4sets for el in lst]
>>> term = {el: list4flat.count(el) for el in list4flat}
>>> term['apples']
2
>>> term['oranges']
2
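
The comprehension above calls count once per element, which rescans the flattened list each time. For larger inputs, a single pass may be preferable. The following is a sketch under that assumption; document_frequency is a hypothetical helper name, not something from the answer:

```python
def document_frequency(lists):
    """Return a dict mapping each element to the number of lists containing it."""
    term = {}
    for lst in lists:
        # set(lst) ensures each list counts an element at most once
        for el in set(lst):
            term[el] = term.get(el, 0) + 1
    return term


list4 = [
    ['apples', 'oranges', 'grape'],
    ['oranges', 'oranges', 'pear'],
    ['strawberries', 'bananas', 'apples'],
]

term = document_frequency(list4)
print(term['apples'])
print(term['oranges'])
```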
dansalmo