1

I have a list of lists like the following:

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

I want to find the number of sublists in which each unique value in listoflist occurs. For example, "A" appears in two sublists, and "D" also appears in two sublists, even though it occurs twice in listoflist[2].

How can I get a dataframe with each unique element in one column and its frequency (the number of sublists that element appears in) in another?

Andy K
Keshav M
  • would this help you? https://stackoverflow.com/a/11829457/2572645 – Andy K May 20 '18 at 19:40
  • You said: "How can I get a dataframe...". Are you working with Pandas, and searching for a Pandas specific solution? If so, please mention it in the question and add a pandas tag to your question. – akaihola May 21 '18 at 17:33

3 Answers

3

You can use itertools.chain together with collections.Counter:

In [94]: import itertools as it

In [95]: from collections import Counter

In [96]: Counter(it.chain(*map(set, listoflist)))
Out[96]: Counter({'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2})

As mentioned in the comment by @Jean-François Fabre, you can also use:

In [97]: Counter(it.chain.from_iterable(map(set, listoflist)))
Out[97]: Counter({'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2})
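Since the question asks for a dataframe, here is a minimal sketch of turning the Counter into one. It assumes pandas is installed; the column names "element" and "n_sublists" are made up for illustration and are not part of the original answer.

```python
from collections import Counter
from itertools import chain

import pandas as pd

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# Count each element once per sublist by de-duplicating every sublist first.
counts = Counter(chain.from_iterable(map(set, listoflist)))

# Column names are hypothetical; sorting the items only makes the row order stable.
df = pd.DataFrame(sorted(counts.items()), columns=["element", "n_sublists"])
print(df)
```

Sorting is optional; without it the row order follows the Counter's insertion order, which depends on set iteration order.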
llllllllll
2

Essentially, it seems that you want something like

Counter(x for xs in listoflist for x in set(xs))

Each sublist is first converted into a set, which drops duplicates within that sublist. The resulting sets are then flattened into a single stream of elements and fed into the Counter.

Full code:

from collections import Counter

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

c = Counter(x for xs in listoflist for x in set(xs))

print(c)

Results in:

# output:
# Counter({'B': 2, 'C': 2, 'Z': 2, 'D': 2, 'A': 2, 'Y': 1, 'X': 1})
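To see why the set conversion matters, the following small sketch compares the per-sublist count with a plain occurrence count on the same data. The variable names per_sublist and total are only illustrative.

```python
from collections import Counter

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# With set(): each element is counted at most once per sublist.
per_sublist = Counter(x for xs in listoflist for x in set(xs))

# Without set(): every occurrence is counted, including repeats inside a sublist.
total = Counter(x for xs in listoflist for x in xs)

print(per_sublist["D"], total["D"])  # 2 3
print(per_sublist["A"], total["A"])  # 2 3
```

"D" occurs three times overall but only in two sublists, which is the distinction the question is after.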
Andrey Tyukin
1

Another way to do this is to use pandas:

import pandas as pd

df = pd.DataFrame(listoflist)
df.stack().reset_index().groupby(0)['level_0'].nunique().to_dict()

Output:

{'A': 2, 'B': 2, 'C': 2, 'D': 2, 'X': 1, 'Y': 1, 'Z': 2}
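If a two-column dataframe is wanted instead of a dict, the same chain can probably be finished with rename_axis and reset_index rather than to_dict. This is a sketch under that assumption; the column names "element" and "n_sublists" are hypothetical.

```python
import pandas as pd

listoflist = [["A", "B", "A", "C", "D"], ["Z", "A", "B", "C"], ["D", "D", "X", "Y", "Z"]]

# One row per sublist; shorter sublists are padded with missing values.
df = pd.DataFrame(listoflist)

result = (
    df.stack()                 # long format: (row, column) -> element
      .reset_index()           # 'level_0' holds the original sublist index
      .groupby(0)["level_0"]   # group by element; missing keys are excluded
      .nunique()               # number of distinct sublists per element
      .rename_axis("element")
      .reset_index(name="n_sublists")
)
print(result)
```

Because groupby sorts its keys by default, the rows come out ordered by element.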
Scott Boston