Pandas dataframe: Count occurrence of list element in dataframe rows

Question

Using Python, I would like to count how often each element of a list occurs across the rows of a dataframe, and aggregate the count for each element.

Here is the dataframe I am working with:

#Cluster_number_1   Cluster Type:   terpene
#Cluster_number_2   Cluster Type:   nrps
#Cluster_number_3   Cluster Type:   terpene
#Cluster_number_4   Cluster Type:   nrps
#Cluster_number_5   Cluster Type:   nrps
#Cluster_number_6   Cluster Type:   nrps
#Cluster_number_7   Cluster Type:   t1pks
#Cluster_number_8   Cluster Type:   other
#Cluster_number_9   Cluster Type:   t1pks
#Cluster_number_10  Cluster Type:   nrps

The corresponding list:

cluster_type = ["t1pks", "nrps", "terpene", "other"]

Desired output:

BGC_Class    Count
t1pks            2
nrps             5
terpene          2
other            1

To help explain, borrowing from unix $ variables:

file = "cluster_counts.txt"
cluster_count = open(file, "w")

cluster_count.write($1 + "\t" + $2 + "\n")

Where $1 is the first element in the list, and $2 is the number of times it occurs, across all rows.

The dataframes won't exceed 100 lines, so efficiency is no issue.

Best, B.D.

I found something to get me started in the question "How to count the occurrences of a list item?":

>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]

However, this only counts occurrences of elements within the list itself.

I don't know how to count the occurrences of my list's elements in the dataframe.

Have you tried anything for this? Like `groupby` and `count()` or similar, which is a fundamental pattern in pandas. — roganjosh, Jun 07 '18 at 19:06

score 1 · Answer 1 · answered Jun 07 '18 at 19:04

Try

df.BGC_Class.value_counts()

If this does not work, please post your data :)
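For illustration, here is a minimal sketch of that call on the sample rows from the question. It assumes the cluster types live in a column named BGC_Class (the question does not show real column names), and it adds a `reindex` step, not part of the answer above, so the result follows the order of the `cluster_type` list:

```python
import pandas as pd

# Sample values copied from the question; the column name BGC_Class is an assumption.
df = pd.DataFrame({
    "BGC_Class": ["terpene", "nrps", "terpene", "nrps", "nrps",
                  "nrps", "t1pks", "other", "t1pks", "nrps"]
})
cluster_type = ["t1pks", "nrps", "terpene", "other"]

# Count each distinct value, then order the result by the list.
# Any class in the list that never occurs gets a count of 0.
counts = df["BGC_Class"].value_counts().reindex(cluster_type, fill_value=0)
print(counts)
```

On this sample the counts come out as t1pks 2, nrps 5, terpene 2, other 1, matching the desired output in the question.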

The Unfun Cat


Using `df.apply(pd.value_counts)` seems to be on the right track. How can I modify this to apply to the 3rd column? – Barry D Jun 07 '18 at 19:47
You do not need `apply`. Above I assume that BGC_Class is the name of your column. List the column names with `df.columns` and substitute the one you find :) – The Unfun Cat Jun 07 '18 at 19:59
Fantastic. Who's a good cat? – Barry D Jun 07 '18 at 20:12
Hello @TheUnfunCat, can we also add a condition to this, like where ColumnA = 'X'? – aniltilanthe Feb 07 '20 at 13:30
`df[df.ColumnA == "X"].BGC_Class.value_counts()` – The Unfun Cat Feb 07 '20 at 13:33
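
A small sketch of the conditional count from the last comment may help. The data here is made up purely for illustration, and ColumnA and its values are hypothetical names taken from the comment, not from the original dataframe:

```python
import pandas as pd

# Made-up illustration data; ColumnA and its values are hypothetical.
df = pd.DataFrame({
    "ColumnA": ["X", "X", "Y", "X", "Y"],
    "BGC_Class": ["nrps", "t1pks", "nrps", "nrps", "other"],
})

# The boolean mask keeps only rows where ColumnA equals "X";
# value_counts then counts the classes in those rows alone.
filtered_counts = df.loc[df["ColumnA"] == "X", "BGC_Class"].value_counts()
print(filtered_counts)
```

Classes that appear only in the excluded rows (here "other") are absent from the result rather than reported as 0.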

score 1 · Answer 2 · answered Jun 07 '18 at 20:21

Creating the appropriate header over the corresponding column did the trick:

import pandas as pd

df = pd.read_csv('test2_output copy.tsv', sep='\t', names=['Cluster Number', '#', 'Cluster_Type'])
df.Cluster_Type.value_counts()

Output:

t1pks       7 
nrps        7
other       3
terpene     2
t1pks-nrps  1
indole      1

Thanks, 'The Unfun Cat'
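
To get from these counts to the tab-separated file described in the question, something like the following might work. This is only a sketch: it uses the ten sample rows from the question as a stand-in for the Cluster_Type column, and the BGC_Class and Count headers are borrowed from the desired output.

```python
import pandas as pd

# Stand-in for the Cluster_Type column; values are the sample rows from the question.
cluster_types = pd.Series(
    ["terpene", "nrps", "terpene", "nrps", "nrps",
     "nrps", "t1pks", "other", "t1pks", "nrps"],
    name="Cluster_Type",
)

counts = cluster_types.value_counts()

# Turn the counts into a two-column table with the headers from the desired output.
table = counts.rename_axis("BGC_Class").reset_index(name="Count")

# Write a header line followed by one "class<TAB>count" line per class.
table.to_csv("cluster_counts.txt", sep="\t", index=False)
```

Passing `header=False` to `to_csv` as well would drop the header line, leaving only the class and count pairs.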
