Counting strings in single column

Question

I have a dataframe of transaction IDs in which one column holds tags. Each row can have one or more tags, separated by commas. I want to count the occurrences of each individual tag. Using df.col.value_counts() doesn't work here because it counts each distinct combination of tags rather than each tag on its own.

Transaction	Tag
01	tag1
02	tag1, tag3
03	tag2
04	tag2, tag3

Using .value_counts() would result in:

tag1 1
tag1, tag3 1
tag2 1
tag2, tag3 1

What I am looking for is instead:

tag1 2
tag2 2
tag3 2

Any suggestions?

score 0 · Accepted Answer · answered May 09 '23 at 10:35

Split and explode before calling value_counts:

df['Tag'].str.split(', *').explode().value_counts()

Output:

Tag
tag1    2
tag3    2
tag2    2
Name: count, dtype: int64
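
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. The dataframe is rebuilt from the sample in the question; the name tag_counts is just for illustration. It swaps the regex separator for a plain comma split followed by .str.strip(), which is an assumption about the data (tags never contain commas) and should give the same counts as the one-liner above.

```python
import pandas as pd

# Sample data rebuilt from the question (values copied from the post)
df = pd.DataFrame({
    "Transaction": ["01", "02", "03", "04"],
    "Tag": ["tag1", "tag1, tag3", "tag2", "tag2, tag3"],
})

tag_counts = (
    df["Tag"]
    .str.split(",")   # each cell becomes a list of raw tags
    .explode()        # one row per tag
    .str.strip()      # drop whitespace left around the commas
    .value_counts()   # count each individual tag
)
print(tag_counts)
```

Stripping after the split tolerates inconsistent spacing such as "tag1,tag3" or "tag1 , tag3" without relying on a regular expression.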

Or, looping over the column in plain Python with collections.Counter instead of the pandas string methods:

from collections import Counter

out = Counter(tag for s in df['Tag'] for tag in s.split(', '))

Output: Counter({'tag1': 2, 'tag3': 2, 'tag2': 2})
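
If the column may contain missing values or uneven spacing, the Counter approach can be made a bit more defensive. This is only a sketch: count_tags is a hypothetical helper, and tag_column is a plain list standing in for df['Tag'].tolist() so the snippet runs without pandas.

```python
from collections import Counter

# Stand-in for df['Tag'].tolist(); values copied from the question
tag_column = ["tag1", "tag1, tag3", "tag2", "tag2, tag3"]

def count_tags(cells, sep=","):
    """Count individual tags across cells holding delimiter-separated tags."""
    counts = Counter()
    for cell in cells:
        if not isinstance(cell, str):
            continue  # skip missing values such as None or NaN
        # strip whitespace around each tag and ignore empty pieces
        counts.update(tag.strip() for tag in cell.split(sep) if tag.strip())
    return counts

out = count_tags(tag_column)
print(out.most_common())
```

With a real dataframe, the call would be count_tags(df['Tag']), since iterating a Series yields its values.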
