I have a dataframe of transactions in which one column holds tags. Each row can have one or more tags, stored as a single comma-separated string. I want to count how many times each individual tag occurs. Using df.col.value_counts() doesn't work here because it counts each distinct combination string rather than each tag.

Transaction Tag
01 tag1
02 tag1, tag3
03 tag2
04 tag2, tag3

Using .value_counts() would result in:

  • tag1 1
  • tag1, tag3 1
  • tag2 1
  • tag2, tag3 1

What I am looking for is instead:

  • tag1 2
  • tag2 2
  • tag3 2

Any suggestions?

1 Answer

Split the strings into lists and explode them into one row per tag before calling value_counts:

df['Tag'].str.split(', *').explode().value_counts()

Output:

Tag
tag1    2
tag3    2
tag2    2
Name: count, dtype: int64
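
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The dataframe is rebuilt from the sample in the question (storing the IDs as strings is my assumption), and the pattern `,\s*` is a slightly more tolerant variant of `', *'`, not something the question requires. Because the pattern is longer than one character, pandas treats it as a regular expression by default.

```python
import pandas as pd

# Sample data rebuilt from the question; IDs kept as strings (assumption)
df = pd.DataFrame({
    "Transaction": ["01", "02", "03", "04"],
    "Tag": ["tag1", "tag1, tag3", "tag2", "tag2, tag3"],
})

tag_counts = (
    df["Tag"]
    .str.split(r",\s*")   # each cell becomes a list of tags
    .explode()            # one row per tag, original index repeated
    .value_counts()       # count occurrences of each individual tag
)

print(tag_counts)
```

Each of the three tags should come out with a count of 2. Note that the order of tied counts is not something to rely on.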

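Or without pandas string methods, using collections.Counter (this splits on exactly ', ', so it assumes the tags are always separated by a comma followed by a single space):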

from collections import Counter

out = Counter(tag for s in df['Tag'] for tag in s.split(', '))

Output: Counter({'tag1': 2, 'tag3': 2, 'tag2': 2})
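
If the separators are not perfectly consistent, a possible variant of the Counter approach is to split with a regular expression and skip empty pieces. This is only a sketch: the helper name `count_tags` and the messy last row are made up for illustration and are not part of the question's data.

```python
import re
from collections import Counter

# Hypothetical input: same tags as the question, but with uneven spacing
tags_column = ["tag1", "tag1, tag3", "tag2", "tag2,tag3 "]

def count_tags(rows):
    """Count individual tags in an iterable of comma-separated strings."""
    counts = Counter()
    for row in rows:
        # Split on commas, ignoring any whitespace around them
        for tag in re.split(r"\s*,\s*", row.strip()):
            if tag:  # skip empty pieces from blank cells or trailing commas
                counts[tag] += 1
    return counts

counts = count_tags(tags_column)
print(counts)
```

With a dataframe, the same function could be called as count_tags(df['Tag']), assuming the column contains only strings (no NaN).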

mozway