
I have a dataframe column that contains Python lists of tags. I need to create a dictionary that counts how many times each tag was used. I did it this way:

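# Global dictionary mapping each tag to the number of times it appears
tags_use_count = {}

def count_tags(tag_list):
    # Increment the running count for every tag in this row's list
    for tag in tag_list:
        if tag in tags_use_count:
            tags_use_count[tag] += 1
        else:
            tags_use_count[tag] = 1

# apply is used only for its side effect; the Series of None it returns is discarded
q2019['Tags'].apply(count_tags)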

It works just fine, but I wonder whether this is a good way of doing it. Using apply purely for its side effect on a global dictionary seems like a crappy workaround that seasoned coders would frown upon. (It's not what apply was built for, I guess.) The dataset is small, so I could use iterrows to loop through the column, but I understand that's not a good idea for larger datasets. Would my approach be the go-to in that case, or is there a better way?

Nicola

2 Answers


IIUC, you want to count tags across every list in every row. You can explode the 'Tags' column so that each tag gets its own row, count the values, and convert the result to a dictionary:

q2019['Tags'].explode().value_counts().to_dict()
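
To see what each step of that chain does, here is a minimal sketch on a small made-up dataframe. The data below is a hypothetical stand-in for the question's q2019; only the 'Tags' column name comes from the question. Note that an empty list explodes to a single NaN row, which value_counts drops by default.

```python
import pandas as pd

# Hypothetical stand-in for the question's q2019 dataframe (made-up data).
q2019 = pd.DataFrame({
    "Tags": [
        ["python", "pandas"],
        ["pandas"],
        ["python", "numpy", "pandas"],
        [],  # an empty list becomes one NaN row after explode
    ]
})

# Step 1: one row per tag; the original row index is repeated for each tag.
exploded = q2019["Tags"].explode()

# Step 2: count occurrences of each distinct tag (NaN is dropped by default).
counts = exploded.value_counts()

# Step 3: convert the resulting Series to a plain dictionary.
tags_use_count = counts.to_dict()
print(tags_use_count)
```

This avoids the global dictionary and the side-effect-only apply, at the cost of materializing the exploded column in memory.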

You can use collections.Counter to do exactly this:

>>> from collections import Counter
>>> tag_list = ['tag_a', 'tag_b', 'tag_b', 'tag_c']
>>> dict(Counter(tag_list))
{'tag_a': 1, 'tag_b': 2, 'tag_c': 1}
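
The example above counts a single list. To count across a whole column of lists, one option is to flatten the lists first with itertools.chain.from_iterable and feed the result to Counter. This is a sketch, not necessarily what the answer intended; tag_lists is a hypothetical stand-in for q2019['Tags'], and since iterating a pandas Series yields its values, passing the column directly should behave the same way as long as every entry is a list.

```python
from collections import Counter
from itertools import chain

# Hypothetical stand-in for q2019['Tags']: any iterable of tag lists works.
tag_lists = [
    ["python", "pandas"],
    ["pandas"],
    ["python", "numpy", "pandas"],
    [],  # empty lists simply contribute nothing
]

# Flatten the lists lazily into one stream of tags, then count them.
tags_use_count = dict(Counter(chain.from_iterable(tag_lists)))
print(tags_use_count)
```

Counter is itself a dict subclass, so the dict(...) conversion is only needed if a plain dictionary is specifically required.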

Mustafa Shujaie