Consider the following DataFrame:
link  tags            views
/a    [tag_a, tag_b]  100
/b    [tag_a, tag_c]  200
/c    [tag_b, tag_c]  150
What would be an efficient way to 'groupby' items within the lists in the tags column? For instance, if one were to find the cumulative views for each tag in the DataFrame above, the result would be:
tag    views
tag_a  300
tag_b  250
tag_c  350
So far, this is what I have come up with:
# get all unique tags by flattening the lists in the tags column
all_tags = list(set(item for sublist in df.tags for item in sublist))
# for each tag, sum the views of every row whose list contains that tag
tag_views = {tag: df[df.tags.map(lambda x: tag in x)].views.sum() for tag in all_tags}
This approach is rather slow for a large dataset, since it scans the whole DataFrame once per unique tag. Is there a more efficient way (perhaps using the built-in groupby function) of doing this?
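One idea I considered but have not benchmarked (a sketch assuming pandas >= 0.25, which added DataFrame.explode) is to expand each list into its own rows first, so a single groupby can do the aggregation:

```python
import pandas as pd

df = pd.DataFrame({
    "link": ["/a", "/b", "/c"],
    "tags": [["tag_a", "tag_b"], ["tag_a", "tag_c"], ["tag_b", "tag_c"]],
    "views": [100, 200, 150],
})

# explode gives each list element its own row, repeating link and views,
# so the list column becomes an ordinary groupable column
tag_views = df.explode("tags").groupby("tags")["views"].sum()
print(tag_views)
```

Would this be expected to scale better than the dictionary comprehension above, or is there a more idiomatic pattern?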