Pandas: how to parse values from column

Question

I have a somewhat large dataframe, formatted something like this:

colA  colB
1     c, d
2     d, e, f
3     e, d, a

I want to get a dictionary that counts instances of unique values in colB, like:

a: 1
c: 1
d: 3
e: 2
f: 1

My naive solution would be to iterate over every row of colB, split the string, and then count the pieces with a Counter: my_counter[current_colB_object] += 1.
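For reference, the naive row-by-row approach described above might look like the sketch below. The function name `count_values_naive` is hypothetical. It accepts any iterable of comma-separated strings, so a pandas Series such as `df["colB"]` works the same way as the plain list used here.

```python
from collections import Counter
from typing import Iterable


def count_values_naive(rows: Iterable[str]) -> dict:
    """Hypothetical helper: count each comma-separated item, one row at a time."""
    my_counter = Counter()
    for row in rows:
        # Split "d, e, f" into ["d", "e", "f"] and count each item.
        for item in row.split(", "):
            my_counter[item] += 1
    return dict(my_counter)


print(count_values_naive(["c, d", "d, e, f", "e, d, a"]))
# {'c': 1, 'd': 3, 'e': 2, 'f': 1, 'a': 1}
```

This is the Python-level loop the question wants to avoid. It is shown only as a baseline to compare against the answers below.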

However, this answer strongly discourages iterating over dataframes, especially large ones like mine.

What would be the preferred way of doing this?

score 1 · Answer 1 · answered Aug 24 '21 at 16:10

Try `str.split` followed by `explode` and `value_counts`:

>>> df["colB"].str.split(", ").explode().value_counts().to_dict()
{'d': 3, 'e': 2, 'c': 1, 'f': 1, 'a': 1}

Input `df`:

import pandas as pd

df = pd.DataFrame({"colA": [1, 2, 3],
                   "colB": ["c, d", "d, e, f", "e, d, a"]})

>>> df
   colA     colB
0     1     c, d
1     2  d, e, f
2     3  e, d, a
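The split above assumes that every separator is exactly `", "`. If the real data is less consistent (for example `"c,d"` or `"d , e"`), a more tolerant variant splits on the comma alone and strips whitespace from each exploded item. This is a sketch under that assumption, not part of the original answer, and the `messy` frame is made-up sample data.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas.
messy = pd.DataFrame({"colA": [1, 2, 3],
                      "colB": ["c,d", "d , e,f", " e, d,  a"]})

counts = (
    messy["colB"]
    .str.split(",")   # one list of raw tokens per row
    .explode()        # one row per token, original index repeated
    .str.strip()      # drop surrounding whitespace from each token
    .value_counts()   # occurrences of each distinct token
    .to_dict()
)
print(counts)
```

Missing values in colB pass through `str.split`, `explode`, and `str.strip` as NaN and are then dropped by `value_counts`, because it uses `dropna=True` by default.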

score 1 · Answer 2 · tozCSS · edited Aug 25 '21 at 16:38

This is probably faster than the other answer, but you should time both yourself on a sample of your data (see `df.sample()`).

from collections import Counter

cnt = Counter()
# Split each row into a list; Counter.update adds that list's items to cnt in place
df.colB.str.split(', ').apply(cnt.update)
dict(cnt)

Output:

{'c': 1, 'd': 3, 'e': 2, 'f': 1, 'a': 1}
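The timing comparison suggested above can be sketched as follows. The enlarged `big` frame, the sample size, and the wrapper names `with_explode` and `with_counter` are all hypothetical. The printed timings depend entirely on your machine and data, so no numbers are claimed here.

```python
import timeit
from collections import Counter

import pandas as pd

# Hypothetical stand-in for real data: the question's three rows repeated.
df = pd.DataFrame({"colA": [1, 2, 3],
                   "colB": ["c, d", "d, e, f", "e, d, a"]})
big = pd.concat([df] * 10_000, ignore_index=True)

# Time on a random sample of the data; adjust n to taste.
sample = big.sample(n=5_000, random_state=0)


def with_explode(s: pd.Series) -> dict:
    # Answer 1: split, one row per item, then count.
    return s.str.split(", ").explode().value_counts().to_dict()


def with_counter(s: pd.Series) -> dict:
    # Answer 2: feed each row's list into a shared Counter.
    cnt = Counter()
    s.str.split(", ").apply(cnt.update)
    return dict(cnt)


# Both approaches should give the same counts (dict order may differ).
assert with_explode(sample["colB"]) == with_counter(sample["colB"])

for fn in (with_explode, with_counter):
    seconds = timeit.timeit(lambda: fn(sample["colB"]), number=20)
    print(f"{fn.__name__}: {seconds:.3f} s for 20 runs")
```

Comparing the two results before timing them confirms that the faster method is also producing the same answer.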

answered Aug 24 '21 at 16:18

