How can I find the frequency?

Question

I have this data frame. How can I find the three most frequent numbers in column b?

import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2,3], "b": [np.nan, np.nan, '2,3', 3, '3,5,1',2]})

I guess the answer should be 3, 2, 5 or 3, 2, 1.

What if np.nan is the most frequent value? Do you want it to be included? – Ehsan, Oct 09 '20 at 02:30

score 1 · Answer 1 · answered Oct 09 '20 at 02:39

Use a combination of pandas and Python's collections.Counter:

from collections import Counter

a = list(dict(Counter(df.b.dropna().astype(str).str.split(',').sum()).most_common(3))
                                   .keys())

In [132]: a
Out[132]: ['3', '2', '5']
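As a variation on the idea above, here is a minimal self-contained sketch that flattens the split lists with itertools.chain instead of .sum(), since summing lists concatenates them one by one and can get slow on large columns. The variable names (tokens, counts, top3_values) are illustrative and not part of the original answer.

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# Drop missing cells, cast everything to str, and split on commas.
# Each remaining cell becomes a list of string tokens.
tokens = df["b"].dropna().astype(str).str.split(",")

# chain.from_iterable flattens the lists lazily; Counter tallies each token.
counts = Counter(chain.from_iterable(tokens))

# most_common(3) returns (token, count) pairs, highest count first.
top3 = counts.most_common(3)
top3_values = [token for token, _ in top3]
print(top3)
```

Note that '5' and '1' both occur once, so which of them lands in third place depends on how ties are broken. Keeping the (token, count) pairs also answers the frequency part of the question directly.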

Shubham Sharma · Accepted Answer · 2020-10-09T02:41:42.393

Split column b on the comma delimiter, then use explode to turn each element of the resulting lists into its own row, and finally use value_counts + head to get the 3 most frequent elements:

df['b'].dropna().astype(str).str.split(',')\
       .explode().value_counts().head(3).index.tolist()

explode is available in pandas >= 0.25. For pandas < 0.25, use:

pd.value_counts(np.hstack(df['b'].dropna().astype(str).str.split(','))).head(3).index.tolist()

['3', '2', '5']
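For reference, a self-contained sketch of the explode approach that also keeps the counts and converts the tokens back to integers. It assumes pandas >= 0.25 and that every token in column b is an integer; the names numbers, freq and top3 are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# One row per token: drop NaN, cast to str, split on commas, then explode.
tokens = df["b"].dropna().astype(str).str.split(",").explode()

# Assumption: every token is an integer, so strip whitespace and convert.
numbers = tokens.str.strip().astype(int)

# value_counts sorts by frequency, highest first; head(3) keeps the top three.
freq = numbers.value_counts()
top3 = freq.head(3)
print(top3)
```

Here freq maps each number to how often it appears, so top3.index.tolist() gives the values and top3.tolist() gives their frequencies. Since 5 and 1 are tied at one occurrence each, either may show up as the third entry.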

edited Oct 09 '20 at 02:41

answered Oct 09 '20 at 02:07

Shubham Sharma


I am getting this error: AttributeError: 'Series' object has no attribute 'explode' – sam_sam Oct 09 '20 at 02:36
I think you are using an old pandas version. Is it possible to upgrade pandas? – Shubham Sharma Oct 09 '20 at 02:37
Yes, you are right. I was using a cloud system so ... it worked on my personal computer. – sam_sam Oct 09 '20 at 02:41
