
I have this data frame. How can I find the 3 most frequently repeated numbers in column b?

import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2,3], "b": [np.nan, np.nan, '2,3', 3, '3,5,1',2]})

I guess the answer should be 3, 2, 5 or 3, 2, 1 (5 and 1 are tied, each appearing once).

Wasif
sam_sam

2 Answers

Use a combination of pandas and Python's collections.Counter:

from collections import Counter

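# Split each non-null entry on commas and concatenate the resulting lists
tokens = df.b.dropna().astype(str).str.split(',').sum()
# Keep only the values (not the counts) of the 3 most common tokens
a = [value for value, _ in Counter(tokens).most_common(3)]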

In [132]: a
Out[132]: ['3', '2', '5']
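
The result above contains strings because of the astype(str) step. If integers are wanted instead, a minimal sketch could look like the following. It assumes every token in column b parses as an integer, which holds for the sample frame but may not for other data; the names tokens, counts and top3 are only illustrative.

```python
from collections import Counter

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# Split every non-null entry on commas and flatten into one list of tokens.
split_rows = df["b"].dropna().astype(str).str.split(",")
tokens = [tok for parts in split_rows for tok in parts]

# Count the tokens as integers (assumption: each token is a valid integer).
counts = Counter(int(tok) for tok in tokens)

# most_common(3) returns (value, count) pairs; keep only the values.
top3 = [value for value, _ in counts.most_common(3)]
print(top3)
```

Ties (here 5 and 1, which each occur once) are resolved by whichever value was seen first.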
Andy L.

Split column b on the delimiter ",", then use explode to turn each element of the resulting lists into its own row. Finally, use value_counts + head to get the 3 most frequent elements:

df['b'].dropna().astype(str).str.split(',')\
       .explode().value_counts().head(3).index.tolist()

explode is available in pandas >= 0.25. For pandas < 0.25, use:

pd.value_counts(np.hstack(df['b'].dropna().astype(str).str.split(','))).head(3).index.tolist()

['3', '2', '5']
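
As the question notes, 5 and 1 are tied for third place, so head(3) returns only one of them. If tied values should all be reported, one possible sketch is to use Series.nlargest with keep="all" on the counts. This assumes pandas >= 0.25 for explode; the names counts and top_with_ties are only illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# One row per comma-separated token, then count occurrences of each token.
counts = (
    df["b"].dropna().astype(str)
    .str.split(",")
    .explode()
    .value_counts()
)

# keep="all" retains every value tied with the third-largest count,
# so the result may contain more than 3 entries.
top_with_ties = counts.nlargest(3, keep="all")
print(top_with_ties)
```

For the sample frame this keeps four values ('3', '2', '5' and '1'), which makes the tie visible instead of silently dropping one of them.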
Shubham Sharma