I have this data frame. How can I find the 3 most repeated numbers in column b?
import pandas as pd
import numpy as np
df = pd.DataFrame({"a": [1,2,2,4,2,3], "b": [np.nan, np.nan, '2,3', 3, '3,5,1',2]})
I guess the answer should be 3,2,5 or 3,2,1
Use a combination of pandas and Python's collections.Counter:
from collections import Counter
a = list(dict(Counter(df.b.dropna().astype(str).str.split(',').sum()).most_common(3))
.keys())
In [132]: a
Out[132]: ['3', '2', '5']
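The same idea can be wrapped in a small helper. This is only a sketch: the function name top_n_tokens and its parameters are made up for illustration, and it swaps the list-concatenating .sum() for itertools.chain.from_iterable, which avoids repeatedly copying lists on longer columns.

```python
from collections import Counter
from itertools import chain

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})


def top_n_tokens(series, n=3, sep=","):
    """Return the n most frequent separator-delimited tokens as strings.

    Hypothetical helper, not part of pandas.
    """
    # Drop NaN, cast everything (ints included) to str, split into lists.
    lists = series.dropna().astype(str).str.split(sep)
    # Flatten the lists lazily and count every token.
    counts = Counter(chain.from_iterable(lists))
    return [token for token, _ in counts.most_common(n)]


top3 = top_n_tokens(df["b"])
print(top3)
```

Note that '5' and '1' both occur once, so which of them takes third place depends on how ties are broken.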
Split column b on the delimiter ",", then use explode to turn each element of the resulting lists into its own row, and finally use value_counts + head to get the top 3 repeated elements:
df['b'].dropna().astype(str).str.split(',')\
.explode().value_counts().head(3).index.tolist()
explode is available in pandas version >= 0.25; for pandas version < 0.25 use:
pd.value_counts(np.hstack(df['b'].dropna().astype(str).str.split(','))).head(3).index.tolist()
['3', '2', '5']
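To see why the question allows either 3,2,5 or 3,2,1, it can help to look at the full counts before taking the head. The sketch below assumes pandas >= 0.25 (for explode); the variable names and the conversion back to int are additions for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 2, 4, 2, 3],
                   "b": [np.nan, np.nan, "2,3", 3, "3,5,1", 2]})

# One token per row; strip() guards against stray spaces such as "2, 3".
tokens = df["b"].dropna().astype(str).str.split(",").explode().str.strip()

# Full frequency table, sorted by count in descending order.
counts = tokens.value_counts()
print(counts)

# Top 3 labels converted from strings back to integers.
top3_ints = [int(label) for label in counts.head(3).index]
print(top3_ints)
```

Here '3' appears three times, '2' twice, and '5' and '1' once each, so the third entry is a tie and its order is not something to rely on.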