How to get the n most frequent or top values per column in python pandas?

Question

My DataFrame looks like this:

df:
    A   B
0   a   g
1   f   g
2   a   g
3   a   d
4   h   d
5   f   a

For the top 2 most frequent values per column (n=2), the output should be:

top_df:
    A   B
0   a   g
1   f   d

Thank you

Possible duplicate: [this post](https://stackoverflow.com/questions/41825978) and [this post](https://stackoverflow.com/questions/20069009) — Bill Huang, Oct 20 '20 at 12:52

score 2 · Accepted Answer · answered Oct 20 '20 at 12:53

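This should work:

import pandas as pd

n = 2
# value_counts() sorts by descending frequency, so the first n index labels are the n most frequent values
df.apply(lambda x: pd.Series(x.value_counts().index[:n]))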
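
As a quick check, the one-liner can be run against the question's sample data. This is only a minimal sketch: the DataFrame is rebuilt by hand from the table in the question, and the name top_df is borrowed from the question's expected output rather than from this answer.

```python
import pandas as pd

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "A": ["a", "f", "a", "a", "h", "f"],
    "B": ["g", "g", "g", "d", "d", "a"],
})

n = 2
# For each column, value_counts() orders values by descending frequency,
# so the first n index labels are the n most frequent values.
top_df = df.apply(lambda col: pd.Series(col.value_counts().index[:n]))
print(top_df)
```

Keep in mind that value_counts() leaves out missing values unless dropna=False is passed, so NaN will not show up among the top values by default.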

answered Oct 20 '20 at 12:53

Myccha

score 0 · Answer 2 · answered Oct 20 '20 at 12:53

Something like this could help:

maxes = dict()
for col in df.columns:
    frequencies = df[col].value_counts()
    # value_counts() sorts by count in descending order, so just take the first 2
    maxes[col] = frequencies[:2]
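
Unlike the other answers, this loop keeps the counts as well as the values. A rough sketch of how it could be wrapped in a function follows. The function name top_n_with_counts and its n parameter are made up for illustration, and the sample DataFrame is rebuilt by hand from the question.

```python
import pandas as pd

def top_n_with_counts(df, n=2):
    """Hypothetical helper: map each column to its n most frequent (value, count) pairs."""
    result = {}
    for col in df.columns:
        # value_counts() returns counts sorted in descending order,
        # so head(n) keeps the n most frequent values.
        frequencies = df[col].value_counts().head(n)
        result[col] = [(value, int(count)) for value, count in frequencies.items()]
    return result

# Sample data rebuilt by hand from the table in the question.
df = pd.DataFrame({
    "A": ["a", "f", "a", "a", "h", "f"],
    "B": ["g", "g", "g", "d", "d", "a"],
})

maxes = top_n_with_counts(df, n=2)
print(maxes)
```

If only the values are needed, the counts can be dropped by taking the index of each Series, as the other answers do.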

answered Oct 20 '20 at 12:53

Andrew Pye

Piyush Sambhi · Answer 3 · 2020-10-20T13:09:34.777

SOLUTION:
To get the n most frequent values, slice the result of .value_counts() and take its index:

import pandas as pd

df = pd.read_csv('test.csv')

# METHOD 1: a little lengthy and inefficient
top_dict = {}
n_freq_items = 2
top_dict['A'] = df.A.value_counts()[:n_freq_items].index.tolist()
top_dict['B'] = df.B.value_counts()[:n_freq_items].index.tolist()
top_df = pd.DataFrame(top_dict)

print(top_df)

# METHOD 2: shorter and better; taken from @myccha's answer
top_df = df.apply(lambda x: pd.Series(x.value_counts()[:n_freq_items].index))
print(top_df)

INPUT DATA:

# test.csv
A,B
a,g
f,g
a,g
a,d
h,d
f,a

OUTPUT:

   A  B
0  a  g
1  f  d

NOTE: METHOD 2 is taken from @myccha's answer to this question, which I found more helpful than my own approach.
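
To try this without creating test.csv on disk, the same CSV text can be read from an in-memory buffer. The sketch below also adds a made-up column C (not part of the original data) holding a single distinct value, to show what happens when a column has fewer than n distinct values. The dict comprehension is a variant of METHOD 2, not the code from either answer; building the frame from a dict of Series pads the shorter column with NaN.

```python
import io

import pandas as pd

# The A and B columns match test.csv from the post; C is a hypothetical
# extra column added only to illustrate the NaN padding.
csv_text = """A,B,C
a,g,x
f,g,x
a,g,x
a,d,x
h,d,x
f,a,x
"""

df = pd.read_csv(io.StringIO(csv_text))
n_freq_items = 2

# One Series of top values per column; the DataFrame constructor aligns
# them on their default integer index and fills the gaps with NaN.
top_df = pd.DataFrame({
    col: pd.Series(df[col].value_counts().index[:n_freq_items])
    for col in df.columns
})
print(top_df)
```

Here columns A and B give the same result as in the OUTPUT section, while column C holds x in the first row and NaN in the second.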
