Pandas get the most frequent values of a column

Question

I have this dataframe:

0 name data
1 alex asd
2 helen sdd
3 alex dss
4 helen sdsd
5 john sdadd

I am trying to get the most frequent value or values (in this case there are several), so what I do is:

dataframe['name'].value_counts().idxmax()

But it returns only one value, alex, even though helen appears twice as well.

score 127 · Accepted Answer · answered Feb 02 '18 at 20:23

Use mode:

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
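
To make the difference from the question's attempt concrete, here is a minimal sketch. The dataframe is rebuilt from the question's sample data, and the variable names `single` and `modes` are only illustrative.

```python
import pandas as pd

# Sample data mirroring the question's dataframe.
df = pd.DataFrame({
    "name": ["alex", "helen", "alex", "helen", "john"],
    "data": ["asd", "sdd", "dss", "sdsd", "sdadd"],
})

# idxmax() returns a single label, even when several values share the top count.
single = df["name"].value_counts().idxmax()

# mode() returns every value that reaches the highest count, in sorted order.
modes = df["name"].mode().tolist()

print(single)
print(modes)
```

Because `mode()` returns a Series, it keeps all ties, which is what the question asks for.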

BENY

Hmmm, I have seen you using mode earlier :) – Vaishali Feb 02 '18 at 21:05
@Vaishali yep, that is from scipy (scipy.stats.mode), which returns both the mode and the count; pd.mode only returns the value :-) – BENY Feb 02 '18 at 21:12
is there a way to get both the most frequent value and its indices with one query? – user2348209 Aug 28 '23 at 16:01
@user2348209 you can, but I don't recommend it: `df.reset_index().groupby(df['name'])['index'].agg(list).loc[lambda x: x.str.len() == x.str.len().max()]` – BENY Aug 28 '23 at 17:23
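
For the follow-up about getting the row indices as well, a more readable alternative to the one-liner might look like the sketch below. It assumes the row labels 1 to 5 shown in the question, and `index_by_name` is just an illustrative name.

```python
import pandas as pd

# Row labels 1..5 are taken from the question's printout (an assumption).
df = pd.DataFrame(
    {"name": ["alex", "helen", "alex", "helen", "john"]},
    index=[1, 2, 3, 4, 5],
)

# All values tied for the highest count.
modes = df["name"].mode()

# Keep only the rows whose name is one of the modes,
# then collect the row labels for each of those names.
rows = df[df["name"].isin(modes)]
index_by_name = {
    name: group.index.tolist() for name, group in rows.groupby("name")
}

print(index_by_name)
```

This takes two steps instead of one query, but each step is easy to inspect on its own.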

score 88 · Answer 2 · answered Apr 28 '19 at 06:47

To get the n most frequent values, just subset .value_counts() and grab the index:

# get top 10 most frequent names
n = 10
dataframe['name'].value_counts()[:n].index.tolist()

Jared Wilber

What exactly does adding .index do? Why can't I stop at [:n]? – user1953366 Apr 28 '19 at 07:10
The returned data structure will have the `name` values stored in the index, with their respective counts stored as the value. So if you didn't use index, you'd get a list of the most frequent counts, not the associated `name`. – Jared Wilber Apr 28 '19 at 18:15
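
A small sketch of that distinction, using made-up sample names (an assumption, chosen so there are no ties). `.iloc[:n]` is used for the positional slice; with a string index like this one it selects the same rows as `[:n]`.

```python
import pandas as pd

# Illustrative data: alex appears 3 times, helen 2, john 1.
names = pd.Series(["alex", "helen", "alex", "helen", "john", "alex"], name="name")

counts = names.value_counts()  # labels live in the index, counts are the values
n = 2
top = counts.iloc[:n]          # first n rows of the already-sorted counts

top_labels = top.index.tolist()  # the names themselves
top_counts = top.tolist()        # how often each of them occurs

print(top_labels)
print(top_counts)
```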

score 18 · Answer 3 · answered Jun 27 '18 at 02:57

You could try argmax like this:

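dataframe['name'].value_counts().argmax()
Out[13]: 'alex'

value_counts returns a pandas Series (pandas.core.series.Series) of counts, and argmax can be used to obtain the key of the maximum value. Note that in pandas 1.0 and later, argmax returns the integer position instead; idxmax returns the key.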

Lunar_one

`argmax` is deprecated for `idmax` – Bhoomtawath Plinsut Nov 10 '18 at 13:46
Just a small typo correction: it is not `idmax`, but `idxmax` – ralvarez Jul 05 '19 at 08:08
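
To illustrate the difference discussed in these comments, here is a short sketch on made-up data without ties. It assumes pandas 1.0 or later, where `Series.argmax` returns a position rather than a label.

```python
import pandas as pd

# Illustrative data: alex is the single most frequent name.
names = pd.Series(["alex", "helen", "alex", "john", "alex"])
counts = names.value_counts()

label = counts.idxmax()     # index label of the largest count
position = counts.argmax()  # integer position of the largest count

print(label)
print(position)
```

So `idxmax` is the call to use when the name itself is wanted.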

score 12 · Answer 4 · edited Jan 21 '22 at 08:07

This gives the five most common names:

df['name'].value_counts().nlargest(5)

Syscall

answered Jan 21 '22 at 07:25

Sandhya Krishnan

score 11 · Answer 5 · edited Sep 11 '19 at 09:03

df['name'].value_counts()[:5].sort_values(ascending=False)

value_counts returns a pandas Series (pandas.core.series.Series) of counts, and sort_values(ascending=False) puts the highest counts first.

answered Sep 11 '19 at 08:32

Taie

While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – xiawi Sep 11 '19 at 08:57
`value_counts()` already returns the counts sorted in descending order, so calling `sort_values()` is unnecessary. See [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.value_counts.html). – Matt VanEseltine Oct 20 '20 at 21:02

score 10 · Answer 6 · answered Jul 06 '20 at 09:15

Use:

df['name'].mode()

or

df['name'].value_counts().idxmax()

Mohit Mehlawat

score 8 · Answer 7 · answered Aug 15 '18 at 05:18

You can use this to get the count of every value in a particular column. The counts are sorted in descending order, so the most frequent values come first:

df['name'].value_counts()

paul okoduwa

score 7 · Answer 8 · answered Feb 02 '18 at 20:22

Here's one way:

df['name'].value_counts()[df['name'].value_counts() == df['name'].value_counts().max()]

which prints:

helen    2
alex     2
Name: name, dtype: int64
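
The same idea can be written so that `value_counts()` is computed only once. This is a sketch of that variant on the question's sample names, not a change to the approach; `counts` and `top` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

# Count each name once and reuse the result.
counts = df["name"].value_counts()

# Boolean mask: keep every name whose count equals the maximum count.
top = counts[counts == counts.max()]

print(top)
```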

pault

score 5 · Answer 9 · answered Feb 02 '18 at 20:34

Not Obvious, But Fast

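import numpy as np
import pandas as pd

f, u = pd.factorize(df.name.values)  # integer codes and the unique values
counts = np.bincount(f)              # number of occurrences of each code
u[counts == counts.max()]            # every value that reaches the maximum count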

array(['alex', 'helen'], dtype=object)
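
A self-contained sketch of the same technique, wrapped in a hypothetical helper named `most_frequent` (not part of pandas or NumPy). It also drops missing values first, since `pd.factorize` codes them as -1 and `np.bincount` rejects negative input.

```python
import numpy as np
import pandas as pd


def most_frequent(values):
    """Return every value that reaches the highest count (missing values ignored)."""
    codes, uniques = pd.factorize(values)
    codes = codes[codes >= 0]  # factorize marks missing values with -1
    if codes.size == 0:
        return uniques[:0]     # nothing to count
    counts = np.bincount(codes, minlength=len(uniques))
    return uniques[counts == counts.max()]


names = np.array(["alex", "helen", "alex", "helen", "john"], dtype=object)
result = most_frequent(names)

print(result)
```

The result keeps the order in which the values first appear, because that is the order `pd.factorize` uses for its unique values.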

piRSquared

For numeric data, this was slightly slower for me :) Like 5% – The Unfun Cat Nov 14 '19 at 10:39

score 5 · Answer 10 · edited May 03 '20 at 07:01

Simply use this:

dataframe['name'].value_counts().nlargest(n)

The functions for the largest and smallest frequencies are:

nlargest() for the n most frequent values
nsmallest() for the n least frequent values
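
One detail that matters for the original question: by default `nlargest(n)` returns exactly n entries and drops ties. The sketch below, on the question's sample names, shows the `keep="all"` option of `Series.nlargest`, which keeps every entry tied with the n-th one.

```python
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})
counts = df["name"].value_counts()

top1 = counts.nlargest(1)                   # exactly one entry, ties dropped
top1_ties = counts.nlargest(1, keep="all")  # every entry tied for first place
rarest = counts.nsmallest(1)                # the least frequent name

print(top1_ties)
print(rarest)
```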

William Prigol Lopes

answered May 02 '20 at 20:00

avineet07

score 4 · Answer 11 · answered Jul 02 '19 at 09:03

To get the top 5:

dataframe['name'].value_counts()[0:5]

Naomi Fridman

I actually like this answer, but there is one issue: doing this just returns the frequency, not the label. Fix this by using `dataframe['name'].value_counts().keys()[0:5]` instead. – Jul 25 '19 at 17:32

score 2 · Answer 12 · answered Feb 02 '18 at 20:24

You could use .apply and pd.value_counts to count the occurrences of all the names in the name column.

dataframe['name'].apply(pd.value_counts)

Brian

score 2 · Answer 13 · answered Jul 30 '19 at 05:41

To get the top five most common names:

dataframe['name'].value_counts().head()

pedro_bb7

score 2 · Answer 14 · answered Jan 30 '20 at 15:13

My best solution to get the first one is:

df['my_column'].value_counts().sort_values(ascending=False).idxmax()

venergiac

score 2 · Answer 15 · answered Mar 12 '21 at 14:50

I had a similar issue. The most compact way to get, say, the top n most frequent values (n defaults to 5) is:

df["column_name"].value_counts().head(n)

KZiovas

score 2 · Answer 16 · answered Jun 18 '21 at 16:53

Identifying the top 5, for example, using value_counts

top5 = df['column'].value_counts()

Listing the contents of 'top5':

top5[:5]

Victor Senna

The one liner for this is: `df['column'].value_counts()[:5]` – Duc Hiep Hoang Jun 22 '21 at 16:34
The above may give you a `KeyError`. A more general way is `top5.keys()[:5]`, the one-liner being `df['column'].value_counts().keys()[:5]`. – Nirjhor Chakraborty Jul 02 '21 at 21:30
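
When the column holds integers, the index of the counts is made of integers too, so plain `[]` lookups can be ambiguous between labels and positions. A hedged sketch of a purely positional alternative with `.iloc`, on made-up integer data (an assumption, not the asker's data); exactly when `[]` raises depends on the pandas version and is not verified here.

```python
import pandas as pd

# Illustrative integer column: 7 appears 3 times, 3 twice, 1 once.
s = pd.Series([3, 3, 7, 7, 7, 1], name="column")
counts = s.value_counts()  # the index holds the integers 7, 3, 1

top2 = counts.iloc[:2]                   # positional: first two rows, whatever the labels are
top2_labels = counts.index[:2].tolist()  # just the two most frequent values

print(top2)
print(top2_labels)
```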

score 1 · Answer 17 · edited Dec 16 '20 at 14:34

n is the number of most frequent items to return:

n = 2

a = dataframe['name'].value_counts()[:n].index.tolist()

dataframe["name"].value_counts()[a]

Maylo

answered Dec 16 '20 at 14:10

Hassan Butt

score 0 · Answer 18 · edited Aug 11 '21 at 19:36

Getting the top 5 most common last names with pandas:

df['name'].apply(lambda name: name.split()[-1]).value_counts()[:5]

General Grievance

answered Aug 11 '21 at 15:34

Alireza
