108

I have this dataframe:

0 name data
1 alex asd
2 helen sdd
3 alex dss
4 helen sdsd
5 john sdadd

I am trying to get the most frequent value or values (in this case there is more than one), so what I do is:

dataframe['name'].value_counts().idxmax()

But it returns only one value, alex, even though helen also appears twice.
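
For reference, here is a minimal sketch that reproduces the behavior described above. It assumes the table is loaded into a DataFrame called dataframe with lowercase names; the construction itself is illustrative and not from the original post.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (the 1..5 index is an assumption).
dataframe = pd.DataFrame(
    {
        "name": ["alex", "helen", "alex", "helen", "john"],
        "data": ["asd", "sdd", "dss", "sdsd", "sdadd"],
    },
    index=[1, 2, 3, 4, 5],
)

counts = dataframe["name"].value_counts()

# idxmax() returns a single label, even when several labels share the maximum count.
top_label = counts.idxmax()

# Number of labels that actually share the maximum count.
n_tied = int((counts == counts.max()).sum())

print(top_label, n_tied)
```

Two names are tied for the maximum here, but idxmax() can only report one of them.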

aleale

18 Answers

127

By using mode

df.name.mode()
Out[712]: 
0     alex
1    helen
dtype: object
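
A small self-contained sketch of the same idea, assuming a DataFrame named df with the names from the question (the sample data is rebuilt here for illustration). It also shows that mode() returns every value when nothing repeats.

```python
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

# Series.mode() returns every value tied for the highest frequency, in sorted order.
mode_list = df["name"].mode().tolist()
print(mode_list)  # ['alex', 'helen']

# If no value repeats, every value is a mode.
all_unique = pd.Series(["a", "b"]).mode().tolist()
print(all_unique)  # ['a', 'b']
```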
BENY
  • 1
    Hmmm, I have seen you using mode earlier :) – Vaishali Feb 02 '18 at 21:05
  • 2
    @Vaishali Yep, that was scipy.stats.mode, which returns both the mode and the count; the pandas mode returns only the value(s) :-) – BENY Feb 02 '18 at 21:12
  • Is there a way to get both the most frequent value and its indices with one query? – user2348209 Aug 28 '23 at 16:01
  • @user2348209 You can, but I don't recommend it: `df.reset_index().groupby(df['name'])['index'].agg(list).loc[lambda x: x.str.len() == x.str.len().max()]` – BENY Aug 28 '23 at 17:23
88

To get the n most frequent values, just subset .value_counts() and grab the index:

# get top 10 most frequent names
n = 10
dataframe['name'].value_counts()[:n].index.tolist()
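
As a hedged illustration of what .index contributes, the sketch below separates the labels from the counts. The sample data and the variable names top_names and top_counts are assumptions for this example; head(n) is used instead of [:n] only to make the positional intent explicit.

```python
import pandas as pd

dataframe = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

n = 2
# value_counts() puts the names in the index and the counts in the values.
counts = dataframe["name"].value_counts()

top_names = counts.head(n).index.tolist()  # the labels
top_counts = counts.head(n).tolist()       # the frequencies only

print(top_names, top_counts)
```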
Jared Wilber
  • 1
    What exactly does adding .index does? Why can't I leave it till [:n]? – user1953366 Apr 28 '19 at 07:10
  • 2
    The returned data structure will have the `name` values stored in the index, with their respective counts stored as the value. So if you didn't use index, you'd get a list of the most frequent counts, not the associated `name`. – Jared Wilber Apr 28 '19 at 18:15
18

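You could try idxmax like this (on pandas versions before 1.0, argmax returned the label as well; in current versions argmax returns the integer position instead):

dataframe['name'].value_counts().idxmax()
Out[13]: 'alex'

value_counts() returns a pandas.core.series.Series of counts indexed by value, and idxmax() returns the key of the maximum count. When there is a tie, only the first such key is returned.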

Lunar_one
12

This gives the five most common names:

df['name'].value_counts().nlargest(5)
Syscall
11
df['name'].value_counts()[:5].sort_values(ascending=False)

value_counts() returns a pandas.core.series.Series of counts, so [:5] takes the first five entries and sort_values(ascending=False) puts the highest counts first.

Taie
  • 1
    While this code may answer the question, providing additional context regarding why and/or how this code answers the question improves its long-term value. – xiawi Sep 11 '19 at 08:57
  • `value_counts()` already returns its result sorted in descending order, so calling `sort_values()` is unnecessary. See [documentation](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.value_counts.html). – Matt VanEseltine Oct 20 '20 at 21:02
10

Use:

df['name'].mode()

or

df['name'].value_counts().idxmax()
Mohit Mehlawat
8

You can use this to get the full count of every value in a particular column; the most frequent value (the mode) is listed first:

df['name'].value_counts()
paul okoduwa
7

Here's one way:

df['name'].value_counts()[df['name'].value_counts() == df['name'].value_counts().max()]

which prints:

helen    2
alex     2
Name: name, dtype: int64
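
The same filter can be written so that value_counts() runs only once. This is a minimal sketch with sample data rebuilt from the question; the names counts and most_frequent are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

# Compute the counts once and reuse them for the comparison.
counts = df["name"].value_counts()

# Keep every entry whose count equals the maximum, so ties are preserved.
most_frequent = counts[counts == counts.max()]

print(most_frequent)
```

Calling most_frequent.index.tolist() gives just the names if the counts are not needed.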
pault
5

Not Obvious, But Fast

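import numpy as np
import pandas as pd
f, u = pd.factorize(df.name.values)  # f: integer code per row, u: unique names
counts = np.bincount(f)  # number of occurrences of each code
u[counts == counts.max()]  # names with the highest count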

array(['alex', 'helen'], dtype=object)
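
For anyone who wants to run this end to end, here is a self-contained sketch of the factorize/bincount approach with sample data rebuilt from the question. It assumes the column has no missing values: pd.factorize codes missing values as -1 by default, and np.bincount rejects negative input.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

# codes[i] is the integer id of the i-th name; uniques[k] is the label for id k.
codes, uniques = pd.factorize(df["name"].to_numpy())

# counts[k] is how many times id k occurs.
counts = np.bincount(codes)

# Boolean mask keeps every label whose count equals the maximum.
top = uniques[counts == counts.max()]

print(top)
```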
piRSquared
5

Simply use this..

dataframe['name'].value_counts().nlargest(n)

The functions for the largest and smallest frequencies are:

  • nlargest() for the 'n' most frequent values
  • nsmallest() for the 'n' least frequent values
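
A hedged sketch of both functions on the question's data (rebuilt here for illustration). It also uses the keep="all" option of Series.nlargest, which is not mentioned in the answer above but keeps every entry tied with the n-th largest count, so it handles the tie from the question.

```python
import pandas as pd

dataframe = pd.DataFrame({"name": ["alex", "helen", "alex", "helen", "john"]})

counts = dataframe["name"].value_counts()

# keep="all" retains every entry tied with the n-th largest count.
top_with_ties = counts.nlargest(1, keep="all")

# The single least frequent name.
least = counts.nsmallest(1)

print(top_with_ties)
print(least)
```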
William Prigol Lopes
avineet07
4

To get the top 5:

dataframe['name'].value_counts()[0:5]
Naomi Fridman
  • 2
    I actually like this answer, but there is one issue. Doing this just returns the frequency, not the label. Fix this by using ```dataframe['name'].value_counts().keys()[0:5]``` instead. –  Jul 25 '19 at 17:32
2

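You could use .apply with pd.Series.value_counts on the selected column to count the occurrences of all the names in the name column. Note the double brackets: applying it to the single column dataframe['name'] would call the function on each element rather than on the column.

dataframe[['name']].apply(pd.Series.value_counts)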
Brian
2

To get the top five most common names:

dataframe['name'].value_counts().head()
pedro_bb7
2

My best solution to get the first one is the following (idxmax returns the label; in current pandas versions argmax returns the integer position instead):

 df['my_column'].value_counts().idxmax()
venergiac
2

I had a similar issue. The most compact way I found to get the top n most frequent values (head() defaults to 5) is:

df["column_name"].value_counts().head(n)
KZiovas
2

Identifying the top 5, for example, using value_counts

top5 = df['column'].value_counts()

Listing the first five entries of 'top5'

top5[:5]
1

n is the number of most frequent items to return:

n = 2

a=dataframe['name'].value_counts()[:n].index.tolist()

dataframe["name"].value_counts()[a]
Maylo
0

Getting the top 5 most common last names in pandas:

df['name'].apply(lambda name: name.split()[-1]).value_counts()[:5]
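
A runnable sketch of the last-name idea. The full names below are hypothetical, since the original post does not show such data, and the vectorized .str accessor is swapped in for the lambda as one possible alternative.

```python
import pandas as pd

# Hypothetical full names, made up for this example.
df = pd.DataFrame(
    {"name": ["Alex Smith", "Helen Jones", "John Smith", "Maria Garcia Lopez"]}
)

# Split on whitespace and take the last token as the last name.
last_names = df["name"].str.split().str[-1]

# Count the last names and keep at most the five most common.
top5 = last_names.value_counts().head(5)

print(top5)
```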
General Grievance