Pandas: for groupby value_counts, return the row(s) with the maximum count

Question

I want to do almost the same thing as this question.

However, the approach in the accepted answer by @jezrael takes way too long based on my dataset -- I have ~300k rows in the original dataframe, and it takes a few minutes to run the nlargest(1) command. Furthermore, I tried it on a head(1000) limited dataframe, and didn't get only 1 row for each within the value_count -- I got exactly the same Series back as the value_counts.

In my own words: Basically, my dataset has two columns like this:

Session Rating
A       Positive
A       Positive
A       Positive
A       Negative
B       Negative
B       Negative
C       Positive
C       Negative

Using counts = df.groupby('Session')['Rating'].value_counts() I get a Series object like this:

Session Rating
A       Positive  3
        Negative  1
B       Negative  2
C       Positive  1
        Negative  1

How do I get a dataframe where just the Rating with the max count is included? And in cases where there are multiple maxes (such as C), I would like to exclude that one from the returned table.

https://stackoverflow.com/a/32192310/2027457 Checkt his way for faster (numpy here will help you better than pandas) — n1tk, Jul 05 '18 at 22:38
@0709 I tried the accepted answer on the same dataset he generated, and it didn't give me the same results-- the shape was equal to k for some reason. — manestay, Jul 05 '18 at 22:59

score 7 · Answer 1 · answered Jul 06 '18 at 00:00

7

I think you want something like this.

df.groupby('Session')['Rating'].apply(lambda x: x.value_counts().head(1))

Output:

Session          
A        Positive    3
B        Negative    2
C        Negative    1
Name: Rating, dtype: int64

answered Jul 06 '18 at 00:00

Scott Boston

147,308
15
139
187

There must be another way without using apply ? I am having the same challenge on 1M rows... – Xavier Bourret Sicotte Oct 05 '18 at 04:13
2

@XavierBourretSicotte Try this `df.groupby('Session')['Rating'].value_counts().groupby(level=0).head(1)` – Scott Boston Oct 05 '18 at 14:32
@ScottBoston, could you please explain what does level=0 and head(1) do? – Reihan_amn Mar 18 '22 at 00:26
1

@Reihan_amn Groupby level=0 is grouping by the dataframe index level = 0 if multiindex, if not, it just groups by index. head(1) gets the first records. So, in each group get only the first record. – Scott Boston Mar 18 '22 at 01:55

Pandas: for groupby value_counts, return the row(s) with the maximum count

1 Answers1

Linked