Groupby maximum in Pandas dataframe while retaining other columns

Question

Disclaimer: This is the first time I've worked with Pandas.

I have a data frame that looks something like this (no hierarchal index):

State	Party	Pct
Alaska	DEM	46.67
Alaska	IND	37.29
Alaska	LIB	20
Alaska	REP	10
Arizona	DEM	46.5
Arizona	REP	49.5
California	DEM	42.65
California	REP	42.48
etc.

I would like the maximum percentage for each state, and the associated party for that state. So the output would look like for example:

State	Party	Pct
Alaska	DEM	46.67
Arizona	REP	49.5
California	DEM	42.65
etc.

I have tried doing this:

# Isolate desired columns
df_h = pd.DataFrame(house_polls, columns=['state','party','pct'])
df_s = pd.DataFrame(senate_polls, columns=['state','party','pct'])

# Take the mean percent for state and party groups, and sort by said groups
cols = ["state", "party"]
df_h_avg = df_h.groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)
df_s_avg = df_s.groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)

# Take the maximum percent for each state
cols1 = ["state"]
house_results = df_h_avg.groupby(by=cols1).max(numeric_only=True)
senate_results = df_s_avg.groupby(by=cols1).max(numeric_only=True)

# Concatenate the associated party??

But it does not return the associated party.

State	Percentage
Alaska	46.67
Arizona	46.5
California	42.65
etc.

This is a duplicate, you can check this link https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby — Joshua, Dec 06 '22 at 00:41
Post linked by Joshua has several nice solutions to your problem. I personally prefer sort + groupby-last, but either will work — Alleo, Dec 06 '22 at 03:07

score 0 · Answer 1 · answered Dec 06 '22 at 02:47

0

you can try this one, the comment is added to the code.

#test data setup
df_raw="""
State,Party,Pct
Alaska,DEM,46.67
Alaska,IND,37.29
Alaska,LIB,20
Alaska,REP ,10
Arizona,DEM ,46.5
Arizona,REP,49.5
California,DEM ,42.65
California,REP ,42.48
"""
import pandas as pd
from io import StringIO
df=pd.read_csv(StringIO(df_raw))
# group by State, and sort by Pct and get first row only
result=[e.sort_values('Pct',ascending=False)[:1] for e_name,e in df.groupby('State')]
merged_df = pd.concat(result)
merged_df.head(10) # store the results

result:

State       Party   Pct
Alaska      DEM     46.67
Arizona     REP     49.50
California  DEM     42.65

Summary: group by State , then sort by Pct and grab the first row only.

answered Dec 06 '22 at 02:47

simpleApp

2,885
2
10
19

You could do that a bit neater: `df.sort_values("Pct", ascending=False).groupby("State", as_index=False).head(1)` – Timus Dec 06 '22 at 10:49
the `head(1)` is an abstraction for sure, the code behind the head is `return self.iloc[:n]` – simpleApp Dec 09 '22 at 14:18
I did use `.head(1)` because I can't use `.iloc[:1]` on a groupby object. My point was more that you can do it without going through a list and having to `concat`. – Timus Dec 09 '22 at 14:24

Groupby maximum in Pandas dataframe while retaining other columns

1 Answers1