0

Disclaimer: This is the first time I've worked with Pandas.

I have a data frame that looks something like this (no hierarchal index):

State Party Pct
Alaska DEM 46.67
Alaska IND 37.29
Alaska LIB 20
Alaska REP 10
Arizona DEM 46.5
Arizona REP 49.5
California DEM 42.65
California REP 42.48
etc.

I would like the maximum percentage for each state, and the associated party for that state. So the output would look like for example:

State Party Pct
Alaska DEM 46.67
Arizona REP 49.5
California DEM 42.65
etc.

I have tried doing this:

# Isolate desired columns
df_h = pd.DataFrame(house_polls, columns=['state','party','pct'])
df_s = pd.DataFrame(senate_polls, columns=['state','party','pct'])

# Take the mean percent for state and party groups, and sort by said groups
cols = ["state", "party"]
df_h_avg = df_h.groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)
df_s_avg = df_s.groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)

# Take the maximum percent for each state
cols1 = ["state"]
house_results = df_h_avg.groupby(by=cols1).max(numeric_only=True)
senate_results = df_s_avg.groupby(by=cols1).max(numeric_only=True)

# Concatenate the associated party??

But it does not return the associated party.

State Percentage
Alaska 46.67
Arizona 46.5
California 42.65
etc.
lordcarrot
  • 11
  • 1
  • 1
    This is a duplicate, you can check this link https://stackoverflow.com/questions/15705630/get-the-rows-which-have-the-max-value-in-groups-using-groupby – Joshua Dec 06 '22 at 00:41
  • Post linked by Joshua has several nice solutions to your problem. I personally prefer sort + groupby-last, but either will work – Alleo Dec 06 '22 at 03:07

1 Answers1

0

you can try this one, the comment is added to the code.

#test data setup
df_raw="""
State,Party,Pct
Alaska,DEM,46.67
Alaska,IND,37.29
Alaska,LIB,20
Alaska,REP ,10
Arizona,DEM ,46.5
Arizona,REP,49.5
California,DEM ,42.65
California,REP ,42.48
"""
import pandas as pd
from io import StringIO
df=pd.read_csv(StringIO(df_raw))
# group by State, and sort by Pct and get first row only
result=[e.sort_values('Pct',ascending=False)[:1] for e_name,e in df.groupby('State')]
merged_df = pd.concat(result)
merged_df.head(10) # store the results

result:

State       Party   Pct
Alaska      DEM     46.67
Arizona     REP     49.50
California  DEM     42.65

Summary: group by State , then sort by Pct and grab the first row only.

simpleApp
  • 2,885
  • 2
  • 10
  • 19
  • You could do that a bit neater: `df.sort_values("Pct", ascending=False).groupby("State", as_index=False).head(1)` – Timus Dec 06 '22 at 10:49
  • the `head(1)` is an abstraction for sure, the code behind the head is `return self.iloc[:n]` – simpleApp Dec 09 '22 at 14:18
  • I did use `.head(1)` because I can't use `.iloc[:1]` on a groupby object. My point was more that you can do it without going through a list and having to `concat`. – Timus Dec 09 '22 at 14:24