Disclaimer: This is the first time I've worked with Pandas.
I have a data frame that looks something like this (no hierarchal index):
State | Party | Pct |
---|---|---|
Alaska | DEM | 46.67 |
Alaska | IND | 37.29 |
Alaska | LIB | 20 |
Alaska | REP | 10 |
Arizona | DEM | 46.5 |
Arizona | REP | 49.5 |
California | DEM | 42.65 |
California | REP | 42.48 |
etc. |
I would like the maximum percentage for each state, and the associated party for that state. So the output would look like for example:
State | Party | Pct |
---|---|---|
Alaska | DEM | 46.67 |
Arizona | REP | 49.5 |
California | DEM | 42.65 |
etc. |
I have tried doing this:
# Isolate desired columns
df_h = pd.DataFrame(house_polls, columns=['state','party','pct'])
df_s = pd.DataFrame(senate_polls, columns=['state','party','pct'])
# Take the mean percent for state and party groups, and sort by said groups
cols = ["state", "party"]
df_h_avg = df_h.groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)
df_s_avg = df_s.groupby(cols, as_index=False).mean(numeric_only=True).sort_values(by=cols)
# Take the maximum percent for each state
cols1 = ["state"]
house_results = df_h_avg.groupby(by=cols1).max(numeric_only=True)
senate_results = df_s_avg.groupby(by=cols1).max(numeric_only=True)
# Concatenate the associated party??
But it does not return the associated party.
State | Percentage |
---|---|
Alaska | 46.67 |
Arizona | 46.5 |
California | 42.65 |
etc. |