How to find rows with greatest column value, for each index?

Question

I am working on a dataset from the Federal Elections Commission on US House of Representative campaign spending in the 2018 Midterms. Because it contains all candidates (including those who lost primaries) and I am only interested in general election Democratic candidates, I'm trying to reduce the dataset to nominated Democrats. I've assumed that these will be the Democrats with the highest spending in each district.

The starting dataset is here: https://www.fec.gov/data/candidates/house/?election_year=2018&election_full=True&party=DEM&is_active_candidate=true

I have created a new column, state_dist, which is a combination of state and district with the latter turned to int, then str. This was then sorted by state and district to the following:

I also have a stacked version, stacking by state and district:

I have tried borrowing code from Bharath's answer here. It creates the following:

The problem is that max_in_district appears to return the right values, but I need the candidate name and other variables as well, to attach this to my main project dataframe. I tried adding name in max_in_district_2, but this doesn't return the result I'm looking for.

To be clear, I want the full row in each "state_dist" value which has the highest value in the "disbursements" column. How should I get this?

Thank you in advance.

score 1 · Answer 1 · answered Apr 01 '21 at 20:03

1

Find the idxmax and then merge the data back to the original frame:

df.set_index('name').groupby(['state_dist'], as_index=False)['disbursements'].idxmax()\
.merge(df, right_on=['name', 'state_dist'], 
       left_on=['disbursements', 'state_dist'])

answered Apr 01 '21 at 20:03

It_is_Chris

13,504
2
23
41

That solved it. This was a huge help, so thanks a ton. – Demosthenes Apr 01 '21 at 23:31

score 1 · Answer 2 · answered Apr 01 '21 at 20:03

1

You can do:

df.loc[
    df.groupby("state_dist", as_index=False)["disbursements"].idxmax()[
        "disbursements"
    ]
].to_csv("out.csv")

This creates out.csv:

answered Apr 01 '21 at 20:03

Andrej Kesely

168,389
15
48
91

1

Thanks, looks like this works too. Much appreciated! – Demosthenes Apr 01 '21 at 23:32

How to find rows with greatest column value, for each index?

2 Answers2