Group by without losing a column

Question

I'm trying to get each state's election winner from a dataset that has the votes for every county in the 2020 presidential election.

I started off with this:

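    import pandas as pd

    # Raw string so the backslashes in the Windows path are not read as escape sequences
    data = pd.read_csv(r'..\Data\president_county_candidate.csv', lineterminator='\n')

    # Sum the county-level votes for each (state, candidate) pair
    group = data.groupby(
        ['state', 'candidate'], as_index=False
    ).agg(
        totalVoteSum=('total_votes', 'sum')
    )
    group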

The result is currently one row per state and candidate, with the summed votes in totalVoteSum.

What I would like to have now is a list of states with the winning candidate, e.g.

State	Candidate	Votes
Alaska	Donald Trump	1441168
Alabama	Donald Trump	189892

I tried this:

    group = group.groupby(
        ['state'], as_index=False
    ).agg(
        winner=('totalVoteSum', 'max')
    )
    group

This gives the correct vote totals but drops the candidate column.

How do I keep the candidate column without grouping by it, which would obviously give the wrong result?

score 1 · Answer 1 · answered Nov 20 '21 at 13:34

This works, though I'm not sure how:

    # Pass 'max' as a string; recent pandas versions warn when given the builtin max
    idx = group.groupby(['state'])['totalVoteSum'].transform('max') == group['totalVoteSum']

    group[idx]
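As far as I can tell, the trick is that transform returns a result with the same length and index as the original frame, so every row gets its own state's maximum, and comparing that to the row's own total leaves True only on the winning rows. Here is a minimal sketch of the intermediate steps, assuming a small made-up frame (the state names, candidate names and vote counts are hypothetical, not taken from the dataset):

```python
import pandas as pd

# Hypothetical toy data standing in for the aggregated `group` frame
group = pd.DataFrame({
    'state': ['State X', 'State X', 'State Y', 'State Y'],
    'candidate': ['Candidate A', 'Candidate B', 'Candidate A', 'Candidate B'],
    'totalVoteSum': [300, 200, 40, 50],
})

# transform('max') keeps one value per original row:
# each row receives the maximum totalVoteSum of its own state
state_max = group.groupby('state')['totalVoteSum'].transform('max')

# Row-by-row comparison gives a boolean mask, True only where the row is the state's maximum
idx = state_max == group['totalVoteSum']

# Boolean indexing keeps all columns, including candidate
winners = group[idx]
print(winners)
```

Because the mask filters the original rows instead of aggregating them, no column is lost. If two candidates tie in a state, both rows are kept.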

Thanks RJ Andriaansen
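A possible alternative, not part of the original suggestion, is to look up the row label of each state's maximum with idxmax and select those rows with loc. This is only a sketch on a made-up frame (all names and numbers below are hypothetical):

```python
import pandas as pd

# Hypothetical toy data standing in for the aggregated `group` frame
group = pd.DataFrame({
    'state': ['State X', 'State X', 'State Y', 'State Y'],
    'candidate': ['Candidate A', 'Candidate B', 'Candidate A', 'Candidate B'],
    'totalVoteSum': [300, 200, 40, 50],
})

# idxmax returns, for each state, the index label of the row with the highest total
winner_rows = group.groupby('state')['totalVoteSum'].idxmax()

# Selecting those labels keeps every column of the winning rows
winners = group.loc[winner_rows]
print(winners)
```

Unlike the boolean mask, idxmax returns only the first maximum per group, so a tie would yield a single row for that state.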

sarcasm
