I'm trying to get each state's election winner from a dataset that has the votes for every county in the 2020 presidential election.

I started off with this:

    import pandas as pd

    # Raw string so the backslashes in the Windows path are not read as escapes
    data = pd.read_csv(r'..\Data\president_county_candidate.csv', lineterminator='\n')
    
    group = data.groupby(
        ['state', 'candidate'], as_index=False
    ).agg(
            totalVoteSum=('total_votes', 'sum')
    )
    group

The result currently has one row per state and candidate, with the summed votes in totalVoteSum.

What I would like to have now is a list of states with the winning candidate, e.g.

    State    Candidate     Votes
    Alaska   Donald Trump  1441168
    Alabama  Donald Trump  189892

I tried this:

    group = group.groupby(
        ['state'], as_index=False
    ).agg(
        winner=('totalVoteSum', 'max')
    )
    group

This gives the correct vote total for each state but drops the candidate column.

How do I keep the candidate column without grouping by it, which would just return every state/candidate pair again?

sarcasm

1 Answer

This works, though I'm not sure how:

    # Boolean mask: True where a row's total equals its state's maximum
    idx = group.groupby(['state'])['totalVoteSum'].transform('max') == group['totalVoteSum']

    group[idx]
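
As for why it works, here is a minimal sketch on made-up numbers (the small `group` frame below is hypothetical, not the real election data). As far as I understand it, `transform('max')` returns a Series with the same index as `group`, where every row holds the maximum `totalVoteSum` of its own state. Comparing that Series to the original column yields a boolean mask that is True only on the winning rows, and indexing with the mask keeps all columns, including `candidate`. The last line shows an alternative using `idxmax`, which looks up the index label of the top row per state.

```python
import pandas as pd

# Hypothetical toy data standing in for the per-state, per-candidate totals.
group = pd.DataFrame({
    "state": ["Alabama", "Alabama", "Alaska", "Alaska"],
    "candidate": ["Donald Trump", "Joe Biden", "Donald Trump", "Joe Biden"],
    "totalVoteSum": [300, 200, 90, 70],
})

# transform('max') keeps the original shape: each row receives the
# maximum totalVoteSum found within its own state.
state_max = group.groupby("state")["totalVoteSum"].transform("max")

# Row-by-row comparison gives a boolean mask marking the winning rows.
mask = state_max == group["totalVoteSum"]
winners = group[mask]

# Alternative: idxmax returns the index label of the largest value per
# state, and .loc selects those full rows.
winners_alt = group.loc[group.groupby("state")["totalVoteSum"].idxmax()]

print(winners)
```

One difference to be aware of: if two candidates tie for the maximum in a state, the mask keeps both rows, while `idxmax` keeps only the first one it finds.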

Thanks, RJ Andriaansen.

sarcasm