How to create a subset of data for most common

Question

enter image description here

I need some help creating a subset of data. I'm sure this is a simple problem but I can't figure it out. For example, in the table, I need to create a subset of the data that includes the presidential winner from each state. So for Alabama for example, I would need the line for Donald J Trump since he got the highest proportion of votes (candidate votes/ total votes). I would need to isolate the winners from every state.

State          Candidate          candidatevotes          totalvotes
Alabama         D J Trump          1318255                 2123372
Alabama         Clinton            729547                  2123372
Alabama         Gary Johnson       44467                   2123372
Alabama         Other              21712                   2123372

However, I don't know how to isolate the winner from each state. I have tried using using

data_sub <- filename[candidatevotes/totalvotes > .5]

but I know that since there are 3rd party candidates, not every winner from each state will win with majority votes. I have attached a picture for reference. Thank you in advance!

score 0 · Answer 1 · answered Nov 01 '20 at 21:37

0

We can do a group by 'State' and filter the max proportion row for each 'State'

library(dplyr)
 df1 %>%
     mutate(prop = candidatevotes/totalvotes) %>%
     group_by(State) %>%
     filter(prop > .5, prop == max(prop))

answered Nov 01 '20 at 21:37

akrun

874,273
37
540
662

score 0 · Accepted Answer · answered Nov 01 '20 at 21:38

I just manipulated your data a little bit to demostrate how the problem could be solved:

# Just changed the last two states to Texas so that you get a two line result (not just one)
election <- data.frame(State = c("Alabama", "Alabama", "Texas", "Texas"),
                       Candidate = c("D J Trump", "Clinton", "Gary Johnson", "Other"),
                       candidatevotes = c(1318255, 729547, 44467, 21712),
                       totalvotes = c(2123372, 2123372, 2123372, 2123372))
# need library
library(dplyr)

election %>% 
  # group by the variable you want the max value for (State)
  dplyr::group_by(State) %>% 
  # get the lines with maximum candidatevotes for each State
  dplyr::filter(candidatevotes == max(candidatevotes))

Thank you so much! I spent hours trying to figure this out! – missatomicbomb Nov 01 '20 at 22:00 — missatomicbomb, Nov 01 '20 at 22:00

How to create a subset of data for most common

2 Answers2