0

enter image description here

I need some help creating a subset of data. I'm sure this is a simple problem but I can't figure it out. For example, in the table, I need to create a subset of the data that includes the presidential winner from each state. So for Alabama for example, I would need the line for Donald J Trump since he got the highest proportion of votes (candidate votes/ total votes). I would need to isolate the winners from every state.

State          Candidate          candidatevotes          totalvotes
Alabama         D J Trump          1318255                 2123372
Alabama         Clinton            729547                  2123372
Alabama         Gary Johnson       44467                   2123372
Alabama         Other              21712                   2123372

However, I don't know how to isolate the winner from each state. I have tried using using

data_sub <- filename[candidatevotes/totalvotes > .5] 

but I know that since there are 3rd party candidates, not every winner from each state will win with majority votes. I have attached a picture for reference. Thank you in advance!

Allan Cameron
  • 147,086
  • 7
  • 49
  • 87

2 Answers2

0

We can do a group by 'State' and filter the max proportion row for each 'State'

library(dplyr)
 df1 %>%
     mutate(prop = candidatevotes/totalvotes) %>%
     group_by(State) %>%
     filter(prop > .5, prop == max(prop))  
akrun
  • 874,273
  • 37
  • 540
  • 662
0

I just manipulated your data a little bit to demostrate how the problem could be solved:

# Just changed the last two states to Texas so that you get a two line result (not just one)
election <- data.frame(State = c("Alabama", "Alabama", "Texas", "Texas"),
                       Candidate = c("D J Trump", "Clinton", "Gary Johnson", "Other"),
                       candidatevotes = c(1318255, 729547, 44467, 21712),
                       totalvotes = c(2123372, 2123372, 2123372, 2123372))
# need library
library(dplyr)

election %>% 
  # group by the variable you want the max value for (State)
  dplyr::group_by(State) %>% 
  # get the lines with maximum candidatevotes for each State
  dplyr::filter(candidatevotes == max(candidatevotes))
DPH
  • 4,244
  • 1
  • 8
  • 18