Consider the example data:

Zip_Code <- c(1,1,1,2,2,2,3,3,3,3,4,4)
Political_pref <- c('A','A','B','A','B','B','A','A','B','B','A','A')
income <- c(60,120,100,90,80,60,100,90,200,200,90,110)
df1 <- data.frame(Zip_Code, Political_pref, income)

I want to group by each $Zip_Code and obtain the row with the maximum $income for each $Political_pref level within it.

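The desired output is a data frame with 3 variables and one row per $Zip_Code and $Political_pref combination present in the data, each being the row with the greatest income. With the example data that is 7 observations: an A and a B row for each of Zip_Code 1 to 3, and only an A row for Zip_Code 4, which has no B entries.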
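
As a quick check of the expected shape, here is a minimal base R sketch that counts the distinct Zip_Code and Political_pref combinations in the example data. The name `combos` is only illustrative and is not part of the original question.

```r
# Example data from the question
df1 <- data.frame(
  Zip_Code = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4),
  Political_pref = c('A', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A'),
  income = c(60, 120, 100, 90, 80, 60, 100, 90, 200, 200, 90, 110)
)

# One row per distinct (Zip_Code, Political_pref) pair
combos <- unique(df1[c("Zip_Code", "Political_pref")])
nrow(combos)  # 7: Zip_Code 4 has no 'B' rows
```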

I am playing with dplyr, but I am happy with a solution using any package (possibly data.table).

library(dplyr) 
df2 <- df1 %>%
  group_by(Zip_Code) %>% 
  filter(....)
Frank
user08041991
  • You can group by Zip_Code and Political_pref and summarise with the max function: `df1 %>% group_by(Zip_Code, Political_pref) %>% summarise(m = max(income))` – Mostafa90 Feb 16 '17 at 13:18
  • `aggregate(income ~ Zip_Code + Political_pref, df1, max)`? – Cath Feb 16 '17 at 13:22
  • also useful: http://stackoverflow.com/questions/29657753/can-summarise-in-dplyr-not-drop-other-columns-in-my-data-frame – Cath Feb 16 '17 at 14:11

1 Answer

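We can group by both Zip_Code and Political_pref, then use slice with which.max to keep the row with the highest income in each group: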

library(dplyr)
df1 %>%
   group_by(Zip_Code, Political_pref) %>%
   slice(which.max(income))
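
Note that which.max returns only the index of the first maximum, so tied rows within a group collapse to a single row. The sketch below contrasts this with a filter-based variant that keeps ties. The filter variant and the names `one_per_group` and `with_ties` are illustrative additions, not part of the original answer.

```r
library(dplyr)

# Example data from the question
df1 <- data.frame(
  Zip_Code = c(1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4),
  Political_pref = c('A', 'A', 'B', 'A', 'B', 'B', 'A', 'A', 'B', 'B', 'A', 'A'),
  income = c(60, 120, 100, 90, 80, 60, 100, 90, 200, 200, 90, 110)
)

# which.max() picks the first maximum, so each group yields exactly one row
one_per_group <- df1 %>%
  group_by(Zip_Code, Political_pref) %>%
  slice(which.max(income)) %>%
  ungroup()

# filter() keeps every row equal to the group maximum, so ties are retained
with_ties <- df1 %>%
  group_by(Zip_Code, Political_pref) %>%
  filter(income == max(income)) %>%
  ungroup()

nrow(one_per_group)  # 7
nrow(with_ties)      # 8: Zip_Code 3, preference B has two rows with income 200
```

Which variant is appropriate depends on whether tied maxima should appear once or in full.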
akrun