data mining: subset based on maximum criteria of several observations

Question

Consider the example data:

Zip_Code <- c(1,1,1,2,2,2,3,3,3,3,4,4)
Political_pref <- c('A','A','B','A','B','B','A','A','B','B','A','A')
income <- c(60,120,100,90,80,60,100,90,200,200,90,110)
df1 <- data.frame(Zip_Code, Political_pref, income)

I want to group by each $Zip_Code and obtain the maximum $income for each $Political_pref level.

The desired output is a data frame with 8 obs. of 3 variables. It contains 2 obs. for each $Zip_Code (one A and one B), namely the ones with the greatest income.

I am experimenting with dplyr, but I am happy with a solution using any package (possibly data.table).

library(dplyr) 
df2 <- df1 %>%
  group_by(Zip_Code) %>% 
  filter(....)

You can group by Zip_Code and Political_pref and summarise with the max function: `df1 %>% group_by(Zip_Code, Political_pref) %>% summarise(m = max(income))` — Mostafa90, Feb 16 '17 at 13:18
also useful: http://stackoverflow.com/questions/29657753/can-summarise-in-dplyr-not-drop-other-columns-in-my-data-frame — Cath, Feb 16 '17 at 14:11

akrun · Accepted Answer · 2017-02-16T13:36:07.050

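We can group by both Zip_Code and Political_pref, then use slice with which.max to keep the row with the highest income in each group. If several rows tie for the maximum, which.max returns only the first of them.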

library(dplyr)
df1 %>%
   group_by(Zip_Code, Political_pref) %>%
   slice(which.max(income))
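
For comparison, here is a minimal base R sketch of the same grouping idea, assuming only the maximum income per group is needed rather than the full rows. It uses aggregate instead of slice, so it is a different technique from the dplyr code above; the name res is just an illustrative choice. Note that in the sample data zip code 4 has no B rows, so the result has 7 rows rather than 8.

```r
# Sample data from the question
Zip_Code <- c(1,1,1,2,2,2,3,3,3,3,4,4)
Political_pref <- c('A','A','B','A','B','B','A','A','B','B','A','A')
income <- c(60,120,100,90,80,60,100,90,200,200,90,110)
df1 <- data.frame(Zip_Code, Political_pref, income)

# Maximum income for each (Zip_Code, Political_pref) pair, base R only.
# 'res' is a hypothetical name used for illustration.
res <- aggregate(income ~ Zip_Code + Political_pref, data = df1, FUN = max)

# Sort by zip code, then preference, so each zip's rows sit together
res <- res[order(res$Zip_Code, res$Political_pref), ]
rownames(res) <- NULL
print(res)
```

Unlike slice, aggregate returns only the grouping columns and the aggregated value. With three columns, as here, the two results hold the same information, but with additional columns slice would keep them and aggregate would not.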

edited Feb 16 '17 at 13:36

answered Feb 16 '17 at 13:16

akrun
