Filtering within dplyr group_by so that combination of rows matching certain conditions remain

Question

I've got a data table that matches this structure:

address,zip_code,date,mailout
2 Lafayette St,90210,06/12/10,FALSE
2 Lafayette St,90210,04/01/12,FALSE
Higgens Square,62561,02/12/10,FALSE
Higgens Square,62561,28/03/13,TRUE
55 The Wren,91234,23/08/18,TRUE
55 The Wren,91234,19/09/13,FALSE
9A Sylvan Road,54332,16/11/10,TRUE
9A Sylvan Road,54332,31/01/17,FALSE

I'm trying to use dplyr's group_by to find situations where there has been at least one TRUE and at least one FALSE mailout for a given address/zip code grouping. The result I am looking for would be this:

address,zip_code,date,mailout
Higgens Square,62561,02/12/10,FALSE
Higgens Square,62561,28/03/13,TRUE
55 The Wren,91234,23/08/18,TRUE
55 The Wren,91234,19/09/13,FALSE
9A Sylvan Road,54332,16/11/10,TRUE
9A Sylvan Road,54332,31/01/17,FALSE

When I try

df %>% group_by(address, zip_code) %>% filter(mailout == TRUE | mailout == FALSE)

I get all rows returned.

When I try

df %>% group_by(address, zip_code) %>% filter(mailout == TRUE & mailout == FALSE)

I get no results.

score 2 · Accepted Answer · answered Aug 21 '20 at 08:38

Use any :

library(dplyr)
df %>% group_by(address, zip_code) %>% filter(any(mailout) && any(!mailout))

#  address        zip_code date     mailout
#  <chr>             <int> <chr>    <lgl>  
#1 Higgens Square    62561 02/12/10 FALSE  
#2 Higgens Square    62561 28/03/13 TRUE   
#3 55 The Wren       91234 23/08/18 TRUE   
#4 55 The Wren       91234 19/09/13 FALSE  
#5 9A Sylvan Road    54332 16/11/10 TRUE   
#6 9A Sylvan Road    54332 31/01/17 FALSE

Or all so that each group gets only one value using which you can decide whether to keep the group or not.

df %>% group_by(address, zip_code) %>% filter(all(c(TRUE, FALSE) %in%  mailout))

Darren Tsai · Answer 2 · 2020-08-22T07:47:20.497

You can use standard deviation (sd()) of each group to filter out those groups with only one value.

library(dplyr)

df %>%
  group_by(address, zip_code) %>%
  filter(sd(mailout) > 0)

#   address        zip_code date     mailout
#   <chr>             <int> <chr>    <lgl>  
# 1 Higgens Square    62561 02/12/10 FALSE  
# 2 Higgens Square    62561 28/03/13 TRUE   
# 3 55 The Wren       91234 23/08/18 TRUE   
# 4 55 The Wren       91234 19/09/13 FALSE  
# 5 9A Sylvan Road    54332 16/11/10 TRUE   
# 6 9A Sylvan Road    54332 31/01/17 FALSE

Filtering within dplyr group_by so that combination of rows matching certain conditions remain

2 Answers2