dplyr filter based on conditions across and within column

Question

I'd like to validate survey responses, involving removing rows with NAs based on the condition within a column and across columns. Sample dataset below

col1 <- c("Yes", "Yes", "No", "No", NA)
col2 <- c("Yes", NA, "No", NA, NA)
col3 <- c("No", "Yes", "No", NA, NA)

dataset <- data.frame(col1, col2, col3)
dataset

The desired output involves filtering out all rows in col1, and then removing only the row with a Yes in col1 and NA in any other column. Desired output below `

  col1 col2 col3
1  Yes  Yes   No
2   No   No   No
3   No <NA> <NA>

` I've tried basic filtering operations like

dataset %>% filter(col1 == "Yes" | !is.na(.))

with other operators such as '& , |' but with no luck and I'm not sure how to apply across or filter_if here to make it work. I recognize this is very similar to https://stackoverflow.com/questions/43938863/dplyr-filter-with-condition-on-multiple-columns, but different enough to warrant asking this question again.

What am I missing here?

Allan Cameron · Accepted Answer · 2022-12-15T12:16:29.620

3

Your logic is encapsulated with:

dataset %>%
  filter(!(is.na(col1) | (col1 == "Yes" & (is.na(col2) | is.na(col3)))))
#>   col1 col2 col3
#> 1  Yes  Yes   No
#> 2   No   No   No
#> 3   No <NA> <NA>

We can rewrite this with indentations and comments to make the logic clearer:

dataset %>%
  filter(!(                       # Remove any of the following cases:
      is.na(col1)                       # Column 1 is missing
      |                               # OR 
      (col1 == "Yes"                    # col1 is yes               
       &                                # AND
      (is.na(col2) | is.na(col3))       # Either col2 OR col3 are missing
      )
 ))
#>   col1 col2 col3
#> 1  Yes  Yes   No
#> 2   No   No   No
#> 3   No <NA> <NA>

edited Dec 15 '22 at 12:16

answered Dec 15 '22 at 12:07

Allan Cameron

147,086
7
49
87

This works great! Thank you. Answer accepted. Though I wonder if there is a more elegant solution for situations involving more than 3 columns... – SahirAdv Dec 15 '22 at 12:21
1

@SahirAdv You could have `filter(!(is.na(col1) | (col1 == 'Yes' & if_any(col2:col3, is.na))))` – Allan Cameron Dec 15 '22 at 12:24

score 2 · Answer 2 · answered Dec 15 '22 at 12:23

2

You can use if_any to deal with the second filtering condition:

dataset %>% 
  filter(complete.cases(col1), 
         !(col1 == "Yes" & if_any(-col1, is.na)))

  col1 col2 col3
1  Yes  Yes   No
2   No   No   No
3   No <NA> <NA>

answered Dec 15 '22 at 12:23

Maël

45,206
3
29
67

dplyr filter based on conditions across and within column

2 Answers2