Remove rows that contain at least an NA only if one column contains a specific value

Question

I have the following data frame:

a  b  c
x  1  1
x  1  NA
y  NA 1
y  1  1

I would like to remove the rows that contain at least one NA in any column, but only if the "a" column contains "y". The result would be:

a  b  c
x  1  1
x  1  NA
y  1  1

So far I have tried this:

my_DF %>%
  filter(!(any(is.na(.)) & a == "y"))

but the resulting data frame is the following:

a  b  c
x  1  1
x  1  NA

so this just removes any row in which "a" contains a "y", regardless of whether the row also contains NAs in at least one column.

How could I change the "any(is.na(.))" part of the command (I guess that is the wrong part) in order for it to work?

score 2 · Accepted Answer · answered Jun 29 '21 at 13:41

You can use the new if_any approach, introduced in dplyr 1.0.4, and used within the filter function. The following code will achieve the result you're after:

my_DF %>%
  filter(!(a == "y" & if_any(everything(), ~ is.na(.x))))

Explanation of individual bits

filter - keep all rows where

! - it's not true that

a == "y" & - column a equals "y" and, in the same row,

everything() - check all columns (alternatively, you could specify a vector of column names, e.g. c("b", "c"))

if_any(everything(), ~ is.na(.x)) - if any column has NA (there is also an if_all version)
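To see why the original any(is.na(.)) attempt removed every "y" row, it may help to compare a whole-data-frame test with a row-wise one. Inside the pipe, `.` stands for the entire data frame, so any(is.na(.)) collapses to a single TRUE that is recycled across all rows. The sketch below uses base R only; rowSums(is.na(...)) > 0 is used here as a stand-in for the per-row logical vector that if_any produces, and the variable names are illustrative.

```r
my_DF <- data.frame(a = c("x", "x", "y", "y"),
                    b = c(1, 1, NA, 1),
                    c = c(1, NA, 1, 1))

# One value for the whole data frame: TRUE as soon as any cell is NA
whole_frame <- any(is.na(my_DF))

# One value per row: TRUE only for rows that contain an NA
per_row <- rowSums(is.na(my_DF)) > 0

# The scalar is recycled, so this condition reduces to a == "y"
drop_scalar <- whole_frame & my_DF$a == "y"

# The row-wise version flags only the "y" row that has an NA
drop_rowwise <- per_row & my_DF$a == "y"

print(my_DF[!drop_scalar, ])   # both "y" rows are gone
print(my_DF[!drop_rowwise, ])  # only the "y" row with an NA is gone
```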

Full reproducible example

library(dplyr)  # if_any() requires dplyr >= 1.0.4

my_DF <- data.frame(a = c("x", "x", "y", "y"),
                    b = c(1, 1, NA, 1),
                    c = c(1, NA, 1, 1))
my_DF %>%
  filter(!(a == "y" & if_any(everything(), ~ is.na(.x))))

I like this approach (using dplyr) more. – Miguel, Jun 30 '21 at 14:39

score 1 · Answer 2 · answered Nov 23 '20 at 13:35

You can do:

my_DF <- read.table(header=TRUE, text=
"a  b  c
x  1  1
x  1  NA
y  NA 1
y  1  1")
i <- apply(is.na(my_DF), 1, any) & my_DF$a=="y"
my_DF[!i,]

jogo

score 0 · Answer 3 · answered Nov 23 '20 at 13:40

You can do it in data.table:

library(data.table)
setDT(my_DF)
# Drop rows where a is "y" and at least one column is NA
my_DF <- my_DF[!(a == "y" & !complete.cases(my_DF))]

This thread may also help: How to filter rows out of data.table where any column is NA without specifying columns individually

score 0 · Answer 4 · answered Jun 29 '21 at 13:45

In base R you may do

my_DF <- read.table(header=TRUE, text=
"a  b  c
x  1  1
x  1  NA
y  NA 1
y  1  1")

my_DF[rowSums(is.na(my_DF)) == 0 | my_DF$a != 'y',]
#>   a b  c
#> 1 x 1  1
#> 2 x 1 NA
#> 4 y 1  1

Created on 2021-06-29 by the reprex package (v2.0.0)
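If the same filter is needed for other columns or values, the base R logic above could be wrapped in a small function. This is only a sketch: the name drop_na_rows_when and its arguments are hypothetical, and the example data adds one illustrative row where "a" itself is NA. It uses %in% rather than == so that an NA in the tested column gives FALSE instead of NA, which would otherwise produce an all-NA row when indexing.

```r
# Hypothetical helper: drop rows containing any NA, but only where
# column `col` equals `value`
drop_na_rows_when <- function(df, col, value) {
  has_na <- rowSums(is.na(df)) > 0
  # %in% returns FALSE (not NA) when df[[col]] is itself NA
  matches <- df[[col]] %in% value
  df[!(has_na & matches), , drop = FALSE]
}

# Same data as in the question, plus one illustrative row where a is NA
my_DF <- data.frame(a = c("x", "x", "y", "y", NA),
                    b = c(1, 1, NA, 1, 2),
                    c = c(1, NA, 1, 1, 3))

result <- drop_na_rows_when(my_DF, "a", "y")
print(result)
```

Only the third row (a is "y" and b is NA) is dropped; the row whose "a" is NA is kept because it does not match "y".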
