0

I have the following data frame:

a  b  c
x  1  1
x  1  NA
y  NA 1
y  1  1

I would like to remove the rows that contain at least one NA in any column, but only if the "a" column contains a "y". So, the result would be:

a  b  c
x  1  1
x  1  NA
y  1  1

So far I have tried this:

my_DF %>%
  filter(!(any(is.na(.)) & a == "y"))

but the resulting data frame is the following:

a  b  c
x  1  1
x  1  NA

so this just removes any row in which "a" contains a "y", regardless of whether the row also contains NAs in at least one column.

How could I change the "any(is.na(.))" part of the command (I guess that is the wrong part) in order for it to work?

Miguel

4 Answers

2

You can use if_any(), introduced in dplyr 1.0.4, inside filter(). The following code will achieve the result you're after:

my_DF %>%
  filter(!(a == "y" & if_any(everything(), ~ is.na(.x))))

Explanation of individual bits

filter - keep all rows where

! - it's not true that

a == "y" & - column a equals "y" and, at the same time,

everything() - check all columns (alternatively, you could specify a vector of column names, e.g. c("b", "c"))

if_any(everything(), ~ is.na(.x)) - if any column has NA (there is also an if_all version)
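To see why the original any(is.na(.)) attempt dropped every "y" row, here is a minimal base R sketch. It assumes that `.` in the pipe stands for the whole data frame, so any() collapses all cells to a single value; the variable names (whole_frame_has_na, row_has_na, and so on) are illustrative only.

```r
# Same example data as in the question.
my_DF <- data.frame(a = c("x", "x", "y", "y"),
                    b = c(1, 1, NA, 1),
                    c = c(1, NA, 1, 1))

# any() over the whole data frame gives ONE logical value, not one per row.
whole_frame_has_na <- any(is.na(my_DF))

# That single TRUE is recycled across all rows, so every "y" row is dropped.
keep_scalar <- !(whole_frame_has_na & my_DF$a == "y")

# One logical per row instead: TRUE where that row has at least one NA.
row_has_na <- rowSums(is.na(my_DF)) > 0

# Now only "y" rows that themselves contain an NA are dropped.
keep_rowwise <- !(row_has_na & my_DF$a == "y")

print(keep_scalar)
print(keep_rowwise)
```

The scalar mask keeps only the two "x" rows, while the row-wise mask also keeps the complete "y" row. if_any() plays the role of row_has_na here: it returns one value per row.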

Full reproducible example

library(dplyr)  # if_any() requires dplyr >= 1.0.4

my_DF <- data.frame(a = c("x", "x", "y", "y"),
                    b = c(1, 1, NA, 1),
                    c = c(1, NA, 1, 1))
my_DF %>%
  filter(!(a == "y" & if_any(everything(), ~ is.na(.x))))
kasia_b
1

You can do:

my_DF <- read.table(header=TRUE, text=
"a  b  c
x  1  1
x  1  NA
y  NA 1
y  1  1")
i <- apply(is.na(my_DF), 1, any) & my_DF$a=="y"
my_DF[!i,]
jogo
0

You can do it in data.table:

library(data.table)
setDT(my_DF)
# Drop rows where a == "y" and at least one column is NA
my_DF <- my_DF[!(a == "y" & !complete.cases(my_DF))]

This thread answers your question I guess - How to filter rows out of data.table where any column is NA without specifying columns individually
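For reference, base R's complete.cases() returns one logical per row (TRUE when the row has no NA), which makes it a convenient row mask. The sketch below uses a plain data frame so that it runs without data.table; the names incomplete, drop and result are illustrative only.

```r
# Same example data as in the question.
my_DF <- data.frame(a = c("x", "x", "y", "y"),
                    b = c(1, 1, NA, 1),
                    c = c(1, NA, 1, 1))

# TRUE for rows that contain at least one NA.
incomplete <- !complete.cases(my_DF)

# Rows to drop: incomplete AND a == "y".
drop <- incomplete & my_DF$a == "y"

# Keep everything else.
result <- my_DF[!drop, ]
print(result)
```

Only the third row (a "y" row with an NA in b) is removed; the "x" row with an NA is kept.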

Azat
0

In base R you may do

my_DF <- read.table(header=TRUE, text=
                      "a  b  c
x  1  1
x  1  NA
y  NA 1
y  1  1")

# Keep rows that are complete, or whose a is not 'y'
my_DF[rowSums(is.na(my_DF)) == 0 | my_DF$a != 'y',]
#>   a b  c
#> 1 x 1  1
#> 2 x 1 NA
#> 4 y 1  1

Created on 2021-06-29 by the reprex package (v2.0.0)

AnilGoyal