My data looks like this:

library(tidyverse)

df <- tribble(
    ~a, ~b, ~c,
    1, 2, 3, 
    1, NA, 3, 
    NA, 2, 3
)

I can remove all NA observations with drop_na():

df %>% drop_na()

Or remove all NA observations in a single column (a for example):

df %>% drop_na(a)

Why can't I just use a regular != filter pipe?

df %>% filter(a != NA)

Why do we have to use a special function from tidyr to remove NAs?

Braiam
emehex

5 Answers

For example, you can use:

df %>% filter(!is.na(a))

to remove the rows where column a is NA.
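
To make the difference concrete, here is a minimal sketch using the data frame from the question. It assumes dplyr and tibble are installed; the names kept and empty are only illustrative.

```r
library(dplyr)
library(tibble)

# Data frame from the question
df <- tribble(
  ~a, ~b, ~c,
  1,  2,  3,
  1,  NA, 3,
  NA, 2,  3
)

# is.na() returns TRUE/FALSE for every row, so filter() can decide
kept <- df %>% filter(!is.na(a))

# a != NA evaluates to NA for every row, and filter() drops rows
# whose condition is NA, so nothing is left
empty <- df %>% filter(a != NA)

nrow(kept)
nrow(empty)
```

The second call does not raise an error; it returns an empty tibble, which is why the mistake is easy to miss.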

Petter Friberg
JeffZheng

If you are here in 2020 or later: once the rest of your pipeline is written, piping the result into na.exclude (%>% na.exclude) removes every row that contains an NA in any column.
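
A short sketch of that approach, assuming dplyr is loaded for the %>% pipe. The select() step is only a stand-in for whatever comes earlier in the pipeline.

```r
library(dplyr)

df <- data.frame(
  a = c(1, 1, NA),
  b = c(2, NA, 2),
  c = c(3, 3, 3)
)

# na.exclude() from the stats package drops every row with at least one NA
cleaned <- df %>%
  select(a, b) %>%
  na.exclude()

nrow(cleaned)
```

na.omit() drops the same rows. The two differ only in how modelling functions later pad residuals and predictions for the removed rows.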

shacke

From @Ben Bolker:

[T]his has nothing specifically to do with dplyr::filter()

From @Marat Talipov:

[A]ny comparison with NA, including NA==NA, will return NA

From a related answer by @farnsy:

The == operator does not treat NA's as you would expect it to.

Think of NA as meaning "I don't know what's there". The correct answer to 3 > NA is obviously NA because we don't know if the missing value is larger than 3 or not. Well, it's the same for NA == NA. They are both missing values but the true values could be quite different, so the correct answer is "I don't know."

R doesn't know what you are doing in your analysis, so instead of potentially introducing bugs that would later end up being published and embarrassing you, it doesn't allow comparison operators to think NA is a value.
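
The behaviour described above can be checked in base R with a small sketch. The vector x and the variable names are made up for illustration.

```r
# Any comparison involving NA is itself NA
na_equals_na <- NA == NA
three_gt_na  <- 3 > NA

x <- c(1, NA, 3)

# Comparing against NA gives NA for every element, even the known ones
cmp <- x != NA

# is.na() is the test that actually answers "is this value missing?"
missing <- is.na(x)

# Logical subsetting with the negated test keeps the known values
kept_x <- x[!missing]

print(cmp)
print(missing)
print(kept_x)
```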

Community
emehex

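I always use this, and it works reliably:

# Treat empty strings as missing
cool$day[cool$day == ''] <- NA
# Replace real NAs with the literal string "NA" so == can match them
cool$day[is.na(cool$day)] <- "NA"

# Keep only the rows whose day is not the "NA" marker
cool <- cool[!cool$day == "NA", ]
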
Henry Ecker
Anya Sti

Another option is to use complete.cases() inside filter(), for example to remove the rows where column a is NA. Here is some reproducible code:

library(dplyr)
df %>%
  filter(complete.cases(a))
#> # A tibble: 2 × 3
#>       a     b     c
#>   <dbl> <dbl> <dbl>
#> 1     1     2     3
#> 2     1    NA     3

Created on 2023-03-26 with reprex v2.0.2

Quinten