I have a data frame df with variable x. However, two different expressions that check for NA give me different results. Can anyone explain?
sum(is.na(df$x))
#[1] 41
df %>% filter(x==NA)
# A tibble: 0 x 1
Note that a comparison with NA via == (nearly) always evaluates to NA. This is easily demonstrated with:
x <- c(1, 2, NA, 4)
x == NA
#[1] NA NA NA NA
See help("NA") and help("=="). From the latter documentation:
Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA.
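filter() keeps only the rows where the condition is TRUE and drops rows where it is NA, so x == NA selects nothing. Your dplyr code should be: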
df %>% filter(is.na(x))
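To see both behaviours side by side, here is a minimal sketch on a made-up tibble. The data and the names wrong, right and n_missing are assumptions for illustration only, since the original df is not shown in the question.

```r
library(dplyr)

# Hypothetical toy data standing in for the df from the question
df <- tibble(x = c(1, 2, NA, 4, NA))

# Base R count of missing values in x
n_missing <- sum(is.na(df$x))

# x == NA is NA for every row, and filter() drops rows whose condition is NA
wrong <- df %>% filter(x == NA)

# is.na() returns TRUE or FALSE, so the rows with missing x are kept
right <- df %>% filter(is.na(x))

nrow(wrong)   # 0
nrow(right)   # 2
n_missing     # 2
```

The row count from filter(is.na(x)) matches sum(is.na(df$x)), which is the agreement the question was expecting.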