
A toy example:

> dfx <- data.frame(a=c("A","X","X","D","X",NA,NA),b=c(1,3,4,5,2,1,NA))
> dfx[dfx$a=="E",]
        a  b
NA   <NA> NA
NA.1 <NA> NA

Why does R return NA rows for a value that does not exist?
This is dangerous if you then do something like the following, believing that "E" exists in dfx:

> nrow(dfx[dfx$a=="E",])
[1] 2

Thanks!

l0110
  • It is because of the `NA`s. Either use `%in%` in place of `==`, since `%in%` gives FALSE for NA while `==` leaves the NA as NA, or use `dfx[dfx$a=="E" & !is.na(dfx$a),]` – akrun May 26 '18 at 16:14
  • @akrun: do you mean the 'NA' rows are not removed during the '==' evaluation of the data frame? – l0110 May 26 '18 at 16:31
  • If you check the output of `==`, the NA values stay NA alongside the TRUE and FALSE values; those `NA` values are what create the NA rows – akrun May 26 '18 at 16:32
  • @akrun: thanks so much!!! I will do something like `dfx[which(dfx$a=="E"),]` (as in the possible duplicate link) in the future. – l0110 May 26 '18 at 16:38
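
To make the comments above concrete, here is a small sketch that reuses the `dfx` data frame from the question and compares the plain `==` subset with the three workarounds mentioned (`%in%`, an explicit `!is.na()` guard, and `which()`). The variable names `n_eq`, `n_in`, `n_guard` and `n_which` are only illustrative and do not come from the original post.

```r
# Toy data from the question
dfx <- data.frame(a = c("A", "X", "X", "D", "X", NA, NA),
                  b = c(1, 3, 4, 5, 2, 1, NA))

# `==` propagates NA: the comparison is NA (not FALSE) wherever a is NA
cmp <- dfx$a == "E"
print(cmp)

# A logical NA used as a row index produces an all-NA row,
# so the two NAs in column a become two NA rows
n_eq <- nrow(dfx[cmp, ])

# Workaround 1: %in% never returns NA
n_in <- nrow(dfx[dfx$a %in% "E", ])

# Workaround 2: explicitly exclude NA from the condition
n_guard <- nrow(dfx[dfx$a == "E" & !is.na(dfx$a), ])

# Workaround 3: which() keeps only the positions that are TRUE
n_which <- nrow(dfx[which(dfx$a == "E"), ])

print(c(eq = n_eq, `in` = n_in, guard = n_guard, which = n_which))
```

With this data the plain `==` subset should report 2 rows, matching the output in the question, while each of the three alternatives should report 0 rows.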
