Data frame in R: interesting behavior for counting rows

Asked May 26 '18 at 16:13

Active May 26 '18 at 16:13


A toy example:

> dfx <- data.frame(a=c("A","X","X","D","X",NA,NA),b=c(1,3,4,5,2,1,NA))
> dfx[dfx$a=="E",]
        a  b
NA   <NA> NA
NA.1 <NA> NA

Why does R return NA rows for a value that does not exist?
This makes it dangerous to count rows like this (for example, when you believe that "E" exists in dfx):

> nrow(dfx[dfx$a=="E",])
[1] 2

Thanks!


l0110


It is because of the `NA`s. Either use `%in%` in place of `==` (`%in%` gives FALSE for NA, while `==` leaves the NA as NA), or use `dfx[dfx$a=="E" & !is.na(dfx$a),]` – akrun May 26 '18 at 16:14
@akrun: do you mean the 'NA' rows are not removed during the '==' evaluation of the data frame? – l0110 May 26 '18 at 16:31

If you check the output of `==`, the NA values stay NA alongside the TRUE and FALSE values. Those `NA` values in the row index are what create the NA rows – akrun May 26 '18 at 16:32
@akrun: thanks so much!!! I will do something like `dfx[which(dfx$a=="E"),]` (as in the possible duplicate link) in the future. – l0110 May 26 '18 at 16:38
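
To make the comments above concrete, here is a minimal sketch built on the `dfx` from the question. The names `cmp`, `n_naive`, `n_which`, `n_in` and `n_isna` are made up for illustration, and `stringsAsFactors = FALSE` is an addition that is not in the original post (it only pins the column type so the result does not depend on the R version).

```r
# Same toy data as in the question. stringsAsFactors is set explicitly
# (an assumption, not in the original) so `a` is a character column.
dfx <- data.frame(a = c("A", "X", "X", "D", "X", NA, NA),
                  b = c(1, 3, 4, 5, 2, 1, NA),
                  stringsAsFactors = FALSE)

# `==` propagates NA: the result is NA wherever dfx$a is NA,
# i.e. five FALSE values followed by two NA values.
cmp <- dfx$a == "E"
print(cmp)

# Each NA in a logical row index produces an all-NA row,
# which is where the two unexpected rows come from.
n_naive <- nrow(dfx[cmp, ])

# Three ways to keep NA positions out of the index:
n_which <- nrow(dfx[which(dfx$a == "E"), ])           # which() keeps only TRUE positions
n_in    <- nrow(dfx[dfx$a %in% "E", ])                # %in% returns FALSE for NA
n_isna  <- nrow(dfx[dfx$a == "E" & !is.na(dfx$a), ])  # explicit NA guard

print(c(naive = n_naive, which = n_which, in_op = n_in, isna = n_isna))
```

With this data the naive count should come out as 2 and the three alternatives as 0, matching what the comments describe. Which of the three to use is mostly a matter of taste; `which()` is the one the asker settled on.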
