Why can't R sometimes tell the difference between NA and 0?

Question

I am trying to extract the rows of data where the field "var" equals 0.

But I found that "NA" values were taken as 0:

There are 20 rows with 0 and 809 rows with "NA".

There are 81291 rows in total in data frame d.

> length(d$var[d$var == "0"])
[1] 829

> length(d$var[d$var == 0])
[1] 829

The 829 values above include both the 0 and the "NA" rows.

> length(d$var[d$var == "NA"])
[1] 809

> length(d$var[d$var == NA])
[1] 81291

Why does the above code give the total number of rows in d?

Read this: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — G. Grothendieck, Nov 30 '13 at 14:46
NO. NA's are not taken as 0. What is happening is that "[" returns an NA whenever the index is NA. I consider it a pain in the der·ri·ère, but it is considered a feature by the R core. Also: `NA != 'NA'` (No value, even NA, equals NA.) Use `subset`? — IRTFM, Nov 30 '13 at 16:16
@DWin I think the subset point is worth an expanded answer. I haven't investigated its behaviour myself, so I'm interested too. — Stephen Henderson, Nov 30 '13 at 18:30
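A minimal sketch of the `subset` suggestion, using a small hypothetical data frame in place of the asker's `d`. The `subset()` documentation states that rows where the condition is NA are treated as FALSE, whereas `[` returns an all-NA row for every NA in the index.

```r
# Hypothetical stand-in for the asker's data frame d
d <- data.frame(var = c(7, 0, NA, 0))

# Logical indexing with [: the NA comparison adds an extra all-NA row
rows_bracket <- d[d$var == 0, , drop = FALSE]
nrow(rows_bracket)   # 3

# subset() treats an NA condition as FALSE, so only the real zeros remain
rows_subset <- subset(d, var == 0)
nrow(rows_subset)    # 2
rows_subset$var      # 0 0
```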

score 2 · Answer 1 · answered Nov 30 '13 at 14:40

x == NA is not the way to test whether the value of some variable x is NA. Use is.na() instead:

> 2 == NA
[1] NA
> is.na(2)
[1] FALSE

Similarly, use is.null() to test whether an object is the NULL object.
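Because is.na() is vectorised, it also gives a direct way to count the missing values in a column. A short sketch on a hypothetical vector standing in for d$var:

```r
x <- c(7, 0, NA, 0)   # hypothetical example vector

# is.na() returns a plain TRUE/FALSE for each element, never NA
is.na(x)         # FALSE FALSE  TRUE FALSE
sum(is.na(x))    # 1, the number of missing values

# is.null() tests for the NULL object, which is not the same thing as NA
is.null(NULL)    # TRUE
is.null(NA)      # FALSE
```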

Stéphane Laurent

That answers my last question, thanks! The main problem is that I am trying to evaluate whether x equals 0, but NA is also counted. – user2783615 Nov 30 '13 at 14:43

Drew Steen · Answer 2 · 2013-11-30T15:01:16.710

One (inelegant) way to evaluate this is:

length(d$var[(d$var == 0) & (!is.na(d$var))])

(or slightly more compactly, sum(d$var==0 & !is.na(d$var)))
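A sketch of a few equivalent ways to count the zeros while ignoring NAs, on a hypothetical vector. The `na.rm = TRUE` and `%in%` variants are alternatives not mentioned in the answer; `%in%` works because it never returns NA (NA %in% 0 is FALSE).

```r
x <- c(7, 0, NA, 0)   # hypothetical example vector

# 1. Explicitly exclude NA before comparing (NA & FALSE is FALSE)
n1 <- sum(x == 0 & !is.na(x))

# 2. Let sum() drop the NA produced by the comparison
n2 <- sum(x == 0, na.rm = TRUE)

# 3. %in% gives FALSE rather than NA for missing values
n3 <- sum(x %in% 0)

c(n1, n2, n3)   # 2 2 2
```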

I think your code illustrates some misunderstandings you are having about R syntax. Let's make a compact, reproducible example to illustrate:

d <- data.frame(var=c(7, 0, NA, 0))

As you point out, length(d$var[d$var==0]) will return 3 (the two zeros plus one NA), because NA==0 evaluates to NA, and indexing a vector with NA returns NA.
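Printing the intermediate logical index makes the count of 3 visible. A sketch using the same example data frame:

```r
d <- data.frame(var = c(7, 0, NA, 0))

idx <- d$var == 0
idx                  # FALSE  TRUE    NA  TRUE

# The NA in the index selects an NA element, so it is counted as well
d$var[idx]           # 0 NA  0
length(d$var[idx])   # 3
```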

When you enclose the value you're looking for in quotation marks, R evaluates it as a string (and converts the numbers in d$var to strings for the comparison). So length(d$var[d$var == "NA"]) is asking how many elements in d$var are the character string "NA". Since there are no "NA" strings in your data set, you get back the number of values that are missing (because NA=="NA" evaluates to NA).
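A sketch of the string comparison on the example data frame: the numeric column is coerced to character before `==` is applied, and a missing value stays missing after that coercion.

```r
d <- data.frame(var = c(7, 0, NA, 0))

# The numbers become strings; the missing value stays NA, not "NA"
as.character(d$var)   # "7" "0" NA  "0"

idx <- d$var == "NA"
idx                   # FALSE FALSE    NA FALSE

# No element equals the string "NA"; only the missing value comes back
length(d$var[idx])    # 1
```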

To answer your last question, look at what d$var[d$var==NA] returns. Any == comparison with NA evaluates to NA, so every element of the index d$var==NA is NA, and indexing with it gives back a vector of NAs that is the same length as your original vector.
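A sketch of the same effect on the example data frame: the whole index is NA, so the result has as many elements as the original column.

```r
d <- data.frame(var = c(7, 0, NA, 0))

idx <- d$var == NA
idx                  # NA NA NA NA

# Every index position is NA, so every selected element is NA
d$var[idx]           # NA NA NA NA
length(d$var[idx]) == length(d$var)   # TRUE
```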

Thanks, that does the job. But why were NA and 0 taken as the same value? — user2783615, Nov 30 '13 at 14:51
My answer is a little wordy - I think you'll understand best by looking at the actual vectors that you are measuring the length of. (Best to use a small example like the one I provide). — Drew Steen, Nov 30 '13 at 15:02

sidquanto · Answer 3 · 2013-11-30T14:58:13.877

Here is the solution that gives the right answer.

length(which(d$var == 0))

The reason you are facing that problem is that, in your expression, the comparison does not give FALSE for the NA values; it gives NA instead. When you use that comparison as the index, every position that is not FALSE (TRUE or NA) is selected, so the NAs are included. In the expression I have given, which() returns only the positions where the condition is TRUE, and hence you get the right answer.
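A sketch contrasting the two kinds of index on a hypothetical vector: logical indexing keeps both the TRUE and the NA positions, while `which()` returns only the positions that are TRUE.

```r
x <- c(7, 0, NA, 0)   # hypothetical example vector

cond <- x == 0        # FALSE  TRUE    NA  TRUE
which(cond)           # 2 4 (the NA position is dropped)

length(x[cond])          # 3: the NA index contributes an NA element
length(x[which(cond)])   # 2: only the genuine zeros
```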

edited Nov 30 '13 at 14:58

answered Nov 30 '13 at 14:50

Thanks! What if I want to extract the rows where d$var is not 0? which(d$var != 0) won't work in this case. – user2783615 Nov 30 '13 at 15:00
Do you want the NAs to be counted or not in the "not 0" case? – sidquanto Nov 30 '13 at 15:09
I don't want the NA to be counted as being 0. – user2783615 Nov 30 '13 at 15:46
The NAs will be counted neither in the "equal to 0" case nor in the "not equal to 0" case, so you can just use == or != inside which() and it will give the count without the NAs. Is this not sufficient for you? – sidquanto Nov 30 '13 at 15:55
I'm curious. Why won't `which(d$var != 0)` give you what you expected? – IRTFM Nov 30 '13 at 16:22
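A sketch of the "not 0" case on a hypothetical data frame, covering both readings of the question: `which()` (or an explicit `!is.na()`) excludes the NA rows, and `is.na(...) |` adds them back if NA should count as "not 0".

```r
d <- data.frame(var = c(7, 0, NA, 0, 5))   # hypothetical example data

# Rows known to be non-zero; which() drops the NA comparison
nonzero <- d[which(d$var != 0), , drop = FALSE]
nonzero$var          # 7 5

# Same rows with an explicit NA check (NA & FALSE is FALSE)
nonzero2 <- d[!is.na(d$var) & d$var != 0, , drop = FALSE]

# Rows that are not 0, treating NA as "not 0" (TRUE | NA is TRUE)
nonzero_or_na <- d[is.na(d$var) | d$var != 0, , drop = FALSE]
nonzero_or_na$var    # 7 NA  5
```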
