
I'm following the swirl tutorial, and one of the parts has a vector x defined as:

> x
 [1]  1.91177824  0.93941777 -0.72325856  0.26998371          NA          NA
 [7] -0.17709161          NA          NA  1.98079386 -1.97167684 -0.32590760
[13]  0.23359408 -0.19229380          NA          NA  1.21102697          NA
[19]  0.78323515          NA  0.07512655          NA  0.39457671  0.64705874
[25]          NA  0.70421548 -0.59875008          NA  1.75842059          NA
[31]          NA          NA          NA          NA          NA          NA
[37] -0.74265585          NA -0.57353603          NA

Then when we type x[is.na(x)], we get a vector of all NAs:

> x[is.na(x)]
 [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA

Why does this happen? My confusion is that is.na(x) itself returns a vector of length 40 with TRUE or FALSE in each entry, depending on whether that entry of x is NA or not. Why does "wrapping" this vector with x[ ] suddenly subset to the NAs themselves?

smci
Apollo
  • When you index a vector by a logical vector, it returns the vector's elements where the index was `TRUE`. You can play with this yourself -- do `x <- c(1, 2, 3)`, and then try, for instance, `x[c(T, F, T)]`, `x[c(F, F, F)]`, etc. – josliber Oct 17 '15 at 00:41
  • The `[]` operator selects a subset of `x`. For example, `x[1:4]` returns the first four elements of `x`. When passed a logical vector, it instead returns all the elements of `x` for which the vector is `TRUE`. So `x[is.na(x)]` returns all the elements of `x` that are `NA`. Instead, `x[!is.na(x)]` would return all the non-`NA` elements of `x`. – Sean Hughes Oct 17 '15 at 00:43

1 Answer

This is called logical indexing, a very common and neat R idiom.

Yes, is.na(x) gives a logical (boolean) vector of the same length as x. When you use a logical vector as an index, x[ ] returns only the elements at the positions where the index is TRUE.
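To make the mechanics concrete, here is a minimal sketch on a small made-up vector (the values and the names mask, na_part, non_na_part and na_positions are only for illustration, not from the swirl lesson):

```r
# Small example vector with some missing values (illustrative values)
x <- c(1.5, NA, -0.7, NA, 2.0)

# is.na() returns one TRUE/FALSE per element of x
mask <- is.na(x)
print(mask)                  # FALSE  TRUE FALSE  TRUE FALSE

# Indexing with a logical vector keeps the positions where the mask is TRUE
na_part <- x[mask]
print(na_part)               # NA NA

# Negating the mask keeps the non-missing values instead
non_na_part <- x[!mask]
print(non_na_part)           # 1.5 -0.7  2.0

# which() turns the mask into integer positions, if you want those
na_positions <- which(mask)
print(na_positions)          # 2 4
```

So x[is.na(x)] is just x[mask] with the mask computed inline.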

So x[is.na(x)] returns all the NA entries in x. On its own that is not very useful; it becomes useful when you assign to it, e.g. to impute the median (or any other value):

 x[is.na(x)] <- median(x, na.rm = TRUE)
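As a rough worked example of that assignment, assuming a tiny made-up vector (the names n_missing and fill are hypothetical helpers, not part of the tutorial):

```r
# Illustrative vector with two missing values
x <- c(3, NA, 1, NA, 8)

# Summing a logical vector counts the TRUEs, i.e. the number of NAs
n_missing <- sum(is.na(x))
print(n_missing)                 # 2

# Median of the non-missing values 3, 1, 8
fill <- median(x, na.rm = TRUE)
print(fill)                      # 3

# Assign to the NA positions only; the other elements are left untouched
x[is.na(x)] <- fill
print(x)                         # 3 3 1 3 8
```

Writing na.rm = TRUE rather than na.rm = T is a little safer, since T is an ordinary variable that can be reassigned while TRUE is a reserved word.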

Notes:

  • x[!is.na(x)], by contrast, accesses all the non-NA entries in x
  • compare also na.omit(x), which drops the NAs too but is clunkier: it attaches an na.action attribute to the result
  • the way R's built-in functions historically do (or don't) handle NAs, by default or via options, is a patchwork-quilt mess; that's why the x[is.na(x)] idiom is so crucial
  • many useful functions (mean, median, sum, sd) are NA-aware, i.e. they support an na.rm=TRUE option to ignore NA values; cor instead takes a use argument (e.g. use="complete.obs"). See here. Also for how to define table_, mode_, clamp_
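The notes above can be sketched as follows, assuming two small made-up vectors x and y (all variable names here are illustrative):

```r
# Illustrative vectors with missing values in different positions
x <- c(2, NA, 4, 6, 8)
y <- c(1, 2, NA, 3, 4)

# By default a single NA makes the result NA
mean_default <- mean(x)
print(mean_default)                    # NA

# na.rm = TRUE drops the NAs before computing
mean_clean <- mean(x, na.rm = TRUE)
print(mean_clean)                      # 5
sum_clean <- sum(x, na.rm = TRUE)
print(sum_clean)                       # 20

# Logical indexing gives a plain numeric vector
kept <- x[!is.na(x)]
print(kept)                            # 2 4 6 8

# na.omit() gives the same values but records what it removed
omitted <- na.omit(x)
has_attr <- !is.null(attr(omitted, "na.action"))
print(has_attr)                        # TRUE

# cor() has no na.rm argument; missing values are handled through `use`
r_default <- cor(x, y)                         # NA with the default use = "everything"
r_complete <- cor(x, y, use = "complete.obs")  # uses only pairs where both are present
```

Only the pairs (2, 1), (6, 3) and (8, 4) are complete here, so r_complete is computed from those three points.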
smci