subset using `[`, explain NA output

Question

If we have this data, recently used here:

data <- data.frame(name = rep(letters[1:3], each = 3), 
                   var1 = rep(1:9), var2 = rep(3:5, each = 3))

  name var1 var2
1    a    1    3
2    a    2    3
3    a    3    3
4    b    4    4
5    b    5    4
6    b    6    4
7    c    7    5
8    c    8    5
9    c    9    5

we can look for rows where var2 == 4.

data[data[,3] == 4 ,] # equally data[data$var2 == 4 ,]

#  name var1 var2
#4    b    4    4
#5    b    5    4
#6    b    6    4

or rows where both var1 and var2 == 4:

data[data[,2] == 4 &  data[,3] == 4,]

#  name var1 var2
#4    b    4    4

What I don't get is why this:

data[ data[ , 2:3 ] == 4 ,]

gives this:

     name var1 var2
4       b    4    4
NA   <NA>   NA   NA
NA.1 <NA>   NA   NA
NA.2 <NA>   NA   NA

# I would still hope to get
#  name var1 var2
#4    b    4    4

Where do the NAs come from?

I think that's a rough downvote. — user1322296, Feb 06 '13 at 21:36

Ari B. Friedman · Answer 1 · 2013-02-06T21:32:50.553

The logical object you're subsetting on is a matrix:

> sel <- data[ , 2:3 ] == 4
> sel
       var1  var2
 [1,] FALSE FALSE
 [2,] FALSE FALSE
 [3,] FALSE FALSE
 [4,]  TRUE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE FALSE

According to help("[.data.frame"):

Matrix indexing (x[i] with a logical or a 2-column integer matrix i) using [ is not recommended, and barely supported. For extraction, x is first coerced to a matrix. For replacement, a logical matrix (only) can be used to select the elements to be replaced in the same way as for a matrix.

But that implies this form:

> data[ sel ]
[1] "b" "4" "5" "6" "4"

Badness. What you're doing is even less sensical, though, in that you're telling it you want only the rows (with your trailing comma), and then giving it a matrix to index on!

> data[sel,]
     name var1 var2
4       b    4    4
NA   <NA>   NA   NA
NA.1 <NA>   NA   NA
NA.2 <NA>   NA   NA

If you really wanted to use the matrix form, you could use apply to apply a logical operation across rows.
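A minimal sketch of that idea, assuming the same `data` and `sel` as above (rebuilt here so the snippet runs on its own). The names `keep_all` and `keep_any` are illustrative only and are not from the original answer:

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9, var2 = rep(3:5, each = 3))
sel <- data[, 2:3] == 4

# Collapse the 9 x 2 logical matrix to one logical value per row
keep_all <- apply(sel, 1, all)  # TRUE where every column equals 4
keep_any <- apply(sel, 1, any)  # TRUE where at least one column equals 4

data[keep_all, ]  # row 4 only
data[keep_any, ]  # rows 4, 5 and 6
```

Either result is a plain logical vector with one entry per row, which is the kind of row index `[` handles cleanly.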

+1 thanks for the clarification, I knew it was the wrong way, I just didn't know why. Also, I know about help, but I hope you realise `help("[.data.frame")` might be a bit obscure to the uninitiated. — user1322296, Feb 06 '13 at 21:34

score 2 · Accepted Answer · answered Feb 06 '13 at 21:30

Your data[,2:3]==4 is the following:

R> data[,2:3]==4
       var1  var2
 [1,] FALSE FALSE
 [2,] FALSE FALSE
 [3,] FALSE FALSE
 [4,]  TRUE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE FALSE

Then you try to index the rows of your data frame with this matrix. To do this, R seems to first convert your matrix to a vector:

R> as.vector(data[,2:3]==4)
 [1] FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[12] FALSE  TRUE  TRUE  TRUE FALSE FALSE FALSE

It then selects the rows of data based on this vector. The TRUE at position 4 selects the 4th row, but the three other TRUE values (at positions 13, 14, and 15) point past the 9 rows of data, so they are "out of bounds" and return NAs.
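A quick way to check this explanation, assuming the same `data` as in the question (rebuilt here so the snippet is self-contained). The names `sel` and `idx` are just for illustration, and this is a sketch of the behaviour described above rather than a trace of R's internals:

```r
data <- data.frame(name = rep(letters[1:3], each = 3),
                   var1 = 1:9, var2 = rep(3:5, each = 3))
sel <- data[, 2:3] == 4

# Flatten column by column: positions 1-9 come from var1, 10-18 from var2
idx <- which(as.vector(sel))
idx  # 4 13 14 15

# Only position 4 is a valid row number; the rest exceed nrow(data), which is 9
idx[idx > nrow(data)]

# Indexing rows with those positions gives row 4 plus three all-NA rows
data[idx, ]
```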

score 0 · Answer 3 · answered Feb 06 '13 at 21:27

    data[ data[ , 2 ] == 4 | data[,3] == 4,]

    name  var1 var2
 4    b    4    4
 5    b    5    4
 6    b    6    4

I suspect your method does not work because c() builds a vector, whereas you need to compare the atomic elements.

hd1

score 0 · Answer 4 · answered Feb 06 '13 at 21:27

Because you're not passing a vector but a matrix to the index:

> data[ , 2:3 ] == 4
       var1  var2
 [1,] FALSE FALSE
 [2,] FALSE FALSE
 [3,] FALSE FALSE
 [4,]  TRUE  TRUE
 [5,] FALSE  TRUE
 [6,] FALSE  TRUE
 [7,] FALSE FALSE
 [8,] FALSE FALSE
 [9,] FALSE FALSE

If you want the matrix collapsed into a vector that row indexing works with, here are two options:

data[ apply(data[ , 2:3 ] == 4, 1, all) ,]
data[ rowSums(data[ , 2:3 ] == 4) == 2 ,]
