Subsetting data frames in R

Question

I'm new to R and learning about subsetting. I have a table and I'm trying to get the size of a subset of the table. My issue is that when I try two different ways I get two different answers. For a table "dat" where I'm trying to select all rows where RMS is 5 and BDS is 2:

dim(dat[(dat$RMS==5) & (dat$BDS==2),])

gives me a different answer than

dim(subset(dat,(dat$RMS==5) & (dat$BDS==2)))

The second one is correct. Could someone explain why these differ and why the first one gives the wrong answer?

Thanks

No need to use `dat$` inside `subset`: `dim(subset(dat, RMS==5 & BDS==2))`. But I think even with `dat$` you should get the same result. — agstudy, Feb 01 '13 at 04:18
You would help us help you by providing some of your data, for example with `dput(head(dat, 20))`. — vaettchen, Feb 01 '13 at 04:48
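As the first comment points out, `subset()` evaluates its condition with the data frame's columns in scope, so `dat$` is redundant there. A minimal sketch with a small hypothetical data frame (the asker's `dat` was not posted, so these values are made up) shows that both spellings select the same rows when there are no missing values:

```r
# Hypothetical stand-in for the asker's `dat`; the real data was not posted.
dat <- data.frame(RMS = c(5, 5, 3, 5), BDS = c(2, 1, 2, 2))

# subset() looks up RMS and BDS as columns of `dat`, so `dat$` is not needed.
a <- subset(dat, RMS == 5 & BDS == 2)
b <- subset(dat, dat$RMS == 5 & dat$BDS == 2)

identical(a, b)  # TRUE: same logical condition, same rows
dim(a)           # 2 2: rows 1 and 4 match
```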

score 5 · Accepted Answer · topchef

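The reason must be the different treatment of NA values by these two methods. Where the condition evaluates to NA (for example because RMS or BDS is missing in that row), `[` returns a row filled with NA, which inflates the row count, whereas `subset()` keeps only the rows where the condition is TRUE. If you remove rows with NA from the data frame, you should get the same result from both: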

dat_clean = na.omit(dat)
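A minimal sketch of this behavior, using a small hypothetical data frame with one missing RMS value (the asker's data was not posted). With `[`, an NA in the logical index produces an all-NA row; `subset()` treats it as FALSE. The sketch also shows `which()`, which drops NA positions, as an alternative to `na.omit()`. Note that `na.omit()` removes rows with an NA in any column, not only in RMS or BDS, so on wider data it can discard rows the condition would otherwise have kept.

```r
# Hypothetical data (not the asker's): row 3 has a missing RMS value.
dat <- data.frame(RMS = c(5, 5, NA, 3), BDS = c(2, 2, 2, 2))

cond <- dat$RMS == 5 & dat$BDS == 2
print(cond)                                          # TRUE TRUE NA FALSE

# `[` keeps the NA position and returns an extra all-NA row.
n_bracket <- nrow(dat[cond, ])                       # 3
# subset() treats an NA condition as FALSE.
n_subset <- nrow(subset(dat, RMS == 5 & BDS == 2))   # 2

# Two ways to make `[` agree with subset():
n_which <- nrow(dat[which(cond), ])                  # 2: which() skips NA
dat_clean <- na.omit(dat)                            # drops rows with NA in ANY column
n_clean <- nrow(dat_clean[dat_clean$RMS == 5 & dat_clean$BDS == 2, ])  # 2
```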

answered Feb 01 '13 at 04:52 by topchef · edited Feb 01 '13 at 05:31

score 2 · Answer 2 · answered Feb 01 '13 at 04:18

Works for me:

> x = c(1,1,2,2,3,3)
> y = c(4,4,5,5,6,6)
> X = data.frame(x,y)
> dim(X[X$x==1 & X$y==4,])
[1] 2 2
> (X[X$x==1 & X$y==4,])
  x y
1 1 4
2 1 4
> dim(subset(X,(X$x==1) & (X$y==4)))
[1] 2 2
> subset(X,(X$x==1) & (X$y==4))
  x y
1 1 4
2 1 4
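This example contains no missing values, which is why both methods agree. A sketch that reuses the same `x` and `y` but appends one hypothetical row of NA values shows the two counts diverging, because the condition is NA for that row:

```r
x <- c(1, 1, 2, 2, 3, 3)
y <- c(4, 4, 5, 5, 6, 6)
X <- data.frame(x, y)

# Hypothetical: append one row whose values are missing.
X <- rbind(X, data.frame(x = NA, y = NA))

dim(X[X$x == 1 & X$y == 4, ])     # 3 2: the NA condition adds an all-NA row
dim(subset(X, x == 1 & y == 4))   # 2 2: subset() treats NA as FALSE
```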
