
I'm new to R and learning about subsetting. I have a data frame and I'm trying to get the size of a subset of it, but two different approaches give me two different answers. For a data frame "dat", where I'm trying to select all rows where RMS is 5 and BDS is 2:

dim(dat[(dat$RMS==5) & (dat$BDS==2),])

gives me a different answer than

dim(subset(dat,(dat$RMS==5) & (dat$BDS==2)))

The second one is correct. Could someone explain why these differ and why the first one gives the wrong answer?

Thanks

jasonm
  • No need to use `dat$` inside `subset`: `dim(subset(dat, RMS==5 & BDS==2))`. But I think even with `dat$` you should get the same result. – agstudy Feb 01 '13 at 04:18
  • 3
    You would help us to help by providing some of your data, for example with `dput( head( dat, 20 ) )` or so – vaettchen Feb 01 '13 at 04:48
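A quick check of agstudy's point, on a small hypothetical data frame (the values are made up for illustration): inside `subset()` the column names are looked up in the data frame itself, so the bare `RMS == 5 & BDS == 2` and the `dat$` version should select the same rows.

```r
# Hypothetical toy data with no NA values.
dat <- data.frame(RMS = c(5, 5, 3), BDS = c(2, 1, 2))

a <- subset(dat, RMS == 5 & BDS == 2)          # columns found inside `dat`
b <- subset(dat, dat$RMS == 5 & dat$BDS == 2)  # explicit `dat$` also works
identical(a, b)  # TRUE
dim(a)           # 1 2
```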

2 Answers


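The difference is most likely due to how the two methods treat NA values. With `[`, a row whose logical condition evaluates to NA is not dropped: it comes back as a row of NAs and is counted by `dim()`. `subset()` treats a missing value in the condition as FALSE and drops that row. If you remove rows with NA from the data frame, you should get the same results: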

dat_clean = na.omit(dat)
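A minimal sketch of the NA effect, using a small hypothetical `dat` (the values and the extra `notes` column are made up for illustration). It also shows `which()`, which keeps only the TRUE positions of the condition so that `[` agrees with `subset()`, and one caveat of `na.omit()`: it drops rows that have NA in *any* column, including columns that are not part of the condition.

```r
# Hypothetical toy data: RMS has an NA in row 3, and an unrelated
# column `notes` has an NA in row 2.
dat <- data.frame(
  RMS   = c(5, 5, NA, 3, 5),
  BDS   = c(2, 2, 2, 2, 1),
  notes = c("a", NA, "c", "d", "e")
)

cond <- dat$RMS == 5 & dat$BDS == 2  # TRUE TRUE NA FALSE FALSE

# `[` keeps the NA position and returns an all-NA row for it.
nrow(dat[cond, ])                       # 3
# subset() treats NA in the condition as FALSE.
nrow(subset(dat, RMS == 5 & BDS == 2))  # 2
# which() returns only the indices where cond is TRUE, so NAs are dropped.
nrow(dat[which(cond), ])                # 2

# na.omit() removes rows with NA in any column, so row 2 (NA in `notes`)
# is lost even though it matches the condition.
clean <- na.omit(dat)
nrow(clean[clean$RMS == 5 & clean$BDS == 2, ])  # 1
```

If other columns may contain NA, `which()` is the narrower fix, since it only ignores missing values in the condition itself.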
topchef

Works for me:

> x = c(1,1,2,2,3,3)
> y = c(4,4,5,5,6,6)
> X = data.frame(x,y)
> dim(X[X$x==1 & X$y==4,])
[1] 2 2
> (X[X$x==1 & X$y==4,])
  x y
1 1 4
2 1 4
> dim(subset(X,(X$x==1) & (X$y==4)))
[1] 2 2
> subset(X,(X$x==1) & (X$y==4))
  x y
1 1 4
2 1 4
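The example above contains no NA values, which is consistent with both forms agreeing there. As a sketch (the appended seventh row is hypothetical), adding a single NA to `x` is enough to make the two forms disagree:

```r
x <- c(1, 1, 2, 2, 3, 3, NA)  # hypothetical extra row with a missing x
y <- c(4, 4, 5, 5, 6, 6, 4)
X <- data.frame(x, y)

dim(X[X$x == 1 & X$y == 4, ])    # 3 2: the NA in the condition adds an all-NA row
dim(subset(X, x == 1 & y == 4))  # 2 2: subset() treats NA as FALSE
```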