ratio[i]

[1] 0.9

length(sample(c(1,2,3,4,5,6,7,8,9),2000*ratio[i],replace=T))

[1] 1800

length(sample(c(1,2,3,4,5,6,7,8,9),2000*(1-ratio[i]),replace=T))

[1] 199

It looks like R is doing the calculation incorrectly. I tried a few more numbers; sometimes the result is correct, but sometimes it is not. So I ran the following check over a range of sizes.

space <- matrix(nrow = 10000, ncol = 2)
for (i in 1:10000) {
  # expected sample size
  space[i, 1] <- 20000 * (1 - i / 10000)
  # actual sample size
  space[i, 2] <- length(sample(1, 20000 * (1 - i / 10000), replace = TRUE))
}

plot(space[,1]-space[,2])

It appears that this problem is not limited to a few numbers.

Bioinfo

2 Answers

1

This is because of the imprecision of floating-point arithmetic. 0.9 has no exact binary representation, so 1 - 0.9 comes out slightly below 0.1, and 2000*(1-ratio[i]) does not give exactly 200, as you can see if you do this:

options(digits=22)
2000*(1-ratio[i])

[1] 199.9999999999999431566

You get the same result if you do 2000 * (1 - 0.9).

sample uses the floor of its size parameter, and the floor of 199.9999...566 is 199. You can wrap the expression in round() to make sure you get the sample size you were expecting.
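As a minimal sketch of the point above, the following compares the unrounded and rounded sizes. The variable names (raw, n_floor, n_round) are made up for illustration, and ratio is set to 0.9 to stand in for ratio[i] from the question.

```r
ratio <- 0.9                  # stands in for ratio[i] in the question
raw   <- 2000 * (1 - ratio)   # intended to be 200

print(raw, digits = 17)       # prints the stored value, just under 200
raw == 200                    # FALSE: exact comparison fails
isTRUE(all.equal(raw, 200))   # TRUE: equal within numerical tolerance

# sample() keeps only the integer part of a non-integer size
n_floor <- length(sample(1:9, raw, replace = TRUE))

# rounding first gives the intended size
n_round <- length(sample(1:9, round(raw), replace = TRUE))
```

Comparing with all.equal() rather than == is the usual way to check whether two doubles are "the same" in R.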

eipi10
  • Agreed. But isn't it a bug in `sample` to use `floor` instead of `round`? – RockScience Mar 23 '15 at 04:51
  • Well, `sample` and (for example) `seq` and `rep` all use only the integer part of the size argument. Whether that's a bug or a feature is above my pay grade. – eipi10 Mar 23 '15 at 04:56
0

It seems that sample floors its size argument when a double is passed.

Make sure you pass an integer value as size:

length(sample(1:9,size=2000*(1-0.9),replace=TRUE)) # length is 199  

but

length(sample(1:9,size=round(2000*(1-0.9)),replace=TRUE)) # length is 200
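If the two samples are meant to add up to a fixed total, as in the question, one possible approach is to round the first size and derive the second by subtraction. This is only a sketch: total, n_first and n_second are hypothetical names, and 0.9 is used as an example ratio.

```r
total <- 2000
ratio <- 0.9   # example value, as in the question

# round once, then take the complement so the two sizes always sum to total
n_first  <- as.integer(round(total * ratio))
n_second <- total - n_first

a <- sample(1:9, size = n_first,  replace = TRUE)
b <- sample(1:9, size = n_second, replace = TRUE)

length(a) + length(b) == total   # TRUE by construction
```

Because n_second is computed from integers, it never goes through 1 - ratio and so cannot end up just below the intended value.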
RockScience