Given that the following are equivalent we might infer that R uses the same C runif
function to generate uniform samples for sample()
and runif()
...
set.seed(1)
sample(1000,10,replace=TRUE)
#[1] 27 38 58 91 21 90 95 67 63 7
set.seed(1)
ceiling( runif(10) * 1000 )
#[1] 27 38 58 91 21 90 95 67 63 7
However they are not equivalent when dealing with big numbers ( n > 2^32 - 1
):
set.seed(1)
ceiling( runif(1e1) * as.numeric(10^12) )
#[1] 265508663143 372123899637 572853363352 908207789995 201681931038 898389684968
#[7] 944675268606 660797792487 629114043899 61786270468
set.seed(1)
sample( as.numeric(10^12) , 1e1 , replace = TRUE )
#[1] 2655086629 5728533837 2016819388 9446752865 6291140337 2059745544 6870228465
#[8] 7698414177 7176185248 3800351852
Update
As @Arun points out the 1st, 3rd, 5th, ... of runif()
approximate result of 1st, 2nd, 3rd... from sample()
.
It turns out that both functions call unif_rand()
behind the scenes, however, sample
, given an argument, n
that is larger than the largest representable integer of type "integer"
but representable as an integer as type "numeric"
uses this static definition to draw a random deviate (as opposed to just unif_rand()
as in the case of runif()
)...
static R_INLINE double ru()
{
double U = 33554432.0;
return (floor(U*unif_rand()) + unif_rand())/U;
}
With a cryptic note in the docs that...
Two random numbers are used to ensure uniform sampling of large integers.
Why are two random numbers needed to ensure uniform sampling of large integers?
What is the constant
U
for and why does it take the particular value33554432.0
?