What is difference between replicate n times and generate n directly in sampling of R?

Question

I am asked to "simulate x as an independent identically distributed (iid) normal variable with mean=0, std=1.5 with sample length 500"

I am doing the sampling in following two ways:

set.seed(8402)
X <- rnorm(500, 0, 1.5)
head(X)

and I got

-1.8297969 -0.1862884 1.4219400 -1.0841421 -1.5276701 1.6159368

However, if I do

X <- replicate(500, rnorm(1,0,1.5))
head(X)

and I got

-0.04032755 0.92002552 -2.28001943 -1.36840869 1.49820718 0.06205003

My question is what is the right way to generate iid normal variable? What is the difference between those two ways?

Many thanks!

Running your code I get the same result with both ways (as expected). I get your second result if I don't set the random seed the second time. The first way is more efficient and therefore recommended. — Roland, Apr 05 '16 at 14:00
@Roland Could you explain what do you mean by "set the random seed the second time"? — xxks-kkk, Apr 05 '16 at 14:04
If you run your code exactly as provided in your question, you don't get the output you show. — Roland, Apr 05 '16 at 14:08
Computers aren't so good at random numbers, in the sense that they don't *actually* generate random numbers. To test this, set a random seed like `set.seed(42)` and then run `rnorm(1)` 5 times (not using replicate). They'll appear like random numbers. Now, run `set.seed(42)` again and run `rnorm(1)` 5 more times. You'll notice that they're the exact same numbers you got before. By running `set.seed` a second time, you "restart" the list of numbers that R is pulling from. That's why your results are different if you only use `set.seed` once. — tblznbits, Apr 05 '16 at 14:23

score 2 · Accepted Answer · edited May 23 '17 at 11:52

R Internal

Internally in R, the C function from <Rmath.h>: double rnorm (double mean, double sd) function generates one random number at a time. When you call its R wrapper function rnorm(n, mean, sd), it calls the C level function n times.

This is as same as you call R level function only once with n = 1, but replicate it n times using replicate.

The first method is much faster (possibly the difference will be seen when n is really large), as everything is done at C level. replicate however, is a wrapper of sapply, so it is not really a vectorized function (read on Is the "*apply" family really not vectorized?).

In addition, if you set the same random seed for both, you are going to get the same set of random numbers.

A more illustrative experiment

In my comment below, I say that random seed is only set once on entry. To help people understand this, I provide this example. There is no need to use large n. n = 4 is sufficient.

First, let's set seed at 0, while generating 4 standard normal samples:

set.seed(0); rnorm(4, 0, 1)
## we get
[1]  1.2629543 -0.3262334  1.3297993  1.2724293

Note that in this case, all 4 numbers are obtained from the entry seed 0.

Now, let's do this:

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## do not reset seed, but continue with the previous seed
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.329799 1.272429

See?

But if we reset seed in the middle, for example, set it back to 0

set.seed(0)
rnorm(2, 0, 1)
## we get
[1]  1.2629543 -0.3262334
## reset seed
set.seed(0)
replicate(2, rnorm(1, 0, 1))
## we get
[1] 1.2629543 -0.3262334

This is what I mean by "entry".

What is difference between replicate n times and generate n directly in sampling of R?

1 Answers1

Linked

Related