43

I am using R to construct an agent-based model with a Monte Carlo process. This means I have many functions that use a random engine of some kind. To get reproducible results, I must fix the seed. But, as far as I understand, I must set the seed before every random draw or sample. This is a real pain in the neck. Is there a way to fix the seed?

set.seed(123)
print(sample(1:10,3))
# [1] 3 8 4
print(sample(1:10,3))
# [1]  9 10  1
set.seed(123)
print(sample(1:10,3))
# [1] 3 8 4
A5C1D2H2I1M1N2O1R2T1
Elad663
  • 4
    Why do you want them all to be fixed? Is it not enough to set the seed once at the beginning and then run your 3 operations or however many you have? – Gavin Simpson Dec 17 '13 at 02:21
  • 6
    You'll get reproducible results from a computer program if you set the seed once at the start and never touch it. You might want to set the seed within the program if, for example, you want agents that use random numbers to behave identically each time they act. In which case, make the agent set its seed. – Spacedman Dec 17 '13 at 09:17

6 Answers

38

There are several options, depending on your exact needs. I suspect the first option, the simplest, is not sufficient, but my second and third options may be more appropriate, with the third being the most automatable.

Option 1

If you know in advance that the functions using/creating random numbers will always draw the same number of random values, and you don't reorder the function calls or insert a new call in between existing ones, then all you need to do is set the seed once. Indeed, you probably don't want to keep resetting the seed, as you'll just keep getting the same set of random numbers for each function call.

For example:

> set.seed(1)
> sample(10)
 [1]  3  4  5  7  2  8  9  6 10  1
> sample(10)
 [1]  3  2  6 10  5  7  8  4  1  9
> 
> ## second time round
> set.seed(1)
> sample(10)
 [1]  3  4  5  7  2  8  9  6 10  1
> sample(10)
 [1]  3  2  6 10  5  7  8  4  1  9

Option 2

If you really want to make sure that a function uses the same seed and you only want to set it once, pass the seed as an argument:

foo <- function(...., seed) {
  ## set the seed
  if (!missing(seed)) 
    set.seed(seed) 
  ## do other stuff
  ....
}

my.seed <- 42
bar <- foo(...., seed = my.seed)
fbar <- foo(...., seed = my.seed)

(where .... means other args to your function; this is pseudo code).
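As a runnable sketch of the pseudo code above, assume a hypothetical function `draw_agents()` whose only other argument is the number of values to draw. The function name, its argument `n`, and its body are made up for illustration and are not part of the original answer.

```r
# Hypothetical example: `draw_agents` and `n` stand in for "foo" and its
# other arguments in the pseudo code.
draw_agents <- function(n, seed) {
  # Only reset the RNG when the caller actually supplies a seed
  if (!missing(seed)) {
    set.seed(seed)
  }
  sample(n)
}

my.seed <- 42
bar  <- draw_agents(10, seed = my.seed)
fbar <- draw_agents(10, seed = my.seed)

# Both calls start from the same RNG state, so the draws match
identical(bar, fbar)  # TRUE
```

Calling `draw_agents(10)` without a seed leaves the RNG state alone, so the function simply continues the current random stream.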

Option 3

If you want to automate this even more, then you could abuse the options mechanism, which is fine if you are just doing this in a script (for a package you should use your own options object). Then your function can look for this option. E.g.

foo <- function() {
  if (!is.null(seed <- getOption("myseed")))
    set.seed(seed)
  sample(10)
}

Then in use we have:

> getOption("myseed")
NULL
> foo()
 [1]  1  2  9  4  8  7 10  6  3  5
> foo()
 [1]  6  2  3  5  7  8  1  4 10  9
> options(myseed = 42)
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
> foo()
 [1] 10  9  3  6  4  8  5  1  2  7
Gavin Simpson
  • Option 3 is helpful for my use case, but unfortunately does not seem to work as expected for me. When I run `options(myseed = 42)` that works fine, and I can see the change happened by looking at `getOption("myseed")`. However, running the sample function the results don't seem to be consistent and I'm seeing them change each line compared to the expected output shown for Option 3. – Ricky Aug 28 '20 at 16:37
  • Option 3 is working for me; you can't just call `sample()` after setting the option. As shown in `foo()` you need a wrapper that checks if the option is set and if it is sets the seed to the value stored in the option and then calls `sample()`. – Gavin Simpson Aug 28 '20 at 16:43
  • I see what you mean, sorry I misunderstood that. The code by user @stevec below is what I was looking for where you don't need to wrap a function inside a different function to have the seed be persistent: `addTaskCallback(function(...) {set.seed(123);TRUE})` – Ricky Aug 28 '20 at 16:59
33

I think this question suffers from a confusion. In the example, the seed has been set for the entire session. However, this does not mean it will produce the same set of numbers every time you use the print(sample(1:10,3)) command during a run; that would not resemble a random process, as it would be entirely determined that the same three numbers appear every time. Instead, what actually happens is that once you have set the seed, every time you run the script the same seed is used to produce a pseudo-random selection of numbers, that is, numbers that look as if they are random but are in fact produced by a reproducible process using the seed you have set.

If you rerun the entire script from the beginning, you reproduce those numbers that look random but are not. So, in the example, the second time that the seed is set to 123, the output is again 3, 8, and 4, which is exactly what you'd expect to see because the process is starting again from the beginning. If you were to continue to reproduce your first run by writing print(sample(1:10,3)) again, then the second set of output would again be 9, 10, and 1.

So the short answer to the question is: if you want to set a seed to create a reproducible process then do what you have done and set the seed once; however, you should not set the seed before every random draw because that will start the pseudo-random process again from the beginning.
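A small sketch of this point, assuming a hypothetical `run_model()` function that stands in for a whole script making several random draws (the name and body are illustrative only):

```r
# Hypothetical stand-in for a script that makes several random draws
run_model <- function() {
  first  <- sample(1:10, 3)
  second <- sample(1:10, 3)
  list(first = first, second = second)
}

# Set the seed once, then run everything
set.seed(123)
run1 <- run_model()

# Rerunning from the same seed reproduces every draw, in order
set.seed(123)
run2 <- run_model()
identical(run1, run2)  # TRUE

# Resetting the seed before every draw repeats the same draw instead
set.seed(123); a <- sample(1:10, 3)
set.seed(123); b <- sample(1:10, 3)
identical(a, b)  # TRUE
```

Within a single run, `first` and `second` come from successive states of the generator, while `a` and `b` both come from the state right after seeding.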

This question is old, but still comes high in search results, and it seemed worth expanding on Spacedman's comment.

TilmanHartley
  • People don't only want reproducible scripts. Reproducible functions are important too, especially in modern-day notebook workflows. You answered the title but not the body of the question. – jiggunjer Jul 22 '20 at 08:17
12

If you want to always return the same results from random processes, simply keep the seed set all the time with:

addTaskCallback(function(...) {set.seed(123);TRUE})

Now the output is the same every time:

print(sample(1:10,3))
# [1] 3 8 4
print(sample(1:10,3))
# [1] 3 8 4
stevec
  • 1
    +1 this is the simplest solution for a monte carlo etc. Is it possible to turn this off (without restarting)? – user63230 Sep 11 '20 at 08:10
  • 1
    @user63230 thanks! Yes, try `removeTaskCallback(1)`. What that does is remove the first callback (so if you happen to have >1, you should change the argument to match the one set with `addTaskCallback()`). – stevec Sep 11 '20 at 09:14
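A sketch of registering the callback under a name, so it can be removed without tracking its position. The name `fixed_seed` is made up for this example; `addTaskCallback()` accepts a `name` argument, and `removeTaskCallback()` accepts either a position or a name. Task callbacks run after each top-level task completes, so this resets the seed between commands entered at the top level, not between individual draws inside a function.

```r
# Register the callback under a (hypothetical) name
addTaskCallback(function(...) {
  set.seed(123)
  TRUE  # returning TRUE keeps the callback registered
}, name = "fixed_seed")

# List the names of the currently registered callbacks
getTaskCallbackNames()

# Remove it by name once the fixed seed is no longer wanted
removed <- removeTaskCallback("fixed_seed")
```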
3

You could do a wrapper function, like so:

> wrap.3.digit.sample <- function(x) {
+    set.seed(123)
+    return(sample(x, 3))
+ }
> wrap.3.digit.sample(c(1:10))
[1] 3 8 4
> wrap.3.digit.sample(c(1:10))
[1] 3 8 4

There is probably a more elegant way, and I'm sure someone will chime in with it. But, if they don't, this should make your life easier.

hd1
1

No need. Although the results are different from sample to sample (which you almost certainly want, otherwise the randomness is very questionable), results from run to run will be the same. See, here's the output from my machine.

> set.seed(123)
> sample(1:10,3)
[1] 3 8 4
> sample(1:10,3)
[1]  9 10  1
Aaron left Stack Overflow
  • 3
    I don't think what you said gets the point from OP. I think the emphasis is "reproducible" result. You can get different samples but you have no control of it in your approach – alittleboy Dec 17 '13 at 02:26
  • 1
    @alittleboy Well, it is decidedly unclear what the OP needs. For example, if I have a script that calls function `foo()` 4 times and each one will make use of 10 random numbers, then all I need to do is set the seed once at the start of the script. Then the results are reproducible if you run the script. If you need to pull separate runs out and do them independently, then you need another approach. Your answer is one and is very specific and assumes a lot unsaid by the OP. For example, I do a lot of Monte Carlo stuff and I wouldn't do any of it your way. So context is everything here. – Gavin Simpson Dec 17 '13 at 02:36
  • @GavinSimpson The problem is that the length is dynamic, as the agent model is dynamic. Then, because the structure is not fixed, I can get different results for the same seed. I think. – Elad663 Dec 17 '13 at 02:49
  • @Elad663 Then that is useful information to put into your question. In that case you might want to follow options 2 or 3 in my Answer. But I'm not sure you really want to do this; does it matter that each call of the function will use exactly the same initial set of random numbers, until the dynamic bit kicks in? – Gavin Simpson Dec 17 '13 at 03:04
  • @GavinSimpson You are right, it should have been a part of the question. From the discussion here I have become more doubtful about the correct implementation. Let's see what my adviser says. Thank you for your time and effort. – Elad663 Dec 17 '13 at 04:00
0

I suggest that you call set.seed before each call to a random number generator in R. I think what you need is reproducibility for Monte Carlo simulations. In a for loop, you can call set.seed(i) before calling sample, which makes each iteration fully reproducible. In your outer function, you may specify an argument seed=1 so that in the for loop you use set.seed(i+seed).
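A minimal sketch of this idea, assuming a hypothetical outer function `run_simulation()` with arguments `n_iter` and `seed` (all names are illustrative):

```r
# Hypothetical outer function: `run_simulation`, `n_iter`, and `seed`
# are illustrative names, not from any package.
run_simulation <- function(n_iter, seed = 1) {
  results <- vector("list", n_iter)
  for (i in seq_len(n_iter)) {
    # Each iteration gets its own seed, derived from the base seed
    set.seed(i + seed)
    results[[i]] <- sample(1:10, 3)
  }
  results
}

run_a <- run_simulation(5, seed = 1)
run_b <- run_simulation(5, seed = 1)
identical(run_a, run_b)  # TRUE: same base seed, same draws

# Iteration 3 can be reproduced on its own, without running 1 and 2
set.seed(3 + 1)
third <- sample(1:10, 3)
identical(third, run_a[[3]])  # TRUE
```

Because each iteration is seeded independently, a single iteration can be rerun in isolation. Note that calling set.seed() inside the function also changes the global RNG state for any code that runs afterwards.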

alittleboy