
This question is similar to (but not the same as) the following questions:

Different sample results using set.seed command?

Is set.seed consistent over different versions of R (and Ubuntu)?

Same seed, different OS, different random numbers in R

... in which RNGkind() is recommended in scripts to guarantee consistency across operating systems and R versions when setting the seed with set.seed().

However, I have found that in order to reproduce results on the Unix and Windows systems I'm using, I have to set RNGkind(sample.kind = "Rounding") when running on Windows but not on Unix. If I set it on both, I can't reproduce the result.

Can anyone explain this discrepancy in the systems? And how does one share code with set.seed() and ensure it's reproducible without knowing the end users' OS?

Many thanks

EDIT: I am having this problem with the kmeans() function. I call set.seed(1) before each use of kmeans().

Jatson
  • One doesn't. Reproducibility can't be ensured reliably between different OSs. – Roland Oct 11 '20 at 10:33
  • If it is critical you can consider hard-coding the random numbers your system generates as an external data file (if large) or a `dput` if small – Allan Cameron Oct 11 '20 at 11:41
  • You need to show us the code that you are finding inconsistent. The `sample()` function (which `sample.kind` affects) will give identical results if the seed and `sample.kind` match. Other things (e.g. arithmetic!) may vary between systems, but `sample()` won't. – user2554330 Oct 11 '20 at 12:16
  • @Roland, your comment is incorrect. The RNGs in R are *very good* at ensuring reproducibility across OSs. There are subtle differences in arithmetic (64 bit precision vs 80 bit precision, some math libraries), but not the RNGs, which are mostly independent of that. There's some detail in my answer as to how people can end up confused about this. – user2554330 Oct 11 '20 at 16:46
  • @user2554330 Thanks for your comment, I'd added more info to the question. I am having this problem with the `kmeans()` function. – Jatson Oct 12 '20 at 09:25
  • Could you show us an example that gives different results on different systems? Then I'd vote to re-open. – user2554330 Oct 12 '20 at 09:48

1 Answer


The random number generators in R are consistent across operating systems, but they have been modified a few times over the history of R, so they are not consistent by default across R versions. However, you can always reproduce the random streams from earlier R versions by calling set.seed() and RNGkind() with the values that were previously used.

The RNGversion() function will set newer versions of R to the defaults from any previous version. If you look at its source, you can see that the defaults changed in 0.99, 1.7.0, and 3.6.0.
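As a minimal sketch of how that can look in practice (assuming R 3.6.0 or later; the seed value, the version string "3.5.0", and the use of sample() are only illustrative choices):

```r
# Save the current RNG settings so they can be restored afterwards.
old_kinds <- RNGkind()

# Switch to the defaults that were in effect in R 3.5.x, then draw a sample.
# suppressWarnings() hides the warning R emits about the non-uniform
# "Rounding" sampler.
suppressWarnings(RNGversion("3.5.0"))
set.seed(1)
draw_a <- sample(10)

# Repeating the same two steps reproduces the same stream.
suppressWarnings(RNGversion("3.5.0"))
set.seed(1)
draw_b <- sample(10)
identical(draw_a, draw_b)

# Restore the settings that were in effect before.
suppressWarnings(
  RNGkind(kind = old_kinds[1], normal.kind = old_kinds[2],
          sample.kind = old_kinds[3])
)
```

Restoring the saved settings at the end keeps the older defaults from leaking into the rest of the session.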

One difficulty in reproducing random number results is that people don't always report the value of RNGkind(). If you change to a non-default setting and save the workspace, you'll return to that non-default setting when you reload it.
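One way to make that information available is to record it next to the seed. The following is only a sketch; the name `rng_report` is made up for illustration, and it assumes R 3.6.0 or later, where RNGkind() returns three values:

```r
# Record the R version and RNG settings alongside the seed so that
# someone else can match them before rerunning the code.
set.seed(1)
rng_report <- list(
  r_version = R.version.string,
  rng_kind  = RNGkind()  # generator, normal generator, sample() method
)
print(rng_report)
```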

Generally speaking, each of the changes has been an improvement, so advice to use code like RNGkind(sample.kind = "Rounding") is probably bad advice: it restores buggy behaviour that was fixed by default in R 3.6.0. (Though it's a pretty subtle bug unless you're using the sample() function with really huge populations.)
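To see what the setting actually changes, here is a small comparison, again assuming R 3.6.0 or later (the variable names are illustrative). With the same seed, sample() gives different results under the two settings, while the underlying uniform stream from runif() does not depend on sample.kind:

```r
old_kinds <- RNGkind()

# Default sampler since R 3.6.0.
RNGkind(sample.kind = "Rejection")
set.seed(1)
with_rejection <- sample(10)
set.seed(1)
u_rejection <- runif(3)

# Pre-3.6.0 sampler; R warns that it is non-uniform.
suppressWarnings(RNGkind(sample.kind = "Rounding"))
set.seed(1)
with_rounding <- sample(10)
set.seed(1)
u_rounding <- runif(3)

identical(with_rejection, with_rounding)  # compares the sample() results
identical(u_rejection, u_rounding)        # compares the runif() results

# Restore the settings that were in effect before.
suppressWarnings(
  RNGkind(kind = old_kinds[1], normal.kind = old_kinds[2],
          sample.kind = old_kinds[3])
)
```

So two machines only agree on sample() output when both the seed and sample.kind match, which is usually a question of R version rather than operating system.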

You are generally better off encouraging people to use the most recent release of R (except occasionally x.y.0 releases, which sometimes introduce new bugs). It's also a bad idea to save the workspace, because that will cause R to retain the old or non-default RNGs.

user2554330