When and why are random number generators not sampled for certain functions and their inputs?

Question

In a previous question of mine I learned that a random number generator is not sampled when p = 1 or p = 0 in the rbinom() function. Are there any other functions one should be aware of where a random number generator is not sampled given certain inputs? The reason I ask is that the simulated output of certain variables is not constant between simulations after changes in parameters unrelated to these output variables despite the use of a fixed seed.

The version of R I am working with:

> R.version
               _                           
platform       x86_64-w64-mingw32          
arch           x86_64                      
os             mingw32                     
system         x86_64, mingw32             
status                                     
major          3                           
minor          6.1                         
year           2019                        
month          07                          
day            05                          
svn rev        76782                       
language       R                           
version.string R version 3.6.1 (2019-07-05)
nickname       Action of the Toes

`rpois(..., 0)`, `rnorm(..., 0, 0)` and so on and so forth. I think you are likely to encounter this kind of problems as long as your simulation hits some boundary conditions. — ekoam, Nov 23 '20 at 09:26

Allan Cameron · Answer 1 · 2020-11-23T14:03:20.800

This is a pretty broad question, as there are many, many different ways of using the RNG in R. It's difficult to see how any answer could hope to be authoratative and comprehensive.

However, what may help you is to create a function that tests whether the RNG is invoked on a certain call:

invokes_RNG <- function(expr)
{
  store_RS <- .Random.seed
  
  mc <- match.call()$expr
  
  set.seed(1)
  RN1 <- rnorm(1)

  set.seed(1)
  eval(mc)
  RN2 <- rnorm(1)
  
  .Random.seed <<- store_RS
  
  RN2 != RN1
}

So for example:

invokes_RNG(rbinom(1, 1, 1))
#> [1] FALSE

invokes_RNG(rbinom(1, 1, 0.5))
#> [1] TRUE

Note that the function resets the random seed to where it was before you called the function, so you don't need to worry about the effect of the function itself.

Another tool that might help would be a wrapper function that is guaranteed to move the RNG on once, whether the call uses the RNG or not:

with_single_RNG <- function(expr)
{
  store_RS <- .Random.seed
  result <- eval(match.call()$expr)
  .Random.seed <<- store_RS
  rnorm(1)
  return(result)
}

So now we can call rbinom with any probability and get the same state of the random seed afterwards:

set.seed(1)
with_single_RNG(rbinom(5, 1, 0))
#> [1] 0 0 0 0 0
rnorm(1)
#> [1] 0.1836433

set.seed(1)
with_single_RNG(rbinom(5, 1, 0.5))
#> [1] 0 0 1 1 0
rnorm(1)
#> [1] 0.1836433

And the same function should work for any method that calls the RNG.

Peter O. · Answer 2 · 2021-10-29T00:18:33.937

One issue with R's approach to pseudorandom number generation that you may be concerned with is that the pseudorandom generator (PRNG) used by R is global, yet can be reseeded (with set.seed) for repeatable "randomness".

As far as I know, the algorithms for rnorm, runif, rpois, etc., are not documented at a level that allows you to know how far each method advances R's global PRNG and under what circumstances. To learn more about the exact algorithm of each method, look at R's source code. However, the fact the global PRNG can be reseeded for repeatable "randomness" means, in a sense, that the PRNG's algorithms can't easily be changed without affecting backward compatibility. See also an answer that relates to your concern: Creating a PRNG engine for <random> in C++11 that matches PRNG results in R

Even so, the use of global state for a PRNG is problematic, especially when the global seed can change behind the backs of the PRNG's users. (See also NumPy's RNG policy.) A better approach would have been for R to use lightweight PRNG objects, which are seeded individually, and pass those around (so that we would have, for example, rnorm(rng, mean, stdev).) Different kinds of PRNG objects could implement different PRNG algorithms and/or different random variate generation algorithms (that is, different versions of rnorm, runif, etc.).

Here are some ways to solve your underlying concern in your previous question:

Consider the technique of "common random numbers", which you can search for on this site.
Replace calls to rnorm, rpois, etc. With a custom method that calls runif to generate a uniform random variate, then transforms that variate via inversion to the desired distribution (see "inverse transform sampling").
Replace calls to rnorm, rpois, etc. With a custom method that calls runif to generate a uniform random variate, then uses that variate as the seed to a local (non-global) pseudorandom generator, which in turn is used to produce a variate from the desired distribution. Something similar to this is employed in JAX, especially its PRNGKey class.

Note that the last two suggestions will work for your purposes, assuming that runif always draws a fixed number of pseudorandom bits regardless of the circumstances.

When and why are random number generators not sampled for certain functions and their inputs?

2 Answers2

Linked