-1

I'm currently writing code to run simulations for 2D spatially adaptive spline fitting techniques, within this code I have a for loop that should change the x, y and z values that I build my model from with each iteration.

N.B. To run this code you will require the MRSea package from github. This can be downloaded using: devtools::install_git("https://github.com/lindesaysh/MRSea.git")

After some very useful comments (thanks guys), I have reduced the code down to this:


simulate <- function(niter=5, npoints=10, nknots=9, fitmeasure="AIC", startknots=5, minknots=2,maxknots=9, knotgap=0 ) {
  require(MRSea)
  x <- matrix(nrow=niter, ncol=npoints)
  y <- matrix(nrow=niter, ncol=npoints)
  rnd <- matrix(nrow=niter, ncol=npoints)
  mygrid <- expand.grid(seq(0,1, length=npoints), seq(0,1, length=npoints)) 
  for (i in 1:niter) {
    x[i,] <- runif(npoints)  
    y[i,] <- runif(npoints) 
    mu <- x[i,]*sin(4*pi*y[i,]) 
    sigma <- diff(0.25 * range(mu)) 
    rnd[i,] <- rnorm(npoints)
    z <- mu + sigma*rnd[i,]
    dat <- data.frame(x.pos=x[i,], y.pos=y[i,], response=z)
    init <- glm(response ~ 1, data=dat)
    knotgrid <- getKnotgrid(coordData = cbind(x[i,],y[i,]), numKnots = nknots, plot=F) 
    distMats <- makeDists(cbind(x[i,],y[i,]), na.omit(knotgrid))
    salsa2dlist <-  list(fitnessMeasure = fitmeasure, knotgrid = knotgrid,
                         startKnots=startknots, minKnots=minknots, maxKnots=maxknots, gap=knotgap)
    predgrid <- makeDists(mygrid, na.omit(knotgrid))$dataDist
    sim <- runSALSA2D(model=init, salsa2dlist, d2k=distMats$dataDist,
                      k2k=distMats$knotDist, splineParams=NULL, tol=0, chooserad=F,
                      panels=NULL, suppress.printout=TRUE)

  }
  return(x) # Can return x,y or rnd to see the issue
}
simulate()

I have identified that the removing the following line of code allows the runif() lines to run correctly and produce different values:

sim <- runSALSA2D(model=init, salsa2dlist, d2k=distMats$dataDist,
                      k2k=distMats$knotDist, splineParams=NULL, tol=0, chooserad=F,
                      panels=NULL, suppress.printout=TRUE)

Does anyone know why this line may be causing problems for my code and potential ways to solve it?

Thanks.

  • 9
    Is all this code needed for us to undestand your problem? Can you reduce it to a minimal reproducible example? – Emmanuel-Lin Jan 04 '18 at 17:06
  • 3
    I would suggest making a game of this: how many lines of code can you delete and still see the expected problem? It's a very useful debugging technique, and if you don't solve the problem yourself doing it you will get much faster help here after it is done. – Gregor Thomas Jan 04 '18 at 17:21
  • But maybe some function you're calling uses `set.seed`, or assigns `surface` to something other than `1` so the for loop isn't run, or... Good way to find out is delete everything except the `for` loop and add it back in one bit at a time until the problem is observed. You could also add some `print` statements or `browser()` inside the loop to check on what's really going. – Gregor Thomas Jan 04 '18 at 17:40
  • Thanks for the help, I've edited the question with the simplest case I can find that reproduces the problem and identified the line that is causing it. – Andrew Mason Jan 04 '18 at 17:52
  • See not-quite-dups https://stackoverflow.com/q/14324096/210673 and https://stackoverflow.com/q/23090958/210673 – Aaron left Stack Overflow Jan 04 '18 at 19:20

2 Answers2

1

Something is setting the random seed. Maybe the knot.seed parameter? Though that seems more likely to be a number of knots. Regardless, a quick fix is to just put in set.seed(as.integer(Sys.time()) %% 10000) right before your random number generation.

It might be wise to dig in deeper to find what function is setting the seed, as it may or may not be affecting your results.

Gregor Thomas
  • 136,190
  • 20
  • 167
  • 294
  • It appears that this only sets the seed to 357 when K-fold cross validation is selected as the measure of fit. When AIC or BIC etc are selected this seed would not be set. – Andrew Mason Jan 04 '18 at 18:22
1

"Use the source, Luke..." runSALSA2D calls getCVids, which resets the seed. It really should only do that inside the function, and not affect the seed outside the function; you could file a bug/feature request about it. You can send a different seed to the function, but since it uses set.seed there's not an easy way to do that that will keep the good properties of the random number generator. Of course, using the same seed every time may lead to other issues, so you may want to try something; from the help file, passing it NULL would perhaps be nicest (except for not setting the seed at all!), but that's what runSALSA2D checks for before hard setting it. :(

https://github.com/lindesaysh/MRSea/blob/master/R/runSALSA2D.R https://github.com/lindesaysh/MRSea/blob/master/R/getCVfoldID.R

In the meantime, though, all you need to do is to save the seed after you generate the numbers within each loop and then reset it yourself.

runif(1)
saved.seed <- .Random.seed
for (i in 1:niter) {
    .Random.seed <- saved.seed
    x[i,] <- runif(npoints)  
    y[i,] <- runif(npoints) 
    mu <- x[i,]*sin(4*pi*y[i,]) 
    sigma <- diff(0.25 * range(mu)) 
    rnd[i,] <- rnorm(npoints)
    saved.seed <- .Random.seed

The first runif(1) is to ensure .Random.seed exists; see setting seed locally (not globally) in R.

Alternatively (for the example as written, anyway), you're not generating a lot of points; you could also generate them all at once and save them to be used in your loop.

EDIT: yes, do that!

You've allocated the space for all the random points at the top anyway! So there's no reason for them to be in the loop at all.

simulate3 <- function(...) {
  x <- matrix(runif(npoints*niter), nrow=niter, ncol=npoints)
  y <- matrix(runif(npoints*niter), nrow=niter, ncol=npoints)
  rnd <- matrix(rnorm(npoints*niter), nrow=niter, ncol=npoints)
  mygrid <- expand.grid(seq(0,1, length=npoints), seq(0,1, length=npoints)) 
  for(i in 1:niter) {
    mu <- x[i,]*sin(4*pi*y[i,]) 
    sigma <- diff(0.25 * range(mu)) 
    z <- mu + sigma*rnd[i,]
Aaron left Stack Overflow
  • 36,704
  • 7
  • 77
  • 142
  • After looking at the source code, `runSALSA2D` only calls `getCVids` if `is.null(panels) != TRUE`, but `panels=NULL` in my example above. There is nowhere else I can find that is setting a seed in `runSALSA2D` other than `getCVids`. Using your suggested fixes I run into an error: `apply(knotDist[point, aR], 1, min): dim(x) must have positive length`. Thanks for your help, I'll figure out where this is coming from and speak to the author of the package to resolve the issue – Andrew Mason Jan 04 '18 at 22:00
  • Afraid you can't blame that one on me; it happens sometimes (depending on the randomness) regardless of whether you include the fixes or not. – Aaron left Stack Overflow Jan 04 '18 at 22:10
  • You're right, though, about `getCVids`; looks like it's instead in `return.reg.spline.fit.2d` which calls `initialise.measures_2d` with a default of `knot.seed = 10`. – Aaron left Stack Overflow Jan 04 '18 at 22:34
  • That's it! `runSALSA2D` is calling `return.reg.spline.fit.2D` with `initialise=TRUE`, this then calls `initialise.measure_2d`. As `initialise=TRUE` this function is running `set.seed(knot.seed)` where `knot.seed = 10`. I'll pass this on to the author and get it changed the next couple of days, thank you very much. – Andrew Mason Jan 04 '18 at 22:54