
Problem:

I am using a loop in R to create a new vector from two ("parent") vectors, generating a random value for each position in the new vector that lies in the range of the parents' values at that position (it's for the crossover phase in a genetic algorithm). Note that I don't want the mean of x and y, but rather random values within the range spanned by the parents' values at each position.

Example code:

x = c(0.1, 0.7, 1, 0.8)
y = c(0, 0.9, 0.2, 1)
child = rep(NA, length(x))
for(i in 1:length(x)){
  child[i] = sample(seq(min(x[i], y[i]),  
                        max(x[i],y[i]), by=0.01), 1)
}
# This might yield, for example: 0.02 0.83 0.73 0.88 

Question:

It works fine, but I'm thinking there may be a more efficient way to do this, since I need to do it for 100-1000 individuals on each of thousands of iterations. R has nice fast functions that work on whole vectors, like ifelse, colMeans, max.col, match, and rollmean, so I'm wondering: is there something like that for my purpose as well? (The *apply family probably wouldn't help much here, from what I understand.) Or is a loop like this really the best I can do?

– user3554004

3 Answers


We can use runif to get random numbers from a uniform distribution, and pmax and pmin to vectorize the min and max:

round(runif(length(x), pmin(x, y), pmax(x, y)), 2)
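Wrapped as a reusable function for the GA crossover step, this might look like the following sketch (the function name `crossover` is illustrative, not from the original post):

```r
# Sketch of a vectorized crossover: one uniform draw per gene,
# bounded by the two parents' values at that position.
crossover <- function(x, y) {
  runif(length(x), pmin(x, y), pmax(x, y))
}

x <- c(0.1, 0.7, 1, 0.8)
y <- c(0, 0.9, 0.2, 1)
child <- crossover(x, y)

# Every child value lies within the parental range at its position
all(child >= pmin(x, y) & child <= pmax(x, y))  # TRUE
```

Note that `runif` recycles its `min` and `max` arguments elementwise, so no explicit loop or `*apply` call is needed.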

A small benchmark:

library(microbenchmark)

set.seed(42)
x <- runif(1000)
y <- runif(1000)

microbenchmark(
  vectorize = round(runif(length(x), pmin(x, y), pmax(x, y)), 2),
  mapply    = mapply(runif, 1, pmin(x, y), pmax(x, y)),
  lapply    = unlist(lapply(seq_along(x), function(p, q, i) {
                sample(seq(min(p[i], q[i]), max(p[i], q[i]), by = 0.01), 1)
              }, p = x, q = y))
)

Unit: microseconds
      expr       min        lq       mean     median        uq       max neval cld
 vectorize   316.417   321.026   341.6501   336.0015   342.914   529.154   100 a  
    mapply  4311.559  4429.640  4733.0420  4543.6875  4806.535  9935.631   100  b 
    lapply 46987.459 47718.980 50484.6058 48474.5015 53599.756 60043.093   100   c
– jeremycg
  • @ jeremycg: `res <- round(runif(length(x), pmin(x, y), pmax(x, y)), 2); table(pmin(x,y)<=res); table(pmax(x,y)>=res)` showed some values are outside the range...surprisingly. – fishtank Nov 23 '15 at 22:53
  • @fishtank It's a rounding error - e.g. `pmin(x, y)` might be 0.232, and the random number we got is 0.234, which gets rounded to 0.23, and we are out of range. In this case, the OP is only using 2 digits as input, so they don't need to worry (more broadly, don't round to fewer digits than your input). – jeremycg Nov 23 '15 at 23:16
  • The rounding part is not necessary anyway; I just had simple "short" numbers in the example to keep it, well, simple. Besides being a good answer, it's always nice to see comparisons to the previous "state of the art". – user3554004 Nov 24 '15 at 12:01
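The rounding gotcha discussed in the comments above can be reproduced with a toy value (the numbers here are illustrative, not from the thread):

```r
# A draw can fall inside the range but leave it after rounding:
lo <- 0.232          # lower bound, pmin(x, y) at some position
r  <- 0.2304         # a uniform draw inside [lo, hi]
round(r, 2)          # 0.23, which is below lo -- out of range
round(r, 2) >= lo    # FALSE
```

Hence the advice: only round if the bounds themselves have no more decimal places than you round to.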

Here's an mapply solution:

mapply(runif, 1, pmin(x,y), pmax(x,y))

(Although @jeremycg's solution shows that you don't need *apply functions and can just vectorize the min and max for runif as well.)

– eipi10

Here's a data.table solution that handles 10 million records in about 2 seconds:

library(data.table)
set.seed(4444)
n <- 10000000
system.time({
  dt <- data.table(x=runif(n=n,min=0,max=10),y=runif(n=n,min=0,max=10))
  dt[,child := runif(n=n,min=pmin(x,y),max=pmax(x,y)),by=.I]
})
dt

#user  system elapsed 
#2.01    0.03    2.06 
– ddunn801
  • Also a good idea. But I choose @jeremycg's answer, as it seems more suited for my purposes (the vectors in the example is pretty much what I'm working with). – user3554004 Nov 24 '15 at 12:07