
I have the following code

set.seed(30)
nsim <- 50    ## NUMBER OF REPLICATIONS
demand <- c(12,13,24,12,13,12,14,10,11,10)

res <- replicate(nsim, {
    load <- runif(10,11,14)
    diff <- load - demand    ## DIFFERENCE BETWEEN DEMAND AND LOAD 
    return(sum(diff < 0))
})
res
[1] 6 5 7 4 4 5 4 3 6 4 5 5 5 4 2 5 3 3 3 5 3 2 4 6 5 4 4 3 5 6 4 4 3 6 5 3 5 5 4 3 3
[42] 6 4 4 4 6 6 5 4 5

I have a huge data set, and the question is: what is the fastest way of calculating the running mean across replications? For example, res for the first replication is 6, so the result should be 6/1 = 6; for the second, (6+5)/2 = 5.5; for the third, (6+5+7)/3 = 6; and for the last replication, sum(res)/nsim = 4.38.
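
To make the desired output concrete, a straightforward (but presumably slow) loop-style version would be something like:

## Running mean: the i-th value is the mean of the first i replications
## (illustrative sketch only, not intended to be fast)
sapply(seq_along(res), function(i) mean(res[1:i]))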


1 Answer

We can take the cumulative sum of the result and divide by the sequence along 'res':

cumsum(res)/seq_along(res)
#[1] 6.000000 5.500000 6.000000 5.500000 5.200000 5.166667 5.000000 4.750000 4.888889 4.800000 4.818182 4.833333 4.846154 4.785714 4.600000 4.625000 4.529412
#[18] 4.444444 4.368421 4.400000 4.333333 4.227273 4.217391 4.291667 4.320000 4.307692 4.296296 4.250000 4.275862 4.333333 4.322581 4.312500 4.272727 4.323529
#[35] 4.342857 4.305556 4.324324 4.342105 4.333333 4.300000 4.268293 4.309524 4.302326 4.295455 4.288889 4.326087 4.361702 4.375000 4.367347 4.380000

Or with cummean from dplyr

library(dplyr)
cummean(res)
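
As a quick sanity check (a sketch; cummean is essentially the same cumulative-sum-over-index computation), the two approaches should agree:

## Should return TRUE: cummean matches the cumsum/seq_along approach
all.equal(cummean(res), cumsum(res)/seq_along(res))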

Both solutions are vectorized and should be fast.
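
If you want to verify the timing on your own data, a rough benchmark sketch (assuming the microbenchmark package is installed) would be:

library(microbenchmark)

## Compare the vectorized solutions against a naive loop
microbenchmark(
    base  = cumsum(res)/seq_along(res),
    dplyr = cummean(res),
    loop  = sapply(seq_along(res), function(i) mean(res[1:i]))
)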
