In R - Change For Loops into "apply", "tapply", "sapply", etc.

Question

Thanks for reading this thread. I'm relatively new to R so this question might seem stupid.

So, I have a data set on product prices. It is a 240 by 1,000 matrix. Each column represents a unique product, and each row gives the prices of the 1,000 products in a specific month. I'm trying to resample the data set and get a new matrix of the same dimensions.

My data is saved as "data".
I want to save the bootstrapped results in "newdata", which is an empty 240 x 1,000 matrix.

Here's my code:

for (month in 1:num.months) {
  for (n in 1:num.products) {
    # mean of one resample (with replacement) of this month's prices
    newdata[month, n] <- mean(sample(data[month, ],
                                     size = num.products,
                                     replace = TRUE))
  }
}

This works, but the for loops make it really slow. It would be great if someone could point out how to improve the speed using apply, sapply, tapply, etc. Thanks.

what are `num.months` and `num.products`? the dimensions of the matrix? — Chase, Jul 05 '12 at 22:34
Pretty much all you need to know is here: [Is R's apply family more than syntactic sugar](http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar) (Or maybe not all, but more than enough to get you started...) — Matt Parker, Jul 05 '12 at 22:38
have you looked at package boot? http://www.statmethods.net/advstats/bootstrapping.html — Chase, Jul 05 '12 at 22:39
Are you really intending to replace each value in the matrix with the average of values from that row, sampled with replacement? I'm not sure what that will accomplish, but I certainly wouldn't call it bootstrapping, FWIW. — joran, Jul 05 '12 at 22:41

score 0 · Answer 1 · answered Jul 05 '12 at 23:46

I suggest you look at the bootstrap functions and packages already available in R before writing your own sampling method.

That said, the following gives a list in which each element is a matrix whose rows are sampled, with replacement, from the rows of the original. Timings included:

> m = matrix(rnorm(24000),nrow=1000,ncol=24)
> nbootstrap = 100
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   0.27    0.00    0.26 

> m = matrix(rnorm(24000),nrow=1000,ncol=24)
> nbootstrap = 1000
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   1.45    0.03    1.59 

> m = matrix(rnorm(240000),nrow=1000,ncol=240)
> nbootstrap = 100
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   0.97    0.05    1.02 

> m = matrix(rnorm(240000),nrow=1000,ncol=240)
> nbootstrap = 1000
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   6.60    1.20    7.97
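
For the loop in the question itself, here is a minimal sketch of one way to drop the inner loop. It assumes the intent really is what the loop does: every cell of newdata is the mean of a fresh resample of that month's row. The small dimensions, the runif() data and the helper name resample.means are made up for illustration; they are not from the original post.

```r
# Hypothetical small stand-in for the 240 x 1,000 price matrix in the question.
set.seed(42)
num.months   <- 6
num.products <- 50
data <- matrix(runif(num.months * num.products, min = 1, max = 100),
               nrow = num.months, ncol = num.products)

# For one month's prices, draw n resamples of size n in a single sample()
# call, arrange them one resample per column, and take the column means.
# (resample.means is an illustrative helper, not an existing function.)
resample.means <- function(prices) {
  n <- length(prices)
  draws <- matrix(sample(prices, size = n * n, replace = TRUE), nrow = n)
  colMeans(draws)
}

# apply() over rows returns one column per row of `data`,
# so transpose back to months x products.
newdata <- t(apply(data, 1, resample.means))
```

Most of the saving should come from calling sample() once per month instead of once per cell; the apply() call is still a loop over months underneath. At the full 240 x 1,000 size each month needs a temporary 1,000 x 1,000 matrix, so memory use is worth keeping an eye on.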

Are you sure that you are offering effective assistance by selecting a random set of rows? Each row was supposedly a separate product. — IRTFM, Jul 06 '12 at 04:02
I had thought the columns were the products, and the rows were the prices? If I got that wrong, sorry. — Davy Kavanagh, Jul 06 '12 at 08:14
You read it correctly. My error. My only remaining concern is whether it was requested that the time slots be selected en bloc or randomized within product — IRTFM, Jul 06 '12 at 11:59
Then the entries in each row might not contain the time-matched pricings. Perhaps this is desirable, but I thought that it would be bad. Now that I think about it, for bootstrapping I suppose you want truly random, and what I have above wouldn't be random. So yeah, perhaps a nested lapply or something? — Davy Kavanagh, Jul 06 '12 at 12:06
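
If resampling within each column independently is what is wanted, as discussed in the comments above, a rough sketch could look like this. It reuses the matrix shape from the answer's timings. Whether breaking up the rows is statistically appropriate is exactly the open question in this thread, so treat this as an illustration only.

```r
set.seed(1)
m <- matrix(rnorm(24000), nrow = 1000, ncol = 24)
nbootstrap <- 100

# Resample every column on its own, with replacement, so the values in a
# row of the result no longer come from the same row of m.
mboot <- lapply(seq_len(nbootstrap), function(i) {
  apply(m, 2, sample, replace = TRUE)
})
```

Each element of mboot has the same dimensions as m, and apply(m, 2, ...) stands in for the inner lapply, so no explicit nesting is needed.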
