Thanks for reading this thread. I'm relatively new to R so this question might seem stupid.

I have a data set of product prices stored as a 240 by 1,000 matrix. Each column represents a unique product, and each row gives the prices of the 1,000 products in a specific month. I'm trying to resample the data set and get a new matrix of the same dimensions.

  • My data is saved as "data"

  • I want to save the bootstrapped results in "newdata", which is an empty 240x1,000 matrix

Here's my code:

for (month in 1:num.months) {
  for (n in 1:num.products) {
    # each cell becomes the mean of a with-replacement sample of that month's row
    newdata[month, n] <- mean(sample(data[month, ],
                                     size = num.products,
                                     replace = TRUE))
  }
}

This works, but the for loops make it really slow. It would be great if someone could point out how to improve the speed using apply, sapply, tapply, etc. Thanks.

  • Have you searched for a similar question on SO or anywhere? – Seth Jul 05 '12 at 22:31
  • what are `num.months` and `num.products`? the dimensions of the matrix? – Chase Jul 05 '12 at 22:34
  • Pretty much all you need to know is here: [Is R's apply family more than syntactic sugar](http://stackoverflow.com/questions/2275896/is-rs-apply-family-more-than-syntactic-sugar) (Or maybe not all, but more than enough to get you started...) – Matt Parker Jul 05 '12 at 22:38
  • have you looked at package boot? http://www.statmethods.net/advstats/bootstrapping.html – Chase Jul 05 '12 at 22:39
  • Are you really intending to replace each value in the matrix with the average of values from that row, sampled with replacement? I'm not sure what that will accomplish, but I certainly wouldn't call it bootstrapping, FWIW. – joran Jul 05 '12 at 22:41

1 Answer

I suggest you look at the bootstrap functions and packages already available in R before writing your own sampling method.
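As one illustration of an existing tool, here is a minimal sketch using `boot()` from the boot package (a recommended package that normally ships with R). The choice of statistic (column means), the matrix dimensions, and the names `col_means`, `b`, and `boot_se` are assumptions made for the example, not something taken from the question.

```r
library(boot)

set.seed(123)
# Hypothetical data: 240 months (rows) by 10 products (columns)
m <- matrix(rnorm(240 * 10), nrow = 240, ncol = 10)

# boot() calls the statistic with the data and a vector of resampled row indices
col_means <- function(d, idx) colMeans(d[idx, , drop = FALSE])

# R = 200 bootstrap replicates; rows (months) are resampled with replacement
b <- boot(data = m, statistic = col_means, R = 200)

# b$t holds one row per replicate and one column per product
dim(b$t)

# Bootstrap standard error of each product's mean price
boot_se <- apply(b$t, 2, sd)
```

When the data are a matrix, `boot()` treats each row as one observation, so this resamples whole months rather than individual prices.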

That said, the following gives a list in which each element is a matrix whose rows are sampled with replacement from the original. Timings are included:

> m = matrix(rnorm(24000),nrow=1000,ncol=24)
> nbootstrap = 100
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   0.27    0.00    0.26 

> m = matrix(rnorm(24000),nrow=1000,ncol=24)
> nbootstrap = 1000
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   1.45    0.03    1.59 

> m = matrix(rnorm(240000),nrow=1000,ncol=240)
> nbootstrap = 100
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   0.97    0.05    1.02 

> m = matrix(rnorm(240000),nrow=1000,ncol=240)
> nbootstrap = 1000
> 
> system.time((mboot = lapply(1:nbootstrap, function(i)
+   {
+    m[sample(1:nrow(m),replace=T),]
+ })))
   user  system elapsed 
   6.60    1.20    7.97 
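For comparison, the computation in the question's loop (each cell replaced by the mean of a with-replacement sample of its row) can be written without explicit loops. This is only a sketch of that same computation under small, made-up dimensions; the helper name `boot_row_means` is hypothetical, and whether the procedure is statistically sensible is a separate matter.

```r
set.seed(1)
# Small hypothetical dimensions so the example runs instantly
num.months <- 6
num.products <- 5
data <- matrix(rnorm(num.months * num.products),
               nrow = num.months, ncol = num.products)

# For one row x of length p: draw p samples of size p in a single call,
# arrange them as a p x p matrix (one sample per column), and average each column
boot_row_means <- function(x) {
  p <- length(x)
  draws <- matrix(sample(x, size = p * p, replace = TRUE), nrow = p)
  colMeans(draws)
}

# apply() returns one column per input row, so transpose back to months x products
newdata <- t(apply(data, 1, boot_row_means))
dim(newdata)
```

Drawing all samples for a row in one `sample()` call and using `colMeans()` avoids the inner loop. For a full 240 by 1,000 matrix this still draws a million values per row, so memory use per row grows with the square of the number of products.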
Davy Kavanagh
  • Are you sure that you are offering effective assistance by selecting a random set of rows? Each row was supposedly a separate product. – IRTFM Jul 06 '12 at 04:02
  • I had thought the columns were the products, and the rows were the prices? If I got that wrong, sorry. – Davy Kavanagh Jul 06 '12 at 08:14
  • You read it correctly. My error. My only remaining concern is whether it was requested that the time slots be selected en bloc or randomized within product – IRTFM Jul 06 '12 at 11:59
  • Then the entries in each row might not contain the time-matched prices. Perhaps this is desirable, but I thought it would be bad. Now that I think about it, for bootstrapping I suppose you want truly random, and what I have above wouldn't be random. So yeah, perhaps a nested lapply or something? – Davy Kavanagh Jul 06 '12 at 12:06
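The two options raised in these comments can be sketched side by side. This is an illustrative example on made-up data, not a recommendation of either scheme; the names `boot_block` and `boot_within` are hypothetical.

```r
set.seed(42)
# Hypothetical data: 24 months (rows) by 10 products (columns)
m <- matrix(rnorm(24 * 10), nrow = 24, ncol = 10)

# En bloc: resample whole rows, so prices within a row stay time-matched
boot_block <- m[sample(nrow(m), replace = TRUE), , drop = FALSE]

# Within product: resample each column independently,
# which breaks the time matching across products
boot_within <- apply(m, 2, function(col) sample(col, replace = TRUE))

dim(boot_block)
dim(boot_within)
```

Resampling rows preserves any correlation between products within a month, while resampling each column separately discards it. Which one is appropriate depends on what the bootstrap is meant to estimate.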