
I know that I should avoid for-loops, but I'm not exactly sure how to do what I want to do with an apply function.

Here is a slightly simplified model of what I'm trying to do. So, essentially I have a big matrix of predictors and I want to run a regression using a window of 5 predictors on each side of the indexed predictor (i in the case of a for loop). With a for loop, I can just say something like:

results<-NULL
window<-5
for(i in 1:ncol(g))
{
    first<-i-window #Set window boundaries
    if(first<1){
        1->first
    }
    last<-i+window
    if(last>ncol(g)){
        ncol(g)->last
    }
    predictors<-g[,first:last]

    #Do regression stuff and return some result
    results[i]<-NA #placeholder: replace NA with the regression result
}

Is there a good way to do this with an apply function? My problem is that the vector that apply would be shoving into the function really doesn't matter. All that matters is the index.

trejder
JoshDG
  • AFAIK the `apply` family is just syntactic sugar; it doesn't actually speed up your code. – Sacha Epskamp Oct 03 '11 at 16:58
  • Sacha... not entirely true. Notably, lapply can sometimes give terrific speedups. Furthermore, the syntactic sugar is there to get you to break up complicated loops and functions so that you just apply to the components that need it. – John Oct 03 '11 at 17:24
  • For those interested, [this](http://stackoverflow.com/q/2275896/324364) SO question is a good reference for this issue. – joran Oct 03 '11 at 18:01

2 Answers


This question touches on several points that are made in 'The R Inferno' http://www.burns-stat.com/pages/Tutor/R_inferno.pdf

There are some loops you should avoid, but not all of them. And using an apply function is more hiding the loop than avoiding it. This example seems like a good choice to leave in a 'for' loop.
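To make the "hiding the loop" point concrete, here is a minimal sketch. `window_width()` is a hypothetical stand-in for the per-index work and `n` plays the role of `ncol(g)`. The `for` loop and `vapply()` over `seq_len(n)` perform the same iterations; iterating over indices like this is also how an apply function can be used when, as in the question, only the index matters.

```r
n <- 10
window <- 5

# Hypothetical per-index work: width of the clipped window around column i
window_width <- function(i) {
  first <- max(i - window, 1)
  last <- min(i + window, n)
  last - first + 1
}

# Explicit for loop, with the result allocated at its final length
loop_result <- numeric(n)
for (i in seq_len(n)) {
  loop_result[i] <- window_width(i)
}

# The same iterations, with the loop hidden inside vapply()
apply_result <- vapply(seq_len(n), window_width, numeric(1))

identical(loop_result, apply_result)  # TRUE
```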

Growing objects is generally bad form -- it can be extremely inefficient in some cases. If you are going to have a blanket rule, then "not growing objects" is a better one than "avoid loops".
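A minimal sketch of the difference follows. The functions `grow()` and `prealloc()` are hypothetical names, and timings will vary by machine and R version (recent versions of R over-allocate a little when a vector is extended by assignment, which softens the cost of growing but does not remove it).

```r
n <- 1e4

# Growing: out starts as NULL and is extended on every iteration
grow <- function(n) {
  out <- NULL
  for (i in seq_len(n)) out[i] <- i^2
  out
}

# Preallocating: out is created once at its final length, then filled in
prealloc <- function(n) {
  out <- numeric(n)
  for (i in seq_len(n)) out[i] <- i^2
  out
}

system.time(grow(n))
system.time(prealloc(n))
identical(grow(n), prealloc(n))  # TRUE: same result, different bookkeeping
```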

You can create a list with the final length by:

result <- vector("list", ncol(g))
for(i in 1:ncol(g)) {
    # stuff
    result[[i]] <- NA # placeholder: store the result for column i here
}

In some circumstances you might intend the command:

window<-5

to mean "give me a logical vector stating which values of 'window' are less than -5", but R reads it as assigning 5 to 'window'.

Spaces are good to use, mostly so as not to confuse humans, but also, as the example directly above shows, so as not to confuse R.
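A short illustration of the two readings; the vector `x` is only an example value:

```r
window <- 5        # assignment: window is now 5
window < -5        # comparison: FALSE, because 5 is not less than -5

x <- c(-10, 0, 10)
x < -5             # element-wise comparison: TRUE FALSE FALSE
x<-5               # no spaces: R parses this as assignment, so x is now 5
x
```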

Patrick Burns

Using an apply function to do your regression is mostly a matter of preference in this case; it can handle some of the bookkeeping for you (and so possibly prevent errors) but won't speed up the code.

I would suggest using vectorized functions to compute your firsts and lasts, though; perhaps something like:

window <- 5
ng <- 15 #or ncol(g)
xy <- data.frame(first = pmax( (1:ng) - window, 1 ), 
                  last = pmin( (1:ng) + window, ng) )

Or be even smarter with

xy <- data.frame(first= c(rep(1, window), 1:(ng-window) ), 
                 last = c((window+1):ng, rep(ng, window)) )
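As a quick sanity check, the two constructions can be compared directly. This sketch uses the same `window` and `ng` values as above; the names `xy1` and `xy2` are just for illustration. Note that the `rep()` version relies on `ng > window`, otherwise `1:(ng - window)` counts downwards and produces the wrong boundaries.

```r
window <- 5
ng <- 15

# Clipped window boundaries via vectorized pmax()/pmin()
xy1 <- data.frame(first = pmax((1:ng) - window, 1),
                  last  = pmin((1:ng) + window, ng))

# The same boundaries built directly with rep(); assumes ng > window
xy2 <- data.frame(first = c(rep(1, window), 1:(ng - window)),
                  last  = c((window + 1):ng, rep(ng, window)))

isTRUE(all.equal(xy1, xy2))  # TRUE
head(xy1)
```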

Then you could use this in a for loop like this:

results <- vector("list", nrow(xy))
for(i in 1:nrow(xy)) {
  results[[i]] <- xy$first[i] : xy$last[i]
}
results

or with lapply like this:

results <- lapply(1:nrow(xy), function(i) {
  xy$first[i] : xy$last[i]
})

where in both cases I just return the sequence between first and last; you would substitute your actual regression code.
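As a hedged sketch of what that substitution might look like: the data below are hypothetical (a simulated predictor matrix `g` and response `y`), the fit is an ordinary least-squares `lm()`, and the R-squared of each window's fit stands in for whatever result you actually need.

```r
set.seed(1)
# Hypothetical data: 50 observations of 15 predictors, plus a response y
g <- matrix(rnorm(50 * 15), nrow = 50, ncol = 15)
y <- rnorm(50)

window <- 5
ng <- ncol(g)
xy <- data.frame(first = pmax((1:ng) - window, 1),
                 last  = pmin((1:ng) + window, ng))

# For each index i, regress y on columns first[i]:last[i] of g and keep
# the R-squared of the fit (a placeholder for the real result)
results <- lapply(seq_len(nrow(xy)), function(i) {
  predictors <- g[, xy$first[i]:xy$last[i], drop = FALSE]
  fit <- lm(y ~ predictors)
  summary(fit)$r.squared
})
```

Because each element here is a single number, `sapply()` in place of `lapply()`, or `unlist(results)` afterwards, would give a plain numeric vector instead of a list.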

Aaron