0

I need to fit a glm to each row of a matrix. I can write a for loop to do this. I can also do it using apply, which is what usually seems to be recommended. However, the code runs faster using the for loop. Can someone tell me why this is, and what is the proper way to code something like this?

#some dummy data

response_mat<-matrix(0,10000,100)
response_mat<-apply(response_mat, c(1,2), function(x) sample(c(0,1),1)) 
predictor<-rnorm(100)

#fit glm to each row using a for loop 

ptm <- proc.time()
for (i in 1:nrow(response_mat)){
    model<-glm(response_mat[i,]~predictor,family="binomial")}
proc.time() - ptm


#fit glm using apply    

glm_function<-function(x){model<-glm(x~predictor,family="binomial")}
ptm <- proc.time()  
apply(response_mat,1,glm_function)
proc.time() - ptm
joran
  • 169,992
  • 32
  • 429
  • 468
Esteban
  • 13
  • 3
  • 1
    (a) the `apply` method doesn't just save the output, but prints it on the console, which delays things a bit. (b) the for loop method saves only the current model each time and not all of them in a big list. – AntoniosK Jul 30 '19 at 15:31
  • I ran this twice and `for` was faster then `apply` was faster if you assign it and differences are very, very marginal. Timings differed with each run. There's no universal rule of the faster iteration. All is context-specific. And both `for` and `apply` (unlike other `apply` family functions) [run at the R level (not C)](https://stackoverflow.com/a/29006276/1422451). – Parfait Jul 30 '19 at 18:22

1 Answers1

0

apply stores each model into a list and, if you don't assign the call to a variable, it will print this list. In your for loop, both of those things don't happen.

If you modify your for loop like in the code below, you will get running times that are closer (probably higher in the for loop).

# Create an empty list to store the models
models_list <- list()

ptm <- proc.time()

for (i in 1:nrow(response_mat)) {
  model <- glm(response_mat[i, ] ~ predictor, family = "binomial")

  # Store each model as an element of the list
  models_list [[i]] <- model
}

# Print the list
print(models_list)

proc.time() - ptm
Gabriel M. Silva
  • 642
  • 4
  • 10
  • Thank you for your very quick and clear response. For me the for loop still marginally beats apply, even when I use the empty list example you provide. So, if the running time for apply is not much better than a for loop, is there some better way to code this kind of problem? – Esteban Jul 30 '19 at 16:30
  • Bear in mind that a one-time run of the code is not sufficient to benchmark two codes. There is some variation each time you run it. I like to use `microbenchmark` package for this. I don't know exacly what you want from each model, do you need *everything* from the GLM output? Also, looking at your function maybe you could replace `apply` for `sapply`, because it specifies its output type, which can improve perfomance. If you are interested in how to improve perfomance of your R code, take a read at [this chapter](https://adv-r.hadley.nz/perf-improve.html) from Advanced R. – Gabriel M. Silva Jul 30 '19 at 17:57