4

I have generated a list of models, and would like to create a summary table.

As and example, here are two models:

x <- seq(1:10)
y <- sin(x)^2
model1 <- lm(y ~ x)
model2 <- lm(y ~ x + I(x^2) + I(x^3))

and two formulas, the first generating the equation from components of formula

get.model.equation <- function(x) {
  x <- as.character((x$call)$formula)
  x <- paste(x[2],x[1],x[3])
}

and the second generating the name of model as a string

get.model.name <- function(x) {
  x <- deparse(substitute(x))
}

With these, I create a summary table

model.list <- list(model1, model2)
AIC.data <- lapply(X = model.list, FUN = AIC)
AIC.data <- as.numeric(AIC.data)
model.models <- lapply(X = model.list, FUN = get.model)
model.summary <- cbind(model.models, AIC.data)
model.summary <- as.data.frame(model.summary)
names(model.summary) <- c("Model", "AIC")
model.summary$AIC <- unlist(model.summary$AIC)
rm(AIC.data)
model.summary[order(model.summary$AIC),]

Which all works fine. I'd like to add the model name to the table using get.model.name

x <- get.model.name(model1)

Which gives me "model1" as I want.

So now I apply the function to the list of models

model.names <- lapply(X = model.list, FUN = get.model.name)

but now instead of model1 I get X[[1L]]

How do I get model1 rather than X[[1L]]?

I'm after a table that looks like this:

 Model                Formula       AIC
model1                  y ~ x  11.89136
model2 y ~ x + I(x^2) + I(x^3) 15.03888
Arun
  • 116,683
  • 26
  • 284
  • 387
Isaiah
  • 2,091
  • 3
  • 19
  • 28
  • you've created an unnamed list in `model.list` and you're passing each element of this list to `get.model.name`. So, `X[[1]]` is indeed passed the first time, and it fetches what you've asked for. @baptiste's overcomes this issue by creating simply a named list and avoiding the complications. If his solution is not what you're looking for, you'll have to explain, perhaps, why you're doing it this way. – Arun Mar 02 '13 at 09:35
  • Thanks Arun and baptiste. I've added what I'm trying to produce. – Isaiah Mar 02 '13 at 11:02
  • 1
    Thanks again. I've just replaced `model.list <- list(model1, model2)` with `model.list <- list(model1=model1, model2=model2)` after reading your comments and it works well now. – Isaiah Mar 02 '13 at 13:43

3 Answers3

10

Do you want something like this?

model.list <- list(model1 = lm(y ~ x), 
                   model2 = lm(y ~ x + I(x^2) + I(x^3)))
sapply(X = model.list, FUN = AIC)
baptiste
  • 75,767
  • 19
  • 198
  • 294
2

I'd do something like this:

model.list <- list(model1 = lm(y ~ x),
                   model2 = lm(y ~ x + I(x^2) + I(x^3)))
# changed Reduce('rbind', ...) to do.call(rbind, ...) (Hadley's comment)
do.call(rbind, 
        lapply(names(model.list), function(x) 
          data.frame(model = x, 
          formula = get.model.equation(model.list[[x]]), 
          AIC = AIC(model.list[[x]])
          )
        )
      )

#    model                 formula      AIC
# 1 model1                   y ~ x 11.89136
# 2 model2 y ~ x + I(x^2) + I(x^3) 15.03888
Arun
  • 116,683
  • 26
  • 284
  • 387
  • 1
    `ldply` makes this pattern a lot simpler... (also `Reduce` + `rbind` will be v. inefficient for large lists) – hadley Mar 02 '13 at 14:36
  • yes indeed. thanks for pointing out about `Reduce`. I've edited using `do.call(rbind,...)`. – Arun Mar 02 '13 at 20:14
2

Another option, with ldply, but see hadley's comment below for a more efficient use of ldply:

 # prepare data
    x <- seq(1:10)
    y <- sin(x)^2
    dat <- data.frame(x,y)

# create list of named models obviously these are not suited to the data here, just to make the workflow work...
models <- list(model1=lm(y~x, data = dat), 
               model2=lm(y~I(1/x), data=dat),
               model3=lm(y ~ log(x), data = dat),
               model4=nls(y ~ I(1/x*a) + b*x, data = dat, start = list(a = 1, b = 1)), 
               model5=nls(y ~ (a + b*log(x)), data=dat, start = setNames(coef(lm(y ~ log(x), data=dat)), c("a", "b"))),
               model6=nls(y ~ I(exp(1)^(a + b * x)), data=dat, start = list(a=0,b=0)),
               model7=nls(y ~ I(1/x*a)+b, data=dat, start = list(a=1,b=1))
)

library(plyr)
library(AICcmodavg) # for small sample sizes
# build table with model names, function, AIC and AICc
data.frame(cbind(ldply(models, function(x) cbind(AICc = AICc(x), AIC = AIC(x))), 
                 model = sapply(1:length(models), function(x) deparse(formula(models[[x]])))
                      ))

     .id     AICc      AIC                     model
1 model1 15.89136 11.89136                     y ~ x
2 model2 15.78480 11.78480                y ~ I(1/x)
3 model3 15.80406 11.80406                y ~ log(x)
4 model4 16.62157 12.62157    y ~ I(1/x * a) + b * x
5 model5 15.80406 11.80406      y ~ (a + b * log(x))
6 model6 15.88937 11.88937 y ~ I(exp(1)^(a + b * x))
7 model7 15.78480 11.78480        y ~ I(1/x * a) + b

It's not immediately obvious to me how to replace the .id with a column name in the ldply function, any tips?

Ben
  • 41,615
  • 18
  • 132
  • 227
  • 1
    Or just: `ldply(models, function(mod) data.frame(AICc = AICc(mod), AIC = AIC(mod), model = deparse(formula(mod))) })` – hadley Mar 03 '13 at 02:07
  • @hadley, thanks yes, that's much simpler (an open brace needed to make it work). Would you care to simplify this one from the school of dirty tricks: http://stackoverflow.com/a/15104211/1036500 ? – Ben Mar 03 '13 at 07:31
  • @ Ben. Thank you for nice demonstration of the problem. It is very useful. I wanna make sure if we can extract other statistics such as R2, Pvalue etc in this table? If so, how to it to R? thanks again. – R starter May 29 '19 at 12:49