I have a data frame and I need to run 6 2-variable linear models for each group 'site'. Then, I need to convert the results to a data frame. The second variable in the linear model changes. I have that part down using lapply()
, but I can't figure out how to run by groups. I've found answers on SO that answer parts of my question, but I can't figure out how to put it all together.
Here's some data:
structure(list(SiteName = structure(c(1L, 1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 2L), .Label = c("bp10", "bp12"), class = "factor"),
DMWT = c(13.9697916666667, 13.9125, 14.2152083333333, 14.7810416666667,
15.1541666666667, 15.7535416666667, 17.3254166666667, 18.4872916666667,
20.0564583333333, 21.0595833333333, 21.3925), DMAT = c(16.6714631359947,
18.474493439025, 20.9517661662977, 23.7017661662978, 25.5957055602372,
20.9688840743375, 23.7188840743375, 25.6128234682769, 27.5143386197921,
27.6279749834285, 26.1355507410042), ADD = c(0, 0, 0, 1.90367965367967,
5.70129870129876, 0, 1.90367965367967, 5.70129870129876,
11.400432900433, 17.2132034632037, 21.53354978355), Air200 = c(7.3229782875097,
7.40616010569152, 7.50025101478243, 7.63384949963092, 7.78642525720668,
7.51736892282216, 7.65096740767065, 7.80354316524641, 7.97854316524641,
8.14729316524641, 8.29592952888278), Air100 = c(15.2711601056916,
15.362599499631, 15.512902529934, 15.727296469328, 15.9717661662977,
15.5300204379738, 15.7444143773677, 15.9888840743374, 16.2306264985798,
16.4472174076707, 16.6433537713071), Air75 = c(16.8986348531664,
17.0426752572068, 17.1927762673078, 17.3687358632674, 17.5567156612472,
17.2098941753475, 17.3858537713071, 17.5738335692869, 17.7820153874687,
18.0100961955496, 18.2532275086809), Air50 = c(19.5072207117523,
19.6340388935705, 19.7382813178129, 19.8887358632674, 20.1060085905402,
19.7553992258526, 19.9058537713072, 20.1231264985799, 20.4400961955496,
20.7669143773678, 20.9841871046405), Air10 = c(21.9214631359947,
21.5850994996311, 21.2563116208432, 21.1714631359947, 21.4502510147826,
21.2734295288829, 21.1885810440344, 21.4673689228223, 21.9696416500951,
22.3779749834284, 22.5476719531254)), .Names = c("SiteName",
"DMWT", "DMAT", "ADD", "Air200", "Air100", "Air75", "Air50",
"Air10"), row.names = c(547L, 548L, 549L, 550L, 551L, 1593L,
1594L, 1595L, 1596L, 1597L, 1598L), class = "data.frame")
Here's the code for using each variable in a model. How do I use the sites?:
siteslist <- unique(d$SiteName)
varlist <- names(d)[4:9]
models <- lapply(varlist, function(x) { # apply the modeling function to our list of air variables
lm(substitute(DMWT ~ DMAT + i, list(i = as.name(x))), data = d) # linear model with air variable substituted
})
Then get model results & convert to a data frame:
library(relaimpo)
sumfun <- function(x) c(coef(x),
summary(x)$adj.r.squared,
sqrt(mean(resid(x)^2,na.rm=TRUE)),
calc.relimp(x,type="betasq")$betasq[1],
calc.relimp(x,type="betasq")$betasq[2],
calc.relimp(x,type="pratt")$pratt[1],
calc.relimp(x,type="pratt")$pratt[2])
mod.df <- as.data.frame(t(sapply(models,sumfun)))
Also tried combining variables and sites to do something like this:
siteslist <- unique(d$SiteName)
varlist <- names(d)[4:9]
sets <- expand.grid(SiteName = siteslist, Var = varlist)
models <- lapply(1:nrow(sets), function(x) { # apply the modeling function to our list of air variables
lm(substitute(DMWT ~ DMAT + i, list(i = as.name(sets$Var[x]))), data = d[d$SiteName == sets$SiteName[x],]) # linear model with air variable substituted
})
...but I get an error "Error in eval(expr, envir, enclos) : object '1' not found"