I'm trying to create a series of models based on subsets of different categories in my data. Instead of creating a bunch of individual model objects, I'm using lapply() to create a list of models based on subsets of every level of my category factor, like so:

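# category is wrapped in factor() so that levels() works even when
# stringsAsFactors = FALSE (the default since R 4.0.0)
test.data <- data.frame(y=rnorm(100), x1=rnorm(100), x2=rnorm(100), category=factor(rep(c("A", "B"), 50)))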

run.individual.models <- function(x) {
  lm(y ~ x1 + x2, data=test.data, subset=(category==x))
}

individual.models <- lapply(levels(test.data$category), FUN=run.individual.models)
individual.models

# [[1]]

# Call:
# lm(formula = y ~ x1 + x2, data = test.data, subset = (category == 
#     x))

# Coefficients:
# (Intercept)           x1           x2  
#     0.10852     -0.09329      0.11365
# ....

This works fantastically, except that the model call shows subset = (category == x) instead of category == "A", etc. This makes the models harder to use, both for diagnostics (it's hard to remember which model in the list corresponds to which category) and with functions like predict().

Is there a way to substitute the actual character value of x into the lm() call so that the model doesn't use the raw x in the call?

Andrew
  • See http://stackoverflow.com/questions/15754362/explicit-formula-used-in-linear-regression/15754430#15754430 – mnel Apr 26 '13 at 05:33
  • The order is the same as in your `levels(test.data$category)`. Would it be acceptable to change list names to reflect that? – Roman Luštrik Apr 26 '13 at 05:44
  • That fixes the first issue (making the list more human readable), but it still creates a bunch of models that are unusable by `predict()` outside of the list (i.e. the models can't stand alone). – Andrew Apr 26 '13 at 12:13

1 Answer

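This is along the lines of the answer to "Explicit formula used in linear regression" (linked in the comments on the question).

Use bquote() to construct the call, with .(x) substituting the value of x into it, and then evaluate the call: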

run.individual.models <- function(x) {
  lmc <- bquote(lm(y ~ x1 + x2, data=test.data, subset=(category==.(x))))
  eval(lmc)
}

individual.models <- lapply(levels(test.data$category), FUN=run.individual.models)
individual.models

[[1]]

Call:
lm(formula = y ~ x1 + x2, data = test.data, subset = (category == 
    "A"))

Coefficients:
(Intercept)           x1           x2  
   -0.08434      0.05881      0.07695  


[[2]]

Call:
lm(formula = y ~ x1 + x2, data = test.data, subset = (category == 
    "B"))

Coefficients:
(Intercept)           x1           x2  
     0.1251      -0.1854      -0.1609 
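
As a rough sketch of how this could be extended (not part of the original answer), the list can also be named after the factor levels, and each model can then be pulled out and used on its own. The seed, the explicit factor() call, and the names lvls, lmc, model.A and refit.A are assumptions made for illustration. update() is used here because it re-evaluates the stored call, which is where a leftover x would most plausibly cause trouble.

```r
# Hypothetical example data; the seed and the explicit factor() are assumptions.
set.seed(1)
test.data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                        category = factor(rep(c("A", "B"), 50)))

# bquote() returns an unevaluated call; .(x) is replaced by the value of x.
x <- "A"
lmc <- bquote(lm(y ~ x1 + x2, data = test.data, subset = (category == .(x))))
print(lmc)

# Fit one model per level and name each list element after its level.
lvls <- levels(test.data$category)
individual.models <- setNames(lapply(lvls, function(x) {
  eval(bquote(lm(y ~ x1 + x2, data = test.data, subset = (category == .(x)))))
}), lvls)

# The stored call no longer refers to x, so the model can be taken out of
# the list and its call re-evaluated, e.g. by update().
model.A <- individual.models[["A"]]
refit.A <- update(model.A, . ~ . - x2)
predict(model.A, newdata = data.frame(x1 = 0, x2 = 0))
```

Naming the list addresses the readability point raised in the comments, while the substituted call is what lets each model stand alone.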
mnel