I'm trying to create a series of models based on subsets of different categories in my data. Instead of creating a bunch of individual model objects, I'm using `lapply()` to create a list of models, one for each level of my category factor, like so:
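```r
# In R >= 4.0.0, data.frame() no longer converts strings to factors by default,
# so category is made a factor explicitly; otherwise levels() returns NULL
# and lapply() returns an empty list.
test.data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                        category = factor(rep(c("A", "B"), 50)))

# Fit y ~ x1 + x2 on the rows belonging to a single category level
run.individual.models <- function(x) {
  lm(y ~ x1 + x2, data = test.data, subset = (category == x))
}

# One model per level of category
individual.models <- lapply(levels(test.data$category), FUN = run.individual.models)
individual.models
# [[1]]
#
# Call:
# lm(formula = y ~ x1 + x2, data = test.data, subset = (category ==
#     x))
#
# Coefficients:
# (Intercept)           x1           x2
#     0.10852     -0.09329      0.11365
#
# ....
```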
This works fantastically, except that the stored model call shows `subset = (category == x)` instead of `subset = (category == "A")`, etc. That makes the models harder to use both for diagnostics (it's hard to remember which model in the list corresponds to which category) and with functions like `predict()`.
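If the immediate problem is bookkeeping, a partial workaround (a sketch only; it does not change the stored call) is to name the list by level with `setNames()`, so each model can be looked up as `individual.models[["A"]]`. The setup below repeats the code above, with `category` as an explicit factor.

```r
# Same setup as above; category is an explicit factor so levels() works
test.data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                        category = factor(rep(c("A", "B"), 50)))

run.individual.models <- function(x) {
  lm(y ~ x1 + x2, data = test.data, subset = (category == x))
}

# Name each list element after the level it was fitted on
cats <- levels(test.data$category)
individual.models <- setNames(lapply(cats, run.individual.models), cats)

names(individual.models)   # [1] "A" "B"
individual.models[["A"]]   # the model fitted on category "A"
```

The call stored inside each model still reads `category == x`, so this only helps with telling the models apart.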
Is there a way to substitute the actual character value of `x` into the `lm()` call, so that the stored call doesn't contain the raw `x`?
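One possible approach, sketched here under the same setup, is to build the call with `bquote()`, which splices the current value of `x` into the expression wherever `.(x)` appears, and then `eval()` the result. Because `lm()` records the call it actually received (via `match.call()`), the stored call then contains the literal level rather than `x`.

```r
test.data <- data.frame(y = rnorm(100), x1 = rnorm(100), x2 = rnorm(100),
                        category = factor(rep(c("A", "B"), 50)))

run.individual.models <- function(x) {
  # bquote() replaces .(x) with the current value of x (e.g. "A"), giving an
  # unevaluated lm() call; eval() then runs it in this function's environment,
  # so test.data is still found by ordinary scoping.
  eval(bquote(lm(y ~ x1 + x2, data = test.data, subset = (category == .(x)))))
}

cats <- levels(test.data$category)
individual.models <- setNames(lapply(cats, run.individual.models), cats)

# The stored call now contains category == "A" instead of category == x
individual.models[["A"]]$call
```

A similar variant uses `substitute()` with an explicit replacement list, e.g. `eval(substitute(lm(y ~ x1 + x2, data = test.data, subset = (category == v)), list(v = x)))`. Either way, the stored call no longer refers to the local `x`, so re-evaluating it later (as `update()` does) should only require that `test.data` be visible where it is re-run.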