
I am wondering if there is a way to "predefine" parameters to functions like lm, glmer (lme4), glm, or homemade functions.

I'll try to illustrate my question with a small data frame:

y1 <- rnorm(n = 100, mean = 0, sd = 1)
y2 <- rnorm(n = 100, mean = 4, sd = 1)
x1 <- rep(letters[1:2], times = 50)
x2 <- rep(letters[2:3], times = 50)
x3 <- rep(letters[4:5], times = 50)
# data.frame() keeps y1 and y2 numeric, so no as.numeric() conversion is needed
df <- data.frame(y1, y2, x1, x2, x3)

Then I can easily fit a model with lm like this:

model <- lm(y1 ~x1, data=df)

However, what I am interested in being able to do is something like this

#first define list of predictors 
predictor_vector<- c("x1","x2","x3")

And then use the names (strings) as a parameter in the lm() function.

In this example, I am using lm() and attempting to dynamically construct the regression like so:

model <- lm(y1 ~predictor_vector[1], data=df)
model <- lm(y1 ~predictor_vector[2], data=df)
model <- lm(y1 ~predictor_vector[3], data=df)

The example above doesn't work.

I am very grateful for any input on this topic and hope my example and explanation are clear enough.

alexizydorczyk

2 Answers


We can loop over the predictor names. Build each formula with reformulate (or paste), fit it with lm, and return the models in a list:

out <- lapply(predictor_vector, function(x)
     lm(reformulate(x, response = "y1"), data = df))
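
As a follow-up sketch, naming the list makes the individual fits easier to pull out afterwards. The data frame below mirrors the one in the question; the seed and the names `r2` and `coefs_x2` are assumptions added only for illustration.

```r
set.seed(42)  # assumption: seed added only so the sketch is reproducible
df <- data.frame(
  y1 = rnorm(100),
  x1 = rep(letters[1:2], times = 50),
  x2 = rep(letters[2:3], times = 50),
  x3 = rep(letters[4:5], times = 50)
)
predictor_vector <- c("x1", "x2", "x3")

# One model per predictor, as in the loop above
out <- lapply(predictor_vector, function(x)
  lm(reformulate(x, response = "y1"), data = df))

# Name each fit after its predictor so it can be accessed as out$x2, etc.
names(out) <- predictor_vector

# Summaries across all fits, and the coefficients of a single fit
r2 <- sapply(out, function(m) summary(m)$r.squared)
coefs_x2 <- coef(out$x2)
```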
akrun

The core issue to understand here is that lm() takes an object of class formula as its first argument, and that formula specifies the regression.

You've created a vector of strings (characters), but R won't dynamically generate the formula from it inside the function call. Being able to type variable names directly into a formula is a convenience, but it isn't practical when you need to build the model dynamically.
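
A quick way to see this, sketched with a small made-up data frame: the right-hand side of the formula is kept as the unevaluated expression `predictor_vector[1]`, so lm() never sees the name x1. The name `res` and the tryCatch wrapper are only there so the sketch runs without stopping at the expected error.

```r
predictor_vector <- c("x1", "x2", "x3")
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10))  # illustrative data

f <- y1 ~ predictor_vector[1]

# The formula refers to the variables y1 and predictor_vector, not x1
vars <- all.vars(f)

# The right-hand side is stored as an unevaluated call
rhs_class <- class(f[[3]])

# Fitting is expected to fail: the right-hand side evaluates to a single
# string rather than a column of df
res <- tryCatch(lm(f, data = df), error = function(e) e)
```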

To simplify your example, start with:

y1 <- (rnorm(n = 10, mean = 0, sd = 1))
x1 <- (rnorm(n = 10, mean = 0, sd = 1))
x2 <- (rnorm(n = 10, mean = 0, sd = 1))
x3 <- (rnorm(n = 10, mean = 0, sd = 1))

df <- as.data.frame(cbind(y1,x1,x2,x3))

predictors = c("x1", "x2", "x3")

Now you can dynamically build the formula as a concatenated string (paste0) and convert it with as.formula. Then pass this formula to your lm() call:

form1 = as.formula(paste0("y1~", predictors[1]))

lm(form1, data = df)

As akrun pointed out, you can then start doing things like create loops to dynamically generate these.
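
Since the question also mentions glm and homemade functions, here is a minimal sketch of how the same idea could be wrapped up. `fit_each` and its `fitter` argument are hypothetical names, not part of any package, and the seed is only there for reproducibility.

```r
# Hypothetical helper: fit one model per predictor with any formula-based fitter
fit_each <- function(predictors, response, data, fitter = lm, ...) {
  fits <- lapply(predictors, function(p) {
    # Build "response ~ p" as a string, then convert it to a formula
    fitter(as.formula(paste0(response, " ~ ", p)), data = data, ...)
  })
  setNames(fits, predictors)
}

set.seed(1)  # assumption: seed added only for reproducibility
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

lm_fits  <- fit_each(predictors, "y1", df)
# Extra arguments such as family are passed through to the fitter
glm_fits <- fit_each(predictors, "y1", df, fitter = glm, family = gaussian())
```

The same pattern should carry over to other fitters that take a formula and a data argument, though that is untested here.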

You can also do things like:

my_formula = as.formula(paste0("y1~", paste0(predictors, collapse="+")))

## generates y1 ~ x1 + x2 + x3
lm(my_formula, data = df)

See also: Formula with dynamic number of variables

One of the answers on that page also mentions akrun's alternative way of doing this, using the function reformulate. From ?reformulate:

reformulate creates a formula from a character vector. If length(termlabels) > 1, its elements are concatenated with +. Non-syntactic names (e.g. containing spaces or special characters; see make.names) must be protected with backticks (see examples). A non-parseable response still works for now, back compatibly, with a deprecation warning.
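
A small sketch of what that description means in practice. The column name `my var` is made up to illustrate the backtick rule.

```r
# Several term labels are joined with +
f_all <- reformulate(c("x1", "x2", "x3"), response = "y1")

# A non-syntactic name (here containing a space) must be wrapped in backticks
f_bt <- reformulate("`my var`", response = "y1")

# Inspect the results as text and as variable names
f_all_text <- deparse(f_all)
f_bt_vars  <- all.vars(f_bt)
```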

alexizydorczyk
  • This way is (unfortunately) fairly widespread but there’s absolutely no need to go the detour via strings. R allows direct metaprogramming on actual, unevaluated R expressions. Instead of moderately complex string operations you could write e.g. `bquote(y1 ~ .(as.name('x1')))`. – Konrad Rudolph Oct 18 '21 at 21:31
  • @izyda, thanks a lot for your detailed and helpful answer! All you suggest works well and your explanation also helped me understand why it didn't work in the first place. – user547928359 Oct 19 '21 at 06:42
  • @Konrad Rudolph, I am certain there is some great information in your comment, but unfortunately I do not understand what you mean. What izyda suggested seems to work great in the example. Could you elaborate? – user547928359 Oct 19 '21 at 06:44
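
To unpack the bquote suggestion from the comments, here is a minimal sketch, assuming the simplified numeric data frame from the answer above. The seed and the names `expr1`, `rhs`, and `form_all` are illustrative. The idea is to build the formula from R symbols rather than from pasted strings.

```r
set.seed(1)  # assumption: seed added only for reproducibility
df <- data.frame(y1 = rnorm(10), x1 = rnorm(10), x2 = rnorm(10), x3 = rnorm(10))
predictors <- c("x1", "x2", "x3")

# as.name() turns the string "x1" into the symbol x1;
# bquote() splices that symbol into the expression wherever .() appears
expr1 <- bquote(y1 ~ .(as.name(predictors[1])))
form1 <- eval(expr1)  # evaluating the `~` call yields a formula object
fit1  <- lm(form1, data = df)

# Several predictors: fold the symbols into the nested call x1 + x2 + x3
rhs <- Reduce(function(a, b) call("+", a, b), lapply(predictors, as.name))
form_all <- eval(bquote(y1 ~ .(rhs)))
fit_all  <- lm(form_all, data = df)
```

Because the names are converted to symbols directly, no string parsing is involved, which is the "detour" the comment refers to.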