
I am calculating my first regression in R and I am running into what seems to be a typical error: "variable lengths differ".

After some testing, I found that this error occurred because I passed vectors containing my variable names to lm, rather than writing the variable names directly in the formula. So I would have something like:

control_vars <- c("age","gender")
lm(dependent_var ~ control_vars, data = fictive_data)

This throws an error. Changing it to the following solves the issue:

lm(dependent_var ~ age + gender, data = fictive_data)

So is there any way to use name vectors inside regression models in R while avoiding this error?

Below is a reproducible example of my issue.

Thanks!

tdf <- data.frame(
  a=c(1,4,3,5,3,3),
  b=c(1,2,4,6,2,2),
  c=c(1,3,6,3,2,1)
)

#this works
test_model0 <- lm(a ~ b + c,
                 data= tdf)

#this doesn't
iv_1 <- c("b", "c")
test_model1 <- lm(a ~ iv_1,
                 data= tdf)

#neither does this
iv_2 <- "b + c"
test_model2<- lm(a ~ iv_2,
                  data= tdf)
degeso
  • One quick way is `lm(reformulate(control_vars, "dependent_var"), data=fictive_data)`. – user20650 Jan 04 '22 at 11:54
  • This did exactly what I was looking for, and it is a lot simpler than some of the other questions and answers I've seen about this on here. Thank you so much! – degeso Jan 04 '22 at 13:38
  • @user20650 I just noticed that using this approach to define a model and then creating a table with stargazer causes stargazer to display everything inside "termlabels" for the dependent variable. It does not, however, change the regression results themselves. Would you have any idea why this happens? – degeso Jan 04 '22 at 17:47
  • I'm not sure what stargazer does (I'm not a user). However, many of the approaches, including the one in my comment, mess up the `call` that is stored in the model. For example, run `mod$call` after the code from my comment and compare it to the standard model; perhaps stargazer uses this. You can use `do.call("lm", list(reformulate(control_vars, "dependent_var"), data=as.name("fictive_data")))`, which will be a bit more robust in keeping the internal components the same (note this was also in the second link in the duplicate list). – user20650 Jan 04 '22 at 18:52
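
For reference, here is a minimal sketch of the two approaches from the comments, applied to the `tdf` example from the question. The names `f`, `f2`, `m0`, `m1` and `m2` are only illustrative, and the `as.formula(paste(...))` line for the `"b + c"` string is an extra assumption that does not come from the comments. It is meant to be run at the top level of a script, so that `as.name("tdf")` resolves to the data frame.

```r
# Data from the question
tdf <- data.frame(
  a = c(1, 4, 3, 5, 3, 3),
  b = c(1, 2, 4, 6, 2, 2),
  c = c(1, 3, 6, 3, 2, 1)
)

# Reference model with the formula written out by hand
m0 <- lm(a ~ b + c, data = tdf)

# Character vector of predictor names, as in the question
iv_1 <- c("b", "c")

# reformulate() builds the formula a ~ b + c from the names
f <- reformulate(iv_1, response = "a")

# Approach 1: pass the formula object directly.
# The fit is the same, but the stored call refers to `f`.
m1 <- lm(f, data = tdf)

# Approach 2: build the call with do.call(), so the stored call
# contains the expanded formula and the data frame's name.
m2 <- do.call("lm", list(f, data = as.name("tdf")))

# For a single string such as "b + c", pasting and converting also works
iv_2 <- "b + c"
f2 <- as.formula(paste("a ~", iv_2))

# Compare the stored calls of the two approaches
print(m1$call)
print(m2$call)
```

Both fits should give the same coefficients as `m0`. The difference is only in what `$call` records, which may matter for packages that read the call to label their output.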
