
I have a data set on which I want to run several multiple regressions. They all look like this:

fit <- lm(Variable1 ~ Age + Speed + Gender + Mass, data=Data)

The only thing that changes is the response, Variable1. I want to use a loop, or something from the apply family, to put several variables in turn in the place of Variable1. These variables are columns in my data file. Can someone help me solve this problem? Many thanks!

What I tried so far:

When I extract one of the column names with the names() function, I do get the name of the column:

varname  = as.name(names(Data[14])) 

But when I fill this in (and I used the attach() function):

fit <- lm(Varname ~ Age + Speed + Gender + Mass, data=Data) 

I get the following error:

Error in model.frame.default(formula = Varname ~ Age + Speed + Gender + : object is not a matrix

I suppose that the lm() function does not recognize Varname as Variable1.

marc_s
Koot6133
Comment: Maybe [this post](http://stackoverflow.com/questions/41230953/how-do-you-dynamically-build-a-liner-model-based-on-column-names/41231035#41231035) and its links will be helpful for creating the formula. – lmo Dec 20 '16 at 12:26
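Along the lines of the linked post, base R's reformulate() builds a formula object from character strings, so the formula does not have to be pasted together by hand. A minimal sketch, assuming hypothetical response columns y1 and y2 and simulated data standing in for the real Data:

```r
# Simulated stand-in for the real data; y1 and y2 are hypothetical response columns
set.seed(1)
Data <- data.frame(y1 = rnorm(20), y2 = rnorm(20),
                   Age = sample(18:80, 20), Speed = rnorm(20, 60, 10),
                   Gender = sample(c("W", "M"), 20, replace = TRUE),
                   Mass = rnorm(20))

predictors <- c("Age", "Speed", "Gender", "Mass")

# reformulate() turns the character vectors into y1 ~ Age + Speed + Gender + Mass
f <- reformulate(termlabels = predictors, response = "y1")
fit <- lm(f, data = Data)

# The same idea in a loop over several response columns
responses <- c("y1", "y2")
fits <- lapply(responses, function(v)
  lm(reformulate(predictors, response = v), data = Data))
names(fits) <- responses
```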

2 Answers


You can use lapply to loop over your variables.

fit <- lapply(Data[,c(...)], function(x) lm(x ~ Age + Speed + Gender + Mass, data = Data))

This gives you a list of your results.

The c(...) should contain your variable names as strings. Alternatively, you can choose the variables by their position in Data, like Data[,1:5].
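To make this concrete, here is a minimal runnable sketch of the lapply() approach above; the response columns y1 and y2 and the simulated data are hypothetical stand-ins for the real columns in Data:

```r
set.seed(42)
# Hypothetical data: y1 and y2 stand in for the response columns
Data <- data.frame(y1 = rnorm(25), y2 = rnorm(25),
                   Age = sample(18:80, 25), Speed = rnorm(25, 60, 10),
                   Gender = sample(c("W", "M"), 25, replace = TRUE),
                   Mass = rnorm(25))

# lapply() over a data frame iterates over its columns; each column arrives
# as the vector x, which lm() finds in the anonymous function's environment
fit <- lapply(Data[, c("y1", "y2")],
              function(x) lm(x ~ Age + Speed + Gender + Mass, data = Data))

names(fit)      # the list is named after the selected columns
coef(fit$y2)
```

Because x is looked up in the function's environment only after the columns of Data, this works as long as Data has no column literally named x; if it does, lm() uses that column for every model instead.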

LAP

The problem in your case is that lm() reads the formula literally: each name in it is looked up as a column of data or, failing that, as an object in the formula's environment, whose whole vector is then fed into the regression. The formula never substitutes the column name stored in your variable. To use that name, you need to build the formula from the value of varnames and combine it with the other variables.
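Another way to make lm() use the value of a variable is to splice the symbol into the call before it is evaluated, which fits the as.name() attempt from the question. A minimal sketch using base R's bquote() on simulated data; a side benefit is that the stored call shows the real response name:

```r
set.seed(123)
# Simulated data with two candidate responses, x and y
Data <- data.frame(x = rnorm(30), y = rnorm(30),
                   Age = sample(0:90, 30), Speed = rnorm(30, 60, 10),
                   Gender = sample(c("W", "M"), 30, replace = TRUE),
                   Mass = rnorm(30))

varname <- as.name(names(Data)[1])  # the symbol x, built as in the question

# bquote() replaces .(varname) with the symbol x, giving the unevaluated call
# lm(x ~ Age + Speed + Gender + Mass, data = Data); eval() then runs it
fit <- eval(bquote(lm(.(varname) ~ Age + Speed + Gender + Mass, data = Data)))
fit$call

# Looping over several responses works the same way
fits <- lapply(names(Data)[1:2], function(v)
  eval(bquote(lm(.(as.name(v)) ~ Age + Speed + Gender + Mass, data = Data))))
```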

# generate some data
set.seed(123)
Data <- data.frame(x = rnorm(30), y = rnorm(30), 
    Age = sample(0:90, 30), Speed = rnorm(30, 60, 10), 
    Gender = sample(c("W", "M"), 30, replace = TRUE), Mass = rnorm(30))
varnames <- names(Data)[1:2]

# fit regressions for multiple dependent variables 
fit <- lapply(varnames, 
    FUN=function(x) lm(formula(paste(x, "~Age+Speed+Gender+Mass")), data=Data))
names(fit) <- varnames

fit
$x

Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)

Coefficients:
(Intercept)          Age        Speed      GenderW         Mass  
   0.135423     0.010013    -0.010413     0.023480     0.006939  


$y

Call:
lm(formula = formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data)

Coefficients:
(Intercept)          Age        Speed      GenderW         Mass  
   2.232269    -0.008035    -0.027147    -0.044456    -0.023895  
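Because fit is a named list, sapply() can pull the same quantity out of every model at once. A short sketch that rebuilds the example above and collects the coefficients and R-squared values; the object names coefs and r2 are only illustrative:

```r
set.seed(123)
Data <- data.frame(x = rnorm(30), y = rnorm(30),
                   Age = sample(0:90, 30), Speed = rnorm(30, 60, 10),
                   Gender = sample(c("W", "M"), 30, replace = TRUE),
                   Mass = rnorm(30))
varnames <- names(Data)[1:2]

fit <- lapply(varnames, function(x)
  lm(formula(paste(x, "~Age+Speed+Gender+Mass")), data = Data))
names(fit) <- varnames

# One column of coefficients per response variable
coefs <- sapply(fit, coef)
coefs

# R-squared of each model, taken from summary.lm()
r2 <- sapply(fit, function(m) summary(m)$r.squared)
r2
```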
nya