-2

I wonder whether I can use something like a for loop or an apply function to run linear regressions in R. I have a data frame containing variables such as crim, rm, ad, and wd. I want to fit a simple linear regression of crim on each of the other variables.

Thank you!

lacfo
  • Great question! It would be considerably easier if you could provide a [minimal](http://stackoverflow.com/help/mcve) and [reproducible](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) example for us to quickly play with to give you a suggestion. – r2evans May 19 '16 at 04:44
  • There are too many directions to take with this question. Please edit your question with the response and predictors, and provide a set of reproducible example as well as the desired outcome for the loops. – Adam Quek May 19 '16 at 05:39

2 Answers

2

If you really want to do this, it's pretty trivial with lapply(), where we use it to "loop" over the other columns of df. A custom function takes each variable in turn as x and fits a model for that covariate.

df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

mods <- lapply(df[, -1], function(x, dat) lm(crim ~ x, data = dat), dat = df)

mods is now a list of lm objects. The names of mods contain the names of the covariates used to fit each model. The main drawback is that every model is fitted using a variable called x, so the coefficient is labelled x rather than with the covariate's name. More effort could probably solve this, but I doubt that effort is worth the time.
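If the x label does matter, one possible approach (a sketch, not part of the original answer) is to build each formula from the column name with base R's reformulate(). The names predictors and v are illustrative, and set.seed() is only there to make the dummy data repeatable.

```r
set.seed(1)  # assumption: fixed seed so the dummy data is repeatable
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

# Every column except the response is treated as a predictor
predictors <- setdiff(names(df), "crim")

# Build crim ~ <name> for each predictor and fit it; setNames() keeps
# the predictor names as the names of the resulting list
mods <- lapply(setNames(predictors, predictors), function(v) {
  lm(reformulate(v, response = "crim"), data = df)
})

# The slope is now labelled with the real variable name, e.g. "rm"
lapply(mods, coef)
```

The trade-off is that the stored call shows reformulate(v, response = "crim") rather than the literal formula, although formula(mods$rm) still returns crim ~ rm.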

If you are just selecting models, which may be dubious, there are other ways to achieve this. For example via the leaps package and its regsubsets function:

library("leaps")
a <- regsubsets(crim ~ ., data = df, nvmax = 1, nbest = ncol(df) - 1)
summa <- summary(a)

Then plot(a) will show which of the models is "best", for example.

Original

If I understand what you want (crim is a covariate and the other variables are the responses you want to predict/model using crim), then you don't need a loop. You can do this using a matrix response in a standard lm().

Using some dummy data:

df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

we create a matrix or multivariate response via cbind(), passing it the three response variables we're interested in. The remaining parts of the call to lm are entirely the same as for a univariate response:

mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)
mods

Call:
lm(formula = cbind(rm, ad, wd) ~ crim, data = df)

Coefficients:
             rm        ad        wd      
(Intercept)  -0.12026  -0.47653  -0.26419
crim         -0.26548   0.07145   0.68426

The summary() method produces a standard summary.lm output for each of the responses.
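As a small sketch of working with the multivariate fit (the seed and the object name s are assumptions added for illustration), the coefficients and the per-response summaries can be pulled out like this:

```r
set.seed(1)  # assumption: fixed seed so the dummy data is repeatable
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))
mods <- lm(cbind(rm, ad, wd) ~ crim, data = df)

# coef() on a matrix-response fit returns a matrix: one column per response
coef(mods)

# summary() returns a list with one summary.lm per response,
# named "Response rm", "Response ad" and "Response wd"
s <- summary(mods)
names(s)

# Coefficient table (estimate, std. error, t value, p value) for the rm model
s[["Response rm"]]$coefficients
```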

Gavin Simpson
  • I am sorry I did not make it clear I want to use crim as response and other variables as predictors. And do a series of simple regression, such as crim~ad, crim~wd, crim~rm – lacfo May 19 '16 at 04:59
  • You can do that by creating a formulas list: `formulas <- list(crim ~ rm, crim ~ ad, crim ~ wd)`. Then use `lapply` to fit the model. – JasonWang May 19 '16 at 06:18
  • @lacfo I've updated the answer to address this clarification. – Gavin Simpson May 19 '16 at 06:20
  • There is also `nlme::lmList` which provides nice syntax and its own `summary` and `coef` methods, but requires reshaping the data. – Roland May 19 '16 at 08:40
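
The formulas-list suggestion in the comments above could be sketched roughly as follows. The dummy data, the seed, and the name fits are assumptions for illustration; only the formulas list itself comes from the comment.

```r
set.seed(42)  # assumption: fixed seed so the dummy data is repeatable
df <- data.frame(crim = rnorm(20), rm = rnorm(20), ad = rnorm(20), wd = rnorm(20))

# One formula per simple regression, as suggested in the comment
formulas <- list(crim ~ rm, crim ~ ad, crim ~ wd)

# Fit each formula against the same data frame
fits <- lapply(formulas, function(f) lm(f, data = df))

# Label each fit with its predictor: all.vars() returns the response
# first, so the second element is the predictor name
names(fits) <- vapply(formulas, function(f) all.vars(f)[2], character(1))

lapply(fits, coef)
```
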
1

Suppose the response variable is the first column of your data frame and you want to run a separate simple linear regression on each of the other variables, keeping that first column as the response.

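h <- iris[, -5]  # drop the Species factor, keeping the four numeric columns

for (j in 2:ncol(h)) {
  # regress column 1 on column j and store the model as a2, a3, ...
  assign(paste0("a", j), lm(h[, 1] ~ h[, j]))
}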

The code above fits one regression per predictor and stores the resulting lm objects in a2, a3, ...
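
As an alternative sketch (not the code above, and with fits and r2 as illustrative names), the same models can be collected in a single named list instead of separate a2, a3, ... objects, which makes it easier to summarise them together:

```r
h <- iris[, -5]  # numeric columns only; column 1 is the response

# Pre-allocate a list with one slot per predictor, named after the predictors
fits <- vector("list", ncol(h) - 1)
names(fits) <- names(h)[-1]

# Same regressions as the assign() loop, stored by predictor name
for (v in names(fits)) {
  fits[[v]] <- lm(h[[1]] ~ h[[v]])
}

# With a list, one call extracts a statistic from every model
r2 <- vapply(fits, function(m) summary(m)$r.squared, numeric(1))
r2
```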

Ishan Gyan