R loop for simple regression

Question

I would like to create a function that works with any data frame, with a minimum of one column and a maximum of n columns. The function has to fit a simple linear regression for each of the independent variables. I know that I have to use a for loop, but I don't know how to use it. I tried this, but it doesn't work:

>data1<-read.csv(file.choose(),header=TRUE,sep=",")
>n<-nrow(data1)
>PredictorVariables <- paste("x", 1:n, sep="")
>Formula <-paste("y ~ ", PredictorVariables, collapse=" + ",data=data1)
>lm(Formula, data=data1)

why the `javascript`, `android`, `python` and `iphone` tags? — SymbolixAU, Jan 29 '18 at 00:28
If you want to use everything except `y` as a predictor, you can do `lm(y ~ ., data = data1)`. — Marius, Jan 29 '18 at 00:29
I used javascript, android... because I had a problem with the tags when publishing my question. — jean-philippe, Jan 29 '18 at 00:31
`lm(y ~ ., data = data1)` doesn't change anything; I still have the same problem. — jean-philippe, Jan 29 '18 at 00:32
Is `y` the column name of the dependent variable in `data1`? I assumed it was because of the example code you showed. Replace `y` with the appropriate column name and try again. — Marius, Jan 29 '18 at 00:40
No, `y` is not a column name in `data1`, and I need to create something that works with any data frame and any column name. — jean-philippe, Jan 29 '18 at 00:44
Please read [How to Create a Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve) and update your post. — Len Greski, Jan 29 '18 at 03:55
@jean-philippe Did you take a look at my solution below? The function `myfit` should do what you're after. — Maurits Evers, Jan 29 '18 at 11:44

Len Greski · Answer 1 · 2021-09-06T17:58:02.417

Here is an approach with lapply(), using the mtcars data set. We will select mpg as the dependent variable, extract the names of the remaining columns from the data set, and then use lapply() to run a regression model on each element in the indepVars vector. The output from each model is saved to a list that holds the name of the independent variable as well as the resulting model object.

indepVars <- names(mtcars)[!(names(mtcars) %in% "mpg")]

modelList <- lapply(indepVars,function(x){
     result <- lm(mpg ~ mtcars[[x]],data=mtcars)
     list(variable=x,model=result) 
})

# print the first model
modelList[[1]]$variable
summary(modelList[[1]]$model)

The extract operator [[ can then be used to print the content of any of the models.

...and the output:

> # print the first model
> modelList[[1]]$variable
[1] "cyl"
> summary(modelList[[1]]$model)

Call:
lm(formula = mpg ~ mtcars[[x]], data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  37.8846     2.0738   18.27  < 2e-16 ***
mtcars[[x]]  -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

>
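Because every element of modelList carries both the variable name and the fitted model, the same extraction can be applied across the whole list rather than one model at a time. The following is a minimal sketch, assuming the modelList structure built above (the block rebuilds it so that it runs on its own). The data frame name `slopes` and its column names are illustrative and not part of the original answer.

```r
# Rebuild the list from the answer so this block is self-contained
indepVars <- names(mtcars)[!(names(mtcars) %in% "mpg")]
modelList <- lapply(indepVars, function(x) {
  result <- lm(mpg ~ mtcars[[x]], data = mtcars)
  list(variable = x, model = result)
})

# Collect the slope and R-squared of every model into one data frame
# ("slopes" is an illustrative name)
slopes <- data.frame(
  variable  = sapply(modelList, function(m) m$variable),
  slope     = sapply(modelList, function(m) unname(coef(m$model)[2])),
  r.squared = sapply(modelList, function(m) summary(m$model)$r.squared),
  stringsAsFactors = FALSE
)
slopes
```

The first row should match the cyl model printed above (slope about -2.8758, R-squared about 0.7262).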

Responding to the comment from the original poster, here is the code necessary to encapsulate the above process within an R function. The function regList() takes a data frame and the name of the dependent variable as a string, and then regresses the dependent variable on each of the remaining variables in the data frame passed to the function.

regList <- function(dataframe,depVar) {
     indepVars <- names(dataframe)[!(names(dataframe) %in% depVar)]
     
     modelList <- lapply(indepVars,function(x){
          message("x is: ",x)
          result <- lm(dataframe[[depVar]] ~ dataframe[[x]],data=dataframe)
          list(variable=x,model=result) 
     })
     modelList
}

modelList <- regList(mtcars,"mpg")
# print the first model
modelList[[1]]$variable
summary(modelList[[1]]$model)

One can extract a variety of content from the individual model objects. The output is as follows:

> modelList <- regList(mtcars,"mpg")
> # print the first model
> modelList[[1]]$variable
[1] "cyl"
> summary(modelList[[1]]$model)

Call:
lm(formula = dataframe[[depVar]] ~ dataframe[[x]], data = dataframe)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.9814 -2.1185  0.2217  1.0717  7.5186 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)     37.8846     2.0738   18.27  < 2e-16 ***
dataframe[[x]]  -2.8758     0.3224   -8.92 6.11e-10 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 3.206 on 30 degrees of freedom
Multiple R-squared:  0.7262,    Adjusted R-squared:  0.7171 
F-statistic: 79.56 on 1 and 30 DF,  p-value: 6.113e-10

>
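One side effect of writing the formula as dataframe[[depVar]] ~ dataframe[[x]] is visible in the output above: the coefficient is labelled dataframe[[x]] rather than cyl. A possible variant, sketched here under the assumption that all column names are syntactically valid R names, builds each formula from strings with reformulate() so that the real variable names appear in the fitted model. The function name `regListNamed` is hypothetical and not part of the original answer.

```r
# Hypothetical variant of regList(): formulas are built from column names,
# so coefficients are labelled with the actual variable names.
# Assumes the column names are syntactically valid R names.
regListNamed <- function(dataframe, depVar) {
  indepVars <- setdiff(names(dataframe), depVar)
  models <- lapply(indepVars, function(x) {
    # reformulate() builds the formula depVar ~ x from character strings
    lm(reformulate(x, response = depVar), data = dataframe)
  })
  names(models) <- indepVars   # allow lookup by variable name
  models
}

models <- regListNamed(mtcars, "mpg")
coef(models[["cyl"]])
```

Naming the list elements means a model can be looked up by variable name instead of by position.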

But I need a function that receives the following arguments: a data frame, the column number of the response variable, the column number of the first explanatory variable, and the column number of the last explanatory variable. For example, in function1(df, 1, 2, 10), 1 is the column of the response variable and the explanatory variables are in columns 2 to 10 inclusive. — jean-philippe, Jan 29 '18 at 02:36
The function fits a simple linear regression of the response variable on each explanatory variable of the data frame individually (y ~ x1, y ~ x2, ... etc.). It returns the diagnostic charts for each of these regressions (2 rows, 2 columns). The function must work regardless of the data frame passed as an argument, so that it generalizes to all data frames. — jean-philippe, Jan 29 '18 at 02:36
@jean-philippe - the additional context in your comments above should have been included in your question, along with a [Minimal, Complete, and Verifiable Example](https://stackoverflow.com/help/mcve). That said, I updated my answer to include an R function that allows one to specify a data frame name and a dependent variable name, rather than column numbers. The answer can easily be tweaked to use column numbers. — Len Greski, Jan 29 '18 at 03:53
Sorry, I am just a beginner in R and it's too difficult for me. Your help is really appreciated. — jean-philippe, Jan 29 '18 at 04:07
I need something more general that uses column numbers, so that it works with any name for the independent or dependent variable in any data frame. — jean-philippe, Jan 29 '18 at 04:10
@jean-philippe - the function I posted, `regList()`, is already generalized. One passes a data frame name and dependent variable name as arguments to the function, and the function generates linear models for all other variables in the data frame. I simply used `mtcars` as an example, since you didn't post a verifiable example in your question. — Len Greski, Feb 03 '18 at 13:46

score 0 · Answer 2 · answered Jan 29 '18 at 03:10

How about the following:

First, I create some sample data:

# Sample data
set.seed(2017);
x <- sapply(1:10, function(x) x * seq(1:100) + rnorm(100));
df <- data.frame(Y = rowSums(x), x);

Next I define a custom function:

# Custom function where
#  df is the source dataframe
#  idx.y is the column index of the response variable in df
#  idx.x.min is the column index of the first explanatory variable
#  idx.x.max is the column index of the last explanatory variable
# The function returns a list of lm objects
myfit <- function(df, idx.y, idx.x.min, idx.x.max) {
    # <= lets the function handle a single explanatory column (min == max)
    stopifnot(idx.x.min <= idx.x.max, idx.x.max <= ncol(df));
    res <- list();
    for (i in idx.x.min:idx.x.max) {
        res[[length(res) + 1]] <- lm(df[, idx.y] ~ df[, i]);
    }
    return(res);
}

Then I run myfit using the sample data.

lst <- myfit(df, 1, 2, 11);

The return object lst is a list of 11-2+1 = 10 fit results of class lm. For example,

lst[[1]];
#
#Call:
#lm(formula = df[, idx.y] ~ df[, i])
#
#Coefficients:
#(Intercept)      df[, i]
#     -5.121       55.100
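
The comments on the question also ask for the diagnostic charts of each regression in a 2 x 2 layout. The following is a minimal sketch, assuming a list of lm objects such as the one returned by myfit above. `plot_diagnostics` is a hypothetical helper name, not part of the original answer. It relies on the plot() method for lm objects, which draws four diagnostic plots by default.

```r
# Hypothetical helper: draw the four default lm diagnostic plots
# for every fit in a list, one 2 x 2 page per model
plot_diagnostics <- function(fits) {
  old <- par(mfrow = c(2, 2))   # 2 rows, 2 columns per page
  on.exit(par(old))             # restore the previous layout on exit
  for (i in seq_along(fits)) {
    plot(fits[[i]])             # fills one page with four panels
  }
  invisible(length(fits))       # number of models plotted
}
```

With the list from above, the call would be plot_diagnostics(lst). On an interactive device each page replaces the previous one, so opening a file device first, for example with pdf(), is one way to keep all pages.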

PS

For future posts I recommend having a look at how to ask good questions here on SO, and providing a minimal reproducible example/attempt, including sample data.
