Using R to create linear models for multiple variables at once but getting summaries per variable

Question

I have one data frame (screeningq) that consists of 268 observations and 21 independent variables. I have another data frame (firstweekdata) that also contains 268 observations and various variables, but I am interested in only one dependent variable (V474). Each observation (row) contains the results for one person. screeningq is a subset of firstweekdata.

I am trying to do a regression analysis where I compare each of the 21 independent variables, one by one, to the dependent variable I am interested in. I have been trying to get the linear model summaries, but for some reason I cannot get the results so that there is one summary per variable.

The code that I am using is the following:

    nroscreenq<- ncol(screeningq)
    screeninglinearmod <- list()

    par(mar=c(1.5,1,1.5,1),mfrow=c(5,5))
    for (i in 1:nroscreenq) {
      x1 <- screeningq[,i]
      scatter.smooth(x1, y=firstweekdata$V474, main=paste("Question", i), xlab="", cex = 0.5)
      screeninglinearmod[[i]] <- summary(lm(firstweekdata$V474 ~ screeningq[,i]))
    }

I get the following result:

    Coefficients:
                       Estimate Std. Error t value Pr(>|t|)
    (Intercept)          36.000     21.313   1.689   0.0940
    screeningq[, i]1     15.000     30.141   0.498   0.6197
    screeningq[, i]100   33.333     22.183   1.503   0.1358
    screeningq[, i]17    35.000     30.141   1.161   0.2480
    screeningq[, i]23    26.000     30.141   0.863   0.3902
    screeningq[, i]25    18.000     30.141   0.597   0.5516
    screeningq[, i]29    15.500     26.103   0.594   0.5538
    screeningq[, i]32    52.000     30.141   1.725   0.0873
    screeningq[, i]35    48.000     30.141   1.593   0.1141
    screeningq[, i]37    27.667     24.610   1.124   0.2633
    screeningq[, i]38    33.500     26.103   1.283   0.2020
    screeningq[, i]44    51.000     30.141   1.692   0.0934
    screeningq[, i]46    -9.000     30.141  -0.299   0.7658
    screeningq[, i]49    41.667     24.610   1.693   0.0932
    screeningq[, i]50    19.667     24.610   0.799   0.4259
    screeningq[, i]51    34.250     23.828   1.437   0.1534
    screeningq[, i]52    13.333     24.610   0.542   0.5890
    screeningq[, i]55    41.000     30.141   1.360   0.1765
    screeningq[, i]56     2.333     24.610   0.095   0.9246
    screeningq[, i]58    20.333     24.610   0.826   0.4104
    screeningq[, i]59    14.667     24.610   0.596   0.5524
    screeningq[, i]60    12.333     24.610   0.501   0.6173
    screeningq[, i]61    39.000     26.103   1.494   0.1380
    screeningq[, i]62    16.667     24.610   0.677   0.4997

etc. list continues for many more rows

I have tried multiple things but end up with a similar list. What am I doing wrong?

Try posting a *minimal, reproducible example*; see https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example Learning to make such an example is a great skill that will help you improve as a programmer, ask better questions, etc., so it's definitely worth it! — Emy, Mar 15 '21 at 15:48

Emy · Answer 1 · 2021-03-15T19:40:02.720

I have adapted the code from this suggestion here, using lapply. I think this gives the output you are looking for: a list containing one fitted model per independent variable.

# create a toy dataset with one dependent variable and three independent variables
DV <- rnorm(20, 10, 3)
IV1 <- rnorm(20, 8, 3)
IV2 <- rnorm(20, 9, 3)
IV3 <- rnorm(20, 9, 3)

df <- data.frame(DV, IV1, IV2, IV3)
cols <- list("IV1", "IV2", "IV3")
forms <- paste('DV ~', cols)
forms

#> [1] "DV ~ IV1" "DV ~ IV2" "DV ~ IV3"

a <- lapply(forms, lm, data = df)
a

#> [[1]]
#> 
#> Call:
#> FUN(formula = X[[i]], data = ..1)
#> 
#> Coefficients:
#> (Intercept)          IV1  
#>     12.1796      -0.3148  
#> 
#> 
#> [[2]]
#> 
#> Call:
#> FUN(formula = X[[i]], data = ..1)
#> 
#> Coefficients:
#> (Intercept)          IV2  
#>   9.8944853   -0.0008378  
#> 
#> 
#> [[3]]
#> 
#> Call:
#> FUN(formula = X[[i]], data = ..1)
#> 
#> Coefficients:
#> (Intercept)          IV3  
#>     11.5488      -0.1798

Created on 2021-03-15 by the reprex package (v0.3.0)

By the way, both the question I linked and this answer are examples of good "minimal, reproducible examples." I used the reprex package to make sure the example was reproducible and to copy and paste it here.
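If you want the full summary() output for each variable rather than just the fitted coefficients, the same idea extends naturally. The following is only a minimal sketch on toy data of the same shape: the set.seed() call, the use of reformulate() to build the formulas, and the names ivs, models and summaries are my own additions for illustration, not part of the original code.

```r
# Toy data: one dependent variable (DV) and three independent variables
set.seed(42)  # added only so the sketch is reproducible
df <- data.frame(
  DV  = rnorm(20, 10, 3),
  IV1 = rnorm(20, 8, 3),
  IV2 = rnorm(20, 9, 3),
  IV3 = rnorm(20, 9, 3)
)

ivs <- c("IV1", "IV2", "IV3")

# Fit one simple regression per independent variable.
# reformulate("IV1", response = "DV") builds the formula DV ~ IV1.
models <- lapply(ivs, function(v) lm(reformulate(v, response = "DV"), data = df))
names(models) <- ivs

# One summary per variable, stored in a named list
summaries <- lapply(models, summary)

# Print the full summary for a single variable
summaries$IV1

# Or extract only the coefficient table (estimate, SE, t value, p value)
coef(summaries$IV2)
```

Naming the list by variable makes it easier to look up a particular model later than indexing by position.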

Thank you so much for the help! I have now noticed that the problem was actually the data type: the columns were stored as characters. The original code started to work after I converted the data frame to numeric format. #newbieproblems — KristaK, Mar 16 '21 at 14:15
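To illustrate the point in the comment above: when a predictor holds numbers stored as character strings, lm() treats it as a categorical variable and estimates one coefficient per distinct value, which produces the long coefficient list shown in the question. The sketch below uses made-up data (y, x_chr and dat are hypothetical names, not the asker's variables) and assumes every value can be parsed as a number; as.numeric() would otherwise return NA with a warning.

```r
set.seed(1)  # added only so the sketch is reproducible
y <- rnorm(30)

# Numbers stored as text, as can happen after importing a file
x_chr <- as.character(sample(1:10, 30, replace = TRUE))

# Character predictor: treated as a factor, one coefficient per distinct value
fit_chr <- lm(y ~ x_chr)
length(coef(fit_chr))

# Numeric predictor: a single intercept and a single slope
x_num <- as.numeric(x_chr)
fit_num <- lm(y ~ x_num)
length(coef(fit_num))

# Converting every column of a data frame at once;
# dat[] <- ... keeps the data frame structure and column names
dat <- data.frame(q1 = c("1", "2", "3"), q2 = c("10", "20", "30"),
                  stringsAsFactors = FALSE)
dat[] <- lapply(dat, as.numeric)
str(dat)
```

Checking str() on the data before modelling is a quick way to spot columns that were read in as character.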
