I have the following dataframe:
```r
Index <- seq_len(10)
A <- c(5, 5, 3, 4, 3, 3, 2, 2, 4, 3)
B <- c(10, 11, 12, 12, 12, 11, 13, 13, 14, 13)
C <- c(7, 6, 7, 7, 6, 5, 6, 5, 5, 4)
df <- data.frame(Index, A, B, C)
> df
   Index A  B C
1      1 5 10 7
2      2 5 11 6
3      3 3 12 7
4      4 4 12 7
5      5 3 12 6
6      6 3 11 5
7      7 2 13 6
8      8 2 13 5
9      9 4 14 5
10    10 3 13 4
```
I would like to fit linear models with Index as the independent (predictor) variable and each of the other columns, separately, as the response variable, and ultimately collect the slopes, intercepts, and coefficients of determination in an easy-to-work-with dataframe. I know I can fit the models by running the following line of code:
```r
summary(lm(cbind(A, B, C) ~ Index, data = df))
```
One issue I have with the above line of code is that it uses the `cbind` function, so I have to type each column name individually. I am working with a large dataframe with many columns, and instead of using `cbind`, I'd love to be able to tell the function to use a whole set of columns (i.e., response variables) at once by writing something like `df[, 2:ncol(df)]` in place of `cbind(A, B, C)`.
Another issue I have with the above line of code is that the output is not in a user-friendly form: `summary()` prints a separate block of text for each response. Ultimately, I would like the slopes, intercepts, and coefficients of determination collected in an easy-to-work-with dataframe, like this:
```r
response <- c("A", "B", "C")
slope <- c(-0.21818, 0.33333, -0.29091)
intercept <- c(4.60000, 10.26667, 7.40000)
r.squared <- c(0.3776, 0.7106, 0.7273)
summary_df <- data.frame(response, slope, intercept, r.squared)
> summary_df
  response    slope intercept r.squared
1        A -0.21818   4.60000    0.3776
2        B  0.33333  10.26667    0.7106
3        C -0.29091   7.40000    0.7273
```
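A minimal sketch of building that table straight from the `cbind()` model, assuming Index is the only predictor: `coef()` on the multivariate fit returns a matrix with one column per response, and `summary()` returns a list of per-response `summary.lm` objects, each of which has an `r.squared` component. The names `cf`, `fit_summaries`, and `r2` are illustrative.

```r
df <- data.frame(
  Index = seq_len(10),
  A = c(5, 5, 3, 4, 3, 3, 2, 2, 4, 3),
  B = c(10, 11, 12, 12, 12, 11, 13, 13, 14, 13),
  C = c(7, 6, 7, 7, 6, 5, 6, 5, 5, 4)
)

# Same model as in the question
fit <- lm(cbind(A, B, C) ~ Index, data = df)

cf <- coef(fit)                # rows "(Intercept)" and "Index", columns A, B, C
fit_summaries <- summary(fit)  # one summary.lm object per response
r2 <- vapply(fit_summaries, function(s) s$r.squared, numeric(1))

# One row per response; row.names = NULL gives plain 1, 2, 3 row names
summary_df <- data.frame(
  response  = colnames(cf),
  slope     = cf["Index", ],
  intercept = cf["(Intercept)", ],
  r.squared = unname(r2),
  row.names = NULL
)
summary_df
```

The values are printed at full precision; wrap the numeric columns in `round()` if you want the shorter form shown above.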
What is the most efficient way to do this? I suspect there is a solution using `lapply` that I'm just not seeing. Thanks so much!
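Since the question mentions `lapply`, here is a sketch of that route, assuming every column other than Index is a response: fit one ordinary `lm()` per column, using `reformulate()` to build the formula (e.g. `A ~ Index`) from strings, extract the numbers into a one-row data frame, and stack the rows with `do.call(rbind, ...)`. The names `response_cols` and `rows` are illustrative.

```r
df <- data.frame(
  Index = seq_len(10),
  A = c(5, 5, 3, 4, 3, 3, 2, 2, 4, 3),
  B = c(10, 11, 12, 12, 12, 11, 13, 13, 14, 13),
  C = c(7, 6, 7, 7, 6, 5, 6, 5, 5, 4)
)

# Every column except Index is treated as a separate response
response_cols <- setdiff(names(df), "Index")

rows <- lapply(response_cols, function(col) {
  # reformulate() turns the strings into a formula such as A ~ Index
  fit <- lm(reformulate("Index", response = col), data = df)
  cf <- coef(fit)
  data.frame(
    response  = col,
    slope     = unname(cf["Index"]),
    intercept = unname(cf["(Intercept)"]),
    r.squared = summary(fit)$r.squared
  )
})

# Stack the one-row data frames into a single table
summary_df <- do.call(rbind, rows)
summary_df
```

Fitting the responses one at a time gives the same coefficients and R² as the `cbind()` model, because a multivariate `lm()` fits each response column by ordinary least squares on the same design matrix. The per-column version is easier to extend if different responses ever need different predictors.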