Hi, I have several columns that represent scores. I want to estimate one simple model per SCORE column, with each SCORE as a function of STUDYTIME, so there will be as many models as there are SCORE columns. Then I want to store the STUDYTIME coefficient from each model in a new column whose row names are the SCORE column names. Finally, I am not sure how to do the clustering for these linear models, because each STUDENT appears in the data twice.

Here is a reproducible example. This is the data I have now:

df <- data.frame(replicate(5, rnorm(10)))
df[1]<-c(1,1,2,2,3,3,4,4,5,5)
colnames(df) <- c('student','studytime', 'score1','score2','score3')

This is my attempt at the code:

for (i in 1:nrow(df)) {
  dfx         <- df[,i]
  lm    <- lm(dfx[,3:5] ~ study_time)
  resdat[,i] = summary(lm)$coefficients[2]
}
bvowe
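A corrected version of the loop above, as a minimal sketch: it loops over the score columns (not the rows), fits each score on `studytime`, and stores the slope in a data frame whose row names are the score column names. For the clustering part it assumes cluster-robust standard errors by student are what is wanted, and computes a CR0 version with a hypothetical base-R helper `cluster_vcov()` that is not from the thread. The names `resdat`, `score_cols` and `cluster_se` are illustrative, and `set.seed(1)` is added only to make the random data reproducible.

```r
# Example data from the question, with a seed so results are reproducible
set.seed(1)
df <- data.frame(replicate(5, rnorm(10)))
df[1] <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
colnames(df) <- c("student", "studytime", "score1", "score2", "score3")

# Hypothetical helper: CR0 cluster-robust covariance matrix of an lm fit,
# V = (X'X)^-1 [sum over clusters g of X_g' u_g u_g' X_g] (X'X)^-1.
# No small-sample correction; assumes no rows were dropped because of NAs.
cluster_vcov <- function(fit, cluster) {
  X <- model.matrix(fit)
  u <- residuals(fit)
  bread <- solve(crossprod(X))
  # one row per cluster: the sum of X_i * u_i over that cluster's rows
  scores <- rowsum(X * u, group = cluster)
  bread %*% crossprod(scores) %*% bread
}

score_cols <- c("score1", "score2", "score3")
# one row per score model, named after the score column
resdat <- data.frame(studytime_coef = rep(NA_real_, length(score_cols)),
                     cluster_se = rep(NA_real_, length(score_cols)),
                     row.names = score_cols)

for (s in score_cols) {
  # builds the formula score_k ~ studytime for the current column
  fit <- lm(reformulate("studytime", response = s), data = df)
  V <- cluster_vcov(fit, cluster = df$student)
  resdat[s, "studytime_coef"] <- coef(fit)[["studytime"]]
  resdat[s, "cluster_se"] <- sqrt(V[2, 2])  # studytime is the 2nd coefficient
}
resdat
```

With only five students, CR0 standard errors tend to be too small, so a small-sample correction may be worth adding. Packages such as sandwich (`vcovCL()`) also provide clustered covariance matrices.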

You can do this simply with the `lapply` and `sapply` functions.

Here is the R code:

Generating Data

df <- data.frame(replicate(5, rnorm(10)))
df[1]<-c(1,1,2,2,3,3,4,4,5,5)
colnames(df) <- c('student','studytime', 'score1','score2','score3')

Storing Results

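# Regress each score column on studytime (one model per score column)
Results <- lapply(df[, -c(1, 2)], FUN = function(x) lm(x ~ df$studytime))
# 2 x 3 matrix: intercept and studytime slope (rows) for each score (columns)
Coef <- sapply(Results, FUN = coefficients)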
Neeraj
  • Neeraj: this is actually backwards. I want to regress the SCORE variables on studytime. – bvowe Sep 13 '18 at 18:11
  • Just interchange the dependent and independent variables. – Neeraj Sep 13 '18 at 18:25
  • This is excellent, @Neeraj. Lastly, what if I also need to store the p-values and the residuals? I did try lm$residual, but this did not work. – bvowe Sep 13 '18 at 18:53
  • `Results` stores the output of each regression in a list. Just subset what you need using the `lapply` or `sapply` function. – Neeraj Sep 14 '18 at 05:13
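Following the last comment, here is a minimal sketch of pulling the p-values and residuals out of `Results` with `sapply`. It assumes the models are fitted exactly as in the answer, so the studytime slope is the second row of each coefficient table. `pvals` and `resids` are illustrative names, and `set.seed(1)` is added only for reproducibility. The residuals have to come from each fitted model in the list (`residuals(fit)` or `fit$residuals`), not from `lm`, which is the fitting function itself.

```r
# Example data and models as in the answer
set.seed(1)
df <- data.frame(replicate(5, rnorm(10)))
df[1] <- c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
colnames(df) <- c("student", "studytime", "score1", "score2", "score3")
Results <- lapply(df[, -c(1, 2)], FUN = function(x) lm(x ~ df$studytime))

# p-value of the studytime slope in each model:
# row 2 (studytime) and column 4 ("Pr(>|t|)") of the coefficient table
pvals <- sapply(Results, function(fit) summary(fit)$coefficients[2, 4])

# residuals: one column per score model, one row per observation
resids <- sapply(Results, residuals)

pvals
head(resids)
```

These p-values use the ordinary (non-clustered) standard errors from `summary()`, so they do not account for each student appearing twice.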