What if I want a single linear regression model rather than an "mlm"?

Question

I have shared the top 9 rows of the data I am working on in the image below (y0 to y6 are outputs, rest are inputs):

My objective is to get fitted output data for y0 to y6.

I tried lm function in R using the commands:

lm1 <- lm(cbind(y0, y1, y2, y3, y4, y5, y6) ~ tt + tcb + s + l + b, data = table3)
summary(lm1)

And it has returned 7 sets of coefficients like "Response y0", "Response y1", etc.

What I really want is just 1 set of coefficients which can predict values for outputs y0 to y6.

Could you please help in this?

Welcome to SO! Please read https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example — jogo, Jul 24 '18 at 10:40
Thanks for your input. Basically, I want to frame an equation like: Y = a + (tt x coefficient1) + (s x coefficient2) + (l x coefficient3) + (b x coefficient4) where; a is the intercept. and Y corresponds to the fitted values for y0, y1 till y6. So my requirement is to have the same model coefficients for all the 7 outputs. Any help will be appreciated. — AB9, Jul 24 '18 at 11:02
I am indeed new to the world of Statistics and Modelling! Thanks. — AB9, Jul 24 '18 at 13:11

Zheyuan Li · Accepted Answer · 2018-07-26T21:53:13.153

By cbind(y0, y1, y2, y3, y4, y5, y6) we fit 7 independent models (which is be a better idea).

For what you are looking for, stack your y* variables, replicate other independent variables and do a single regression.

Y <- c(y0, y1, y2, y3, y4, y5, y6)
tt. <- rep(tt, times = 7)
tcb. <- rep(tcb, times = 7)
s. <- rep(s, times = 7)
l. <- rep(l, times = 7)
b. <- rep(b, times = 7)

fit <- lm(Y ~ tt. + tcb. + s. + l. + b.)

Predicted values for y* are

matrix(fitted(fit), ncol = 7)

For other readers than OP

I hereby prepare a tiny reproducible example (with only one covariate x and two replicates y1, y2) to help you digest the issue.

set.seed(0)
dat_wide <- data.frame(x = round(runif(4), 2),
                       y1 = round(runif(4), 2),
                       y2 = round(runif(4), 2))
#     x   y1   y2
#1 0.90 0.91 0.66
#2 0.27 0.20 0.63
#3 0.37 0.90 0.06
#4 0.57 0.94 0.21

## The original "mlm"
fit_mlm <- lm(cbind(y1, y2) ~ x, data = dat_wide)

Instead of doing c(y1, y2) and rep(x, times = 2), I would use the reshape function from R base package stats, as such operation is essentially a "wide" to "long" dataset reshaping.

dat_long <- stats::reshape(dat_wide,  ## wide dataset
                           varying = 2:3,  ## columns 2:3 are replicates
                           v.names = "y",  ## the stacked variable is called "y"
                           direction = "long"  ## reshape to "long" format
                           )
#       x time    y id
#1.1 0.90    1 0.91  1
#2.1 0.27    1 0.20  2
#3.1 0.37    1 0.90  3
#4.1 0.57    1 0.94  4
#1.2 0.90    2 0.66  1
#2.2 0.27    2 0.63  2
#3.2 0.37    2 0.06  3
#4.2 0.57    2 0.21  4

Extra variables time and id are created. The former tells which replicate a case comes from; the latter tells which record that case is within a replicate.

To fit the same model for all replicates, we do

fit1 <- lm(y ~ x, data = dat_long)
#(Intercept)            x  
#     0.2578       0.5801  

matrix(fitted(fit1), ncol = 2)  ## there are two replicates
#          [,1]      [,2]
#[1,] 0.7798257 0.7798257
#[2,] 0.4143822 0.4143822
#[3,] 0.4723891 0.4723891
#[4,] 0.5884029 0.5884029

Don't be surprised that two columns are identical; there is only a single set of regression coefficients for both replicates after all.

If you think carefully, we can do the following instead:

dat_wide$ymean <- rowMeans(dat_wide[2:3])  ## average all replicates
fit2 <- lm(ymean ~ x, data = dat_wide)
#(Intercept)            x  
#     0.2578       0.5801

and we will get the same point estimates. Standard errors and other summary statistics would differ as two models have different sample size.

coef(summary(fit1))
#             Estimate Std. Error   t value  Pr(>|t|)
#(Intercept) 0.2577636  0.2998382 0.8596755 0.4229808
#x           0.5800691  0.5171354 1.1216967 0.3048657

coef(summary(fit2))
#             Estimate Std. Error  t value    Pr(>|t|)
#(Intercept) 0.2577636 0.01385864 18.59949 0.002878193
#x           0.5800691 0.02390220 24.26844 0.001693604

Thanks. the following worked to get all outputs in one y. and then lm. library(reshape2) dat2<-melt(dat1,id.vars = c('tt','tcb','s','l','b')) View(dat2) — AB9, Jul 26 '18 at 16:00

What if I want a single linear regression model rather than an "mlm"?

1 Answers1