using lm() in R for a series of independent fits

Question

I want to use lm() in R to fit a series of (actually 93) separate linear regressions. According to the R lm() help manual:

"If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix."

This works fine as long as there are no missing data points in the Y response matrix. When there are missing points, instead of fitting each regression with the available data, every row that has a missing data point in any column is discarded. Is there any way to specify that lm() should fit all of the columns in Y independently and not discard rows where an individual column has a missing data point?
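The behavior described above can be reproduced with a small sketch. The data here are made up for illustration (six observations, two response columns, one NA in each column); `x`, `Y`, `fit_all` and `fit_y1` are hypothetical names, not from the original post.

```r
# Hypothetical toy data: 6 observations, 2 response columns
x <- c(1, 2, 3, 4, 5, 6)
Y <- cbind(y1 = c(1.1, NA, 3.2, 3.9, 5.1, 6.0),
           y2 = c(2.0, 4.1, NA, 8.2, 9.9, 12.1))

# Matrix response: the default na.action (na.omit) drops every row
# that has an NA in any column, so rows 2 and 3 are both removed
fit_all <- lm(Y ~ x)
nrow(residuals(fit_all))   # 4 rows used for both columns

# Fitting one column on its own only drops that column's NA row
fit_y1 <- lm(Y[, "y1"] ~ x)
nobs(fit_y1)               # 5 rows used
```

So the joint fit uses 4 rows for both responses, while a separate fit of y1 keeps 5.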

Why not some variation of `sapply(1:93, function(j) lm(y[,j]~x))` — Carl Witthoft, Sep 18 '12 at 16:46

score 4 · Answer 1 · edited May 23 '17 at 11:53

If you want to run n separate regressions of Y1, Y2, ..., Yn on X, you can't specify that in a single lm() call. Instead, use R's apply functions:

# create the response matrix and set some random values to NA
values <- runif(50)
values[sample(seq_along(values), 10)] <- NA
Y <- data.frame(matrix(values, ncol=5))
colnames(Y) <- paste0("Y", 1:5)
# single regression term
X <- runif(10)

# create regression between each column in Y and X
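lms <- lapply(colnames(Y), function(y) {
  # build the formula (e.g. Y1 ~ X); X is not a column of Y,
  # so lm() finds it in the enclosing environment
  form <- as.formula(paste0(y, " ~ X"))
  lm(form, data=Y)
})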

# lms is a list of lm objects, can access them via [[]] operator
# or work with it using apply functions once again
sapply(lms, function(x) {
  summary(x)$adj.r.squared
})
#[1] -0.06350560 -0.14319796  0.36319518 -0.16393125  0.04843368
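To check that each regression really uses all of the rows available for its own column, one option is to look at the number of observations per fit. The following is a minimal sketch under assumed data: the seed and the NA positions are arbitrary choices made here so the counts are reproducible, and `fits`, `n_used` and `coefs` are hypothetical names.

```r
set.seed(42)                      # arbitrary seed, only for reproducibility
X <- runif(10)
Y <- matrix(runif(50), ncol = 5,
            dimnames = list(NULL, paste0("Y", 1:5)))

# Hypothetical NA positions given as (row, column) pairs:
# two in Y1, one in Y2, two in Y3, none in Y4, one in Y5
Y[cbind(c(1, 4, 2, 7, 9, 3),
        c(1, 1, 2, 3, 3, 5))] <- NA
Y <- as.data.frame(Y)

# One independent fit per column; lapply() over a data frame
# iterates over its columns and keeps their names
fits <- lapply(Y, function(y) lm(y ~ X))

# Rows actually used by each fit (each column's own complete cases)
n_used <- sapply(fits, nobs)
n_used
# Y1 Y2 Y3 Y4 Y5
#  8  9  8 10  9

# Intercept and slope for every fit, one row per response column
coefs <- t(sapply(fits, coef))
```

Each count equals 10 minus the number of NAs in that column, which is the behavior the question asks for.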

Andy, Thanks for the detailed answer. I tried your approach and it worked great. Bob — Robert DeLeon, Sep 19 '12 at 16:58
