
I want to use lm() in R to fit a series of separate linear regressions (93 in total). According to the R lm() help page:

"If response is a matrix a linear model is fitted separately by least-squares to each column of the matrix."

This works fine as long as there are no missing values in the response matrix Y. When there are missing values, lm() does not fit each regression with the data available for that column. Instead, it discards every row that has a missing value in any column. Is there a way to make lm() fit each column of Y independently, without discarding rows where only some columns are missing?
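A minimal sketch reproducing this behaviour on small made-up data (the names `x` and `Ymat` are illustrative, not from the original question). It assumes the default setting, where lm() takes its `na.action` from `getOption("na.action")`, i.e. `na.omit`. That action is applied to the whole model frame, so a row missing in either response column is dropped from both fits:

```r
set.seed(1)
x <- 1:10
# illustrative two-column response matrix
Ymat <- cbind(a = rnorm(10), b = rnorm(10))
Ymat[2, "a"] <- NA   # missing only in column a
Ymat[7, "b"] <- NA   # missing only in column b

# one lm() call with a matrix response ("mlm" fit)
fit <- lm(Ymat ~ x)

colSums(!is.na(Ymat))   # observed values per column: 9 and 9
nrow(residuals(fit))    # rows actually used by the matrix fit: 8
```

Each column has 9 observed values, but the matrix fit uses only the 8 rows that are complete in both columns.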

user1317221_G
Robert DeLeon

1 Answer


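If you want to run n separate regressions of Y1, Y2, ..., Yn on X, don't pass them to lm() as a single matrix response. Instead, use one of R's apply functions to call lm() once per column. Because each call builds its own model frame, a missing value only removes that row from the one regression it affects: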

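# create the response matrix and set some random values to NA
values <- runif(50)
values[sample(seq_along(values), 10)] <- NA
Y <- data.frame(matrix(values, ncol = 5))
colnames(Y) <- paste0("Y", 1:5)
# single regression term, stored in the same data frame as the responses
# so lm() does not have to look X up in the calling environment
X <- runif(10)
dat <- data.frame(X = X, Y)

# fit one regression per column of Y: each lm() call drops only the rows
# where that particular column (or X) is NA
lms <- lapply(colnames(Y), function(y) {
  lm(reformulate("X", response = y), data = dat)
})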

# lms is a list of lm objects; access individual fits with [[ ]] (e.g. lms[[1]])
# or summarise them all with another apply function
sapply(lms, function(x) {
  summary(x)$adj.r.squared
})
# Example output (values change from run to run because the data are random):
#[1] -0.06350560 -0.14319796  0.36319518 -0.16393125  0.04843368
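If the responses are already in a numeric matrix, as in the question, the same idea can be sketched by looping over column indices instead of names. This is an illustrative variant, not part of the original answer: `Ymat`, `x`, `fits`, `coefs` and `n_used` are made-up names, and the NAs are placed one per column so that every fit has enough data. It also shows collecting the coefficients and the number of observations each fit actually used:

```r
set.seed(42)
x <- runif(10)
Ymat <- matrix(runif(50), ncol = 5, dimnames = list(NULL, paste0("Y", 1:5)))
# put exactly one NA in each column (linear indices into the 10 x 5 matrix)
Ymat[c(2, 13, 24, 35, 46)] <- NA

# one lm() per column; the default na.omit drops only the rows missing in that column
fits <- lapply(seq_len(ncol(Ymat)), function(j) lm(Ymat[, j] ~ x))
names(fits) <- colnames(Ymat)

# intercept and slope of every fit: a 2 x 5 matrix, one column per response
coefs <- sapply(fits, coef)
# observations used by each fit (9 each, since every column has one NA)
n_used <- sapply(fits, nobs)
```

With 93 response columns the same code returns a 2 x 93 coefficient matrix.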
Andy