2

I have a situation where I have a rank deficient X matrix and still want regression coefficients. Specifically, I want the well behaved coefficients provided by base::qr

require(Matrix)
y <- c(230, 192, 195, 180, 200, 185, 0)
X <- new("dgCMatrix", i = c(0L, 1L, 2L, 3L, 4L, 5L, 6L, 0L, 1L, 2L, 
         3L, 4L, 5L, 1L, 3L, 5L, 2L, 3L, 4L, 5L, 1L, 0L), p = c(0L, 7L, 
         13L, 16L, 20L, 20L, 21L, 22L), Dim = c(7L, 7L), Dimnames = list(
         c("419", "420", "421", "422", "423", "424", ""), c("", "(Intercept)", 
         "A", "B1", "B2", "B3", "B4")), x = c(0.451764131343143, 
         0.434020584428936, 0.451764131343143, 0.434020584428936, 0.451764131343143, 
         0.451764131343143, 0.788669766125214, 0.788669766125214, 0.757693869580585, 
         0.788669766125214, 0.757693869580585, 0.788669766125214, 0.788669766125214, 
         0.757693869580585, 0.757693869580585, 0.788669766125214, 0.788669766125214, 
         0.757693869580585, 0.788669766125214, 0.788669766125214, 0.757693869580585, 
         0.788669766125214), factors = list())

qrB <- base::qr(X)
qrM <- Matrix::qr(X)
qr.coef(qrB, y)
#                 (Intercept)             A            B1            B2            B3            B4 
# -8.438370e-14  2.916303e+02 -1.441398e+01 -4.120863e+01            NA -2.381583e+01            NA 
qr.coef(qrM, y)
# [1] -9.730332e-01 -1.314143e+16 -1.597833e+01  1.314143e+16  0.000000e+00  1.314143e+16  1.314143e+16
# Warning message:
# In lengths(res@Dimnames) :
#   sparseQR_coef(): structurally rank deficient case: possibly WRONG zeros

So, both recognize that there is no way they can estimate B2 and estimate na NA (base) or zero (Matrix), I'm fine with either result. However, base::qr also recognizes that it needs to drop a second column and picks the final column (it's not clear why, but it works) while Matrix::qr forges on boldly and estimates a number similar to the inverse of .Machine$double.eps.

It's also the case that Matrix is less stable (forgive me, I forget which kind) in that the sum of squares is smaller from base

coef <- qr.coef(qrB, y)
coef[is.na(coef)] <- 0
sum((X %*% coef  - y)^2)
# [1] 15.17039
sum((X %*% qr.coef(qrM, y)  - y)^2)
# [1] 18.58891

Is there a way to get a super fast method like Matrix::qr to work like base::qr and give fast but stable results? I realize this isn't exactly a "large matrix" that's partly because it's a MWE.

Ben Bolker's answer here and two answers here talk about the difference between Lapack and Linpack, but those methods are in base::qr. I'm looking for a way to get modified results from Matrix::qr that give coefficients that I can work with.

pdb
  • 1,574
  • 12
  • 26
  • You can see a workaround in the WeMix package in the qr_0 function https://github.com/American-Institutes-for-Research/WeMix/blob/main/R/analyticSolve.R#L616-L636 – pdb Nov 02 '21 at 21:38

0 Answers0