
When making predictions from a linear statistical model, we usually have a model matrix X of predictors corresponding to the points at which we want to make predictions; a vector of coefficients beta; and a variance-covariance matrix V. Computing the predictions is just X %*% beta. The most straightforward way to compute the variances of the predictions is

diag(X %*% V %*% t(X))

or slightly more efficiently

diag(X %*% tcrossprod(V,X))

However, this is very inefficient, because it constructs an n*n matrix when all we really want is the diagonal. I know I could write some Rcpp-loopy thing that would compute just the diagonal terms, but I'm wondering if there is an existing linear algebra trick in R that will nicely do what I want ... (if someone wants to write the Rcpp-loopy thing for me as an answer I wouldn't object, but I'd prefer a pure-R solution)

FWIW predict.lm seems to do something clever by multiplying X by the inverse of the R component of the QR-decomposition of the lm; I'm not sure that's always going to be available, but it might be a good starting point (see here)
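
For concreteness, a minimal (purely illustrative) setup might look like this:

fit  <- lm(mpg ~ wt + hp, data = mtcars)               # any fitted linear model
X    <- model.matrix(~ wt + hp, data = mtcars[1:5, ])  # model matrix at the prediction points
beta <- coef(fit)                                      # coefficient vector
V    <- vcov(fit)                                      # variance-covariance matrix
pred <- X %*% beta                                     # predictions
vars <- diag(X %*% V %*% t(X))                         # the inefficient variance computation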

Ben Bolker

3 Answers


Along the lines of this Octave/Matlab question, for two matrices A and B, we can use the fact that the nth diagonal entry of AB is the dot product of the nth row of A with the nth column of B. We can naively extend that to the case of three matrices, ABC. I have not considered how to optimize in the case where C = A^T, but aside from that, this code looks like a promising speedup:

start_time <- Sys.time()

A <- matrix(1:1000000, nrow = 1000, ncol = 1000)
B <- matrix(1000000:1, nrow = 1000, ncol = 1000)

# Try one of these two (comment out the other before timing)
res <- diag(A %*% B %*% t(A))      # full triple product, then take the diagonal: ~0.47s
res <- rowSums(A * t(B %*% t(A)))  # row-sum-dot-product version: ~0.27s

end_time <- Sys.time()

print(end_time - start_time)

Using tcrossprod did not appear to accelerate the results when I ran this code. However, the row-sum-dot-product approach is already a lot more efficient, at least on this silly example. It still forms the intermediate B %*% t(A), but it replaces the second matrix-matrix multiplication with an elementwise product plus row sums, whereas diag(A %*% B %*% t(A)) computes the entire n*n triple product and then throws away everything except the diagonal.
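
As a quick correctness check (a minimal sketch using the same A and B as above), the two expressions agree, and rowSums((A %*% B) * A) is an equivalent form that avoids the explicit transposes:

all.equal(diag(A %*% B %*% t(A)), rowSums(A * t(B %*% t(A))))  # should be TRUE
all.equal(diag(A %*% B %*% t(A)), rowSums((A %*% B) * A))      # should be TRUE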

davewy
  • I'm pretty sure this is more computationally efficient. It would also be interesting to think about memory use, but this answer is definitely good enough for a check-mark ... – Ben Bolker Jul 11 '19 at 07:39

I am not quite sure how efficient this is, but here is the idea:

  1. Find U such that V = U %*% t(U); this is possible since V is a covariance matrix (symmetric and positive semi-definite). Then diag(X %*% V %*% t(X)) is just the vector of squared row norms of X %*% U.
  2. XU = X %*% U
  3. result = apply(XU, 1, function(x) sum(x^2))

Demo

V <- cov(iris[, -5])
X <- as.matrix(iris[1:5, -5])

Using SVD

svd_v <- svd(V)
U <- svd_v$u %*% diag(sqrt(svd_v$d))
XU = X %*% U
apply(XU, 1, function(x) sum(x^2))
#       1        2        3        4        5 
#41.35342 39.36286 35.42369 38.25584 40.30839 

Another approach - this also isn't likely to be faster than @davewy's:

U <- chol(V)           # upper triangular, so V = t(U) %*% U
XU <- (X %*% t(U))^2   # note the transpose: we need a factor W with V = W %*% t(W)
rowSums(XU)
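
A quick sanity check (a sketch using the same X and V as above) against the direct, inefficient computation:

all.equal(rowSums(XU), diag(X %*% V %*% t(X)))  # should be TRUE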
kangaroo_cliff
    Time complexity of an SVD algorithm will be a lot more expensive than matrix multiplication (closer to O(n^3) than O(n^2), especially if you're asking for the full SVD), so I'm worried that will have scalability problems. See e.g. https://mathoverflow.net/questions/161252/what-is-the-time-complexity-of-truncated-svd for some discussion. – davewy Jul 10 '19 at 02:19
  • don't know how the Cholesky decomposition compares -- that's probably what I would have done instead of SVD. (This gives more of a clue of what `predict.lm` is doing - since the QR decomposition is already stored in the `lm` object, inverting the (triangular) `R` component is cheap.) – Ben Bolker Jul 10 '19 at 04:30
  • I agree SVD may not be the best method - my main point is that if there is a way to get `U` fast, then the rest of the approach could be useful. – kangaroo_cliff Jul 10 '19 at 05:07

I recently found emulator::quad.diag(), which is just

colSums(crossprod(M, Conj(x)) * x)

This is slightly better than @davewy's solution (although the overall differences are less than I thought they would be anyway).

library(microbenchmark)
library(emulator)  # provides quad.diag()

## A and B as defined in @davewy's answer
microbenchmark(full = diag(A %*% B %*% t(A)), 
               davewy = rowSums(A * t(B %*% t(A))), 
               emu = quad.diag(A, B))
Unit: milliseconds
   expr      min       lq     mean   median       uq      max neval cld
   full 32.76241 35.49665 39.51683 37.63958 41.46561 57.41370   100   c
 davewy 22.74787 25.06874 28.42179 26.97330 29.68895 45.38188   100  b 
    emu 17.68390 20.21322 23.59981 22.09324 24.80734 43.60953   100 a  
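
For the original problem, the same idea can be written directly in base R (a sketch, assuming X and V as in the question):

## diag(X %*% V %*% t(X)) without forming the full n-by-n matrix
rowSums((X %*% V) * X)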
Ben Bolker