
I am trying to understand the function stats::mahalanobis. Here is its source, but please just focus on the last line, or more specifically, the rowSums(x %*% cov * x) part.

> mahalanobis
function (x, center, cov, inverted = FALSE, ...) 
{
    x <- if (is.vector(x)) 
        matrix(x, ncol = length(x))
    else as.matrix(x)
    if (!isFALSE(center)) 
        x <- sweep(x, 2L, center)
    if (!inverted) 
        cov <- solve(cov, ...)
    setNames(rowSums(x %*% cov * x), rownames(x))
}

Here x is an n-by-p matrix, whereas cov is a p-by-p matrix. Their contents don't matter for the purposes of this question.
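As a concrete sanity check (with small illustrative data of my own, not taken from the function's documentation), the last line of the source can be reproduced step by step:

```r
set.seed(1)                        # illustrative data, not from the question
x  <- matrix(rnorm(6), nrow = 3)   # n = 3 observations, p = 2 variables
mu <- colMeans(x)                  # center
S  <- cov(x)                       # p-by-p covariance matrix
d2 <- mahalanobis(x, mu, S)        # squared Mahalanobis distances

xc  <- sweep(x, 2L, mu)            # what the function does: center the rows,
Sin <- solve(S)                    # invert the covariance,
stopifnot(all.equal(unname(d2), rowSums(xc %*% Sin * xc)))  # then the last line
```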

According to the documentation, mahalanobis calculates the squared Mahalanobis distance of all rows in x. I took this as a hint and found a counterpart of rowSums(X %*% C * X) with apply. (It's perfectly fine if you have no idea what I'm talking about; this paragraph merely explains how I came up with the apply form.)

> X = matrix(rnorm(1:6), nrow = 3)
> C = matrix(rnorm(1:4), nrow = 2)
> rowSums(X %*% C * X)
[1] -0.03377298  0.49306538 -0.16615078
> apply(X, 1, function(row) {
+     t(row) %*% C %*% row
+ })
[1] -0.03377298  0.49306538 -0.16615078

Now the question becomes why are they equivalent? I suppose one needs to do some clever matrix partition to understand the rationale behind the equivalence, but I'm not enlightened enough to see it.

nalzok
  • My suggestion is to create a smaller, say 2x2 matrix `X` and do the computations by hand. It should become evident why they are the same. – Rui Barradas Mar 17 '19 at 12:26
  • @RuiBarradas Thanks for your suggestion! This kinda works, but now my brain is filled with all those arithmetic details and cannot get the big idea behind this form. I would prefer an answer from a slightly higher level. – nalzok Mar 17 '19 at 12:43

2 Answers


Just like instead of

sapply(1:5, `*`, 2)
# [1]  2  4  6  8 10

or a loop we prefer

1:5 * 2
# [1]  2  4  6  8 10

as it's a vectorized solution doing exactly the same thing (element-wise multiplication),

rowSums(X %*% C * X)
# [1] 0.2484329 0.5583787 0.2303054

can be seen as preferable to

apply(X, 1, function(row) t(row) %*% C %*% row)
# [1] 0.2484329 0.5583787 0.2303054

with both of them again doing exactly the same, just with the former being more concise.

In particular, in my first example we went from scalars to vectors, and now we go from vectors to matrices. First,

X %*% C
#            [,1]       [,2]
# [1,]  0.7611212  0.6519212
# [2,] -0.4293461  0.6905117
# [3,]  1.2917590 -1.2970376

corresponds to

apply(X, 1, function(row) t(row) %*% C)
#           [,1]       [,2]      [,3]
# [1,] 0.7611212 -0.4293461  1.291759
# [2,] 0.6519212  0.6905117 -1.297038

Now the second product in t(row) %*% C %*% row does two things: 1) element-wise multiplication of t(row) %*% C and row, and 2) summing. In the same way, * in X %*% C * X does step 1), and rowSums does the summing, step 2).

So, in this case there are no significant tricks of changing the order of operations, partitioning, or anything else; it's just taking advantage of existing matrix operations that repeat the same actions with each row/column for us.
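To make the two steps explicit, here is a small sketch with random matrices of my own (analogous to the X and C above), separating the matrix product, the element-wise multiplication, and the summing:

```r
set.seed(42)                         # illustrative data, not from the post
X <- matrix(rnorm(6), nrow = 3)
C <- matrix(rnorm(4), nrow = 2)

M     <- X %*% C                     # row i of M is t(X[i, ]) %*% C
step1 <- M * X                       # 1) element-wise multiplication, per row
step2 <- rowSums(step1)              # 2) summing within each row

# same result as the row-by-row apply version
ref <- apply(X, 1, function(row) drop(t(row) %*% C %*% row))
stopifnot(all.equal(step2, ref))
```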

Extra:

library(microbenchmark)
microbenchmark(rowSums = rowSums(X %*% C * X),
               apply = apply(X, 1, function(row) t(row) %*% C %*% row),
               times = 100000)
# Unit: microseconds
#     expr    min     lq      mean median     uq        max neval cld
#  rowSums  3.565  4.488  5.995129  5.117  5.589   4940.691 1e+05  a 
#    apply 24.126 26.402 32.539559 27.191 28.615 129234.613 1e+05   b
Julius Vainora
  • Wait, so the `apply` family can't take advantage of vectorization? Thanks for clearing up a misunderstanding for me! – nalzok Mar 17 '19 at 13:40
  • 1
    @nalzok, the matrix products in `t(row) %*% C %*% row` can be seen as a vectorized version of using manual loops, so in a way your `apply` solution does have some vectorization. However, other than that, it's not really vectorization if we are thinking about it in terms of speed, see, e.g., https://stackoverflow.com/q/28983292/1320535. I also added a performance comparison. – Julius Vainora Mar 17 '19 at 13:45
  • Thanks for the clarification! It would be even better if you could show me how the benchmark is done, because the abbreviations `lq` and `cld` don't make much sense to me. – nalzok Mar 17 '19 at 15:11
  • 1
    @nalzok, added. `lq` and `uq` stand for lower and upper quantiles, I believe. – Julius Vainora Mar 17 '19 at 15:17

If A and B are any two conformable matrices, and a and b are any two vectors of the same length, we will use the following facts. The first says that the ith row of A * B equals the ith row of A times the ith row of B. The second says that the ith row of A %*% B equals the ith row of A times all of B. The third says that the matrix product of two vectors can be expressed as the sum of their element-wise product.

(A * B)[i, ] = A[i, ] * B[i, ]  by the definition of elementwise multiplication [1]
(A %*% B)[i, ] = A[i, ] %*% B  as taking the ith row is the same as premultiplying by e_i [2]
a %*% b = sum(a * b)  by definition of %*% [3]
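These three facts can be sanity-checked numerically with random illustrative matrices and vectors of my own (not from the answer); drop() just strips the 1-by-k matrix shape R gives vector products:

```r
set.seed(7)                          # illustrative data
A <- matrix(rnorm(6), 2, 3)
B <- matrix(rnorm(6), 2, 3)          # same shape as A, for [1]
D <- matrix(rnorm(6), 3, 2)          # conformable with A, for [2]
a <- rnorm(3); b <- rnorm(3)
i <- 1
stopifnot(all.equal((A * B)[i, ], A[i, ] * B[i, ]))           # [1]
stopifnot(all.equal((A %*% D)[i, ], drop(A[i, ] %*% D)))      # [2]
stopifnot(all.equal(drop(a %*% b), sum(a * b)))               # [3]
```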

Thus we get:

rowSums(X %*% C * X)[i]
= sum((X %*% C * X)[i, ])
= sum((X %*% C)[i, ] * X[i, ])  by [1]
= (X %*% C)[i, ] %*% X[i, ] by [3]
= X[i, ] %*% C %*% X[i, ]  by [2]
= apply(X, 1, function(row) t(row) %*% C %*% row)[i]
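The same chain can be checked numerically, step by step, with my own small random X and C:

```r
set.seed(3)                          # illustrative X and C
X <- matrix(rnorm(6), nrow = 3)
C <- matrix(rnorm(4), nrow = 2)
i <- 2
lhs <- rowSums(X %*% C * X)[i]
stopifnot(all.equal(lhs, sum((X %*% C * X)[i, ])))            # unfold rowSums
stopifnot(all.equal(lhs, sum((X %*% C)[i, ] * X[i, ])))       # by [1]
stopifnot(all.equal(lhs, drop((X %*% C)[i, ] %*% X[i, ])))    # by [3]
stopifnot(all.equal(lhs, drop(X[i, ] %*% C %*% X[i, ])))      # by [2]
```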
G. Grothendieck