1

I have an N-by-M matrix X, and I need to calculate an N-by-N matrix Y:

Y[i, j] = sum((X[i,] - X[j,]) ^ 2)   0 <= i,j <= N

For now, I have to use nested loops to do it with O(n2). I would like to know if there's a better way, like using matrix operations.

more generally, sum(....) can be a function, fun(x1,x 2) of which x1, x2 are M-by-1 vectors.

Amro
  • 123,847
  • 25
  • 243
  • 454
Fivesheep
  • 236
  • 2
  • 7

3 Answers3

2
(x[i]-x[j])^2 = x[i]² - 2*x[i]*x[j] + x[j]²

and than is middle part just matrix multiplication -2*X*tran(X) (matrix) and other parts are just vetrors and you have to run this over each element

This has O(n^2.7) or whatever matrix multiplication complexity is

Pseudocode:

vec=sum(X,rows).^2
Y=X * tran(X) * -2 
for index [i,j] in Y:
   Y[i,j] =  Y[i,j] + vec[i]+vec[y]
Luka Rahne
  • 10,336
  • 3
  • 34
  • 56
2

you can use expand.grid to get a data.frame of possible pairs:

X <- matrix(sample(1:5, 50, replace=TRUE), nrow=10)

row.ind <- expand.grid(1:dim(X)[1], 1:dim(X)[2])

Then apply along each pair using a function:

myfun <- function(n) {
  sum((X[row.ind[n, 1],] - X[row.ind[n, 2],])^2)
}

Y <- matrix(unlist(lapply(1:nrow(row.ind), myfun)), byrow=TRUE, nrow=nrow(X))


> Y
      [,1] [,2] [,3] [,4] [,5]
 [1,]    0   28   15   31   41
 [2,]   31   28   33   30   33
 [3,]   28    0   15    7   19
 [4,]   33   30   19   34   11
 [5,]   15   15    0   12   22
 [6,]   10   19   10   21   20
 [7,]   31    7   12    0    4
 [8,]   16   17   16   13    2
 [9,]   41   19   22    4    0
[10,]   14   11   28    9    2
> 

I bet there is a better way but its Friday and I'm tired!

Justin
  • 42,475
  • 9
  • 93
  • 111
  • @ralu I wans't thinking about complexity, but you can substitute whatever you'd like in the function to simplify the matrix math, but this avoids a looping construct and is fairly straight forward to multi-thread w/ `mclapply` if desired. And, like I said, its Friday! – Justin Jun 15 '12 at 23:39
  • Maybe that better way is to use `dist()` since we're just computing the Euclidean distance between pairs of rows. This would be the R equivalent of Ansari's "cheating" Matlab answer. – joran Jun 16 '12 at 02:17
  • But the OP asked for a general solution where the `sum(...)` piece could be anything. – Justin Jun 16 '12 at 02:29
  • yes, a general solution is what I wanted. I am coding a SVM function, the matrix is the Dmat for quadprog in R, and the the sum... piece is the kernel function. – Fivesheep Jun 16 '12 at 05:01
1

In MATLAB, for your specific f, you could just do this:

Y = pdist(X).^2;

For a non-"cheating" version, try something like this (MATLAB):

[N, M] = size(X);
f = @(u, v) sum((u-v).^2);
helpf = @(i, j) f(X(i, :), X(j, :))
Y = arrayfun(helpf, meshgrid(1:N, 1:N), meshgrid(1:N, 1:N)');

There are more efficient ways of doing it with the specific function sum(...) but your question said you wanted a general way for a general function f. In general this operation will be O(n^2) times the complexity of each vector pair operation because that's how many operations need to be done. If f is of a special form, some calculations' results can be reused.

Ansari
  • 8,168
  • 2
  • 23
  • 34