0

Suppose I have two 2D NumPy arrays A and B, I would like to compute the matrix C whose entries are C[i, j] = f(A[i, :], B[:, j]), where f is some function that takes two 1D arrays and returns a number.

For instance, if def f(x, y): return np.sum(x * y) then I would simply have C = np.dot(A, B). However, for a general function f, are there NumPy/SciPy utilities I could exploit that are more efficient than doing a double for-loop?

For example, take def f(x, y): return np.sum(x != y) / len(x), where x and y are not simply 0/1-bit vectors.

p-value
  • 608
  • 8
  • 22
  • 1
    For that specific case : `np.sum(A[:,None,:] != B.T[None,:,:],axis=2) / A.shape[1]`. It works on a case by case basis. – Divakar Jan 15 '18 at 18:04

1 Answers1

1

Here is a reasonably general approach using broadcasting.

First, reshape your two matrices to be rank-four tensors.

A = A.reshape(A.shape + (1, 1))
B = B.reshape((1, 1) + B.shape)

Second, apply your function element by element without performing any reduction.

C = f(A, B)  # e.g. A != B

Having reshaped your matrices allows numpy to broadcast. The resulting tensor C has shape A.shape + B.shape.

Third, apply any desired reduction by, for example, summing over the indices you want to discard:

C = C.sum(axis=(1, 3)) / C.shape[0]
Till Hoffmann
  • 9,479
  • 6
  • 46
  • 64
  • Thanks for the generality of the approach! I just have two questions: (1) I'm still not sure what the reduction in the last line do? (2) Suppose both `A` and `B.T` are `n`-by-`d` matrices, is the space complexity for the whole method then `O(n^4 d)`? Would this still be practical when `n` is around say, 10^6? – p-value Jan 15 '18 at 19:36
  • (1) Before the reduction, `C` will have shape `(a, b, c, d)` if `A` has shape `(a, b)` and `B` has shape `(c, d)`. So the last step performs the sum you originally mentioned inside the definition of `f`. (2) The space complexity would be `n * n * d * d`. Have a look here for other ideas: https://stackoverflow.com/a/48117449/1150961 – Till Hoffmann Jan 15 '18 at 19:56