3

I have a 2D numpy array X of shape (xrow, xcol), and I want to take the dot product of every pair of rows to obtain another array P of shape (xrow, xrow).

The code looks like the following:

import numpy as np

P = np.zeros((xrow, xrow))
for i in range(xrow):
    for j in range(xrow):
        P[i, j] = np.dot(X[i], X[j])

which works well if the array X is small, but takes a lot of time for a huge X. Is there any way to make it faster, or to do it more pythonically, so that it is fast?

Abhishek Thakur
  • 16,337
  • 15
  • 66
  • 97

2 Answers

6

That is obtained by doing `result = X.dot(X.T)`.
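A quick sanity check that this is equivalent to the double loop from the question (using a small random X for illustration):

```python
import numpy as np

X = np.random.randn(5, 3)

# Vectorized: P[i, j] == np.dot(X[i], X[j]) for all row pairs
P = X.dot(X.T)

# The explicit double loop from the question gives the same result
P_loop = np.zeros((5, 5))
for i in range(5):
    for j in range(5):
        P_loop[i, j] = np.dot(X[i], X[j])

print(np.allclose(P, P_loop))  # True
```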

When the array becomes large, the product can be computed in blocks, but depending on your numpy backend, `dot` should already parallelize across threads as much as possible. It seems that this is what you are looking for.

If for some reason you don't want to rely on that, and do end up resorting to multiprocessing, you can try something along the lines of

import numpy as np
from joblib import Parallel, delayed  # sklearn.externals.joblib in older sklearn versions

X = np.random.randn(1000, 100000)
block_size = 10000

# Each job computes the contribution of one block of features
products = Parallel(n_jobs=10)(
    delayed(np.dot)(X[:, pos:pos + block_size], X.T[pos:pos + block_size])
    for pos in range(0, X.shape[1], block_size)
)
product = np.sum(products, axis=0)

I don't think this is useful for relatively small arrays, and threading can sometimes take care of this better as well.

eickenberg
  • 14,152
  • 1
  • 48
  • 52
  • block size of 10k for a total of 1k samples? – Abhishek Thakur Jul 03 '14 at 15:22
  • blocks are in feature space, so blocks of 10k out of 100k features. As I said, I think in most cases `X.dot(X.T)` largely does the trick. – eickenberg Jul 03 '14 at 15:23
  • what about the memory consumption of `X.dot(X.T)`? – Abhishek Thakur Jul 03 '14 at 15:49
  • It should allocate `np.zeros((X.shape[0],) * 2)` and write into there. It is definitely possible to do this operation without much memory overhead. I cannot tell you how numpy does it, though. Is memory an issue for your usecase? If so, please provide some dimensions. – eickenberg Jul 03 '14 at 15:56
  • You can always extract the iteration from the `Parallel` and write it as a for loop saving directly into a result array (the `zeros` from above) and avoid having all the intermediate matrices in memory. – eickenberg Jul 03 '14 at 16:02
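The loop variant described in the last comment might look like this (a sketch, with smaller dimensions than the answer's for illustration): preallocate the result and accumulate block by block, so only one small intermediate matrix exists at a time.

```python
import numpy as np

X = np.random.randn(100, 1000)
block_size = 100

# Preallocate the (n_samples, n_samples) result once
P = np.zeros((X.shape[0], X.shape[0]))

# Accumulate each feature block's contribution in place,
# avoiding a list of intermediate matrices in memory
for pos in range(0, X.shape[1], block_size):
    block = X[:, pos:pos + block_size]
    P += block.dot(block.T)
```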
0

This is 10% faster on my machine as it avoids the explicit Python loops:

numpy.matrix(X) * numpy.matrix(X.T)

but there is still 50% redundancy, since the result is symmetric (`P[i, j] == P[j, i]`) and both halves are computed.
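If that redundancy matters, BLAS has a dedicated symmetric rank-k update routine, `syrk`, which computes only one triangle of `X @ X.T`; scipy exposes it. A sketch, assuming `scipy.linalg.blas.dsyrk` fills the upper triangle by default:

```python
import numpy as np
from scipy.linalg.blas import dsyrk

X = np.random.randn(4, 6)

# dsyrk computes 1.0 * X @ X.T, writing only one triangle of the result
P_upper = dsyrk(1.0, X)

# Symmetrize to recover the full Gram matrix
P = P_upper + P_upper.T - np.diag(np.diag(P_upper))
```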

Ankush Shah
  • 938
  • 8
  • 13
  • 1
    People usually avoid using `numpy.matrix` and prefer `numpy.dot`. Otherwise I agree with this, because I suggested essentially the same thing ;) – eickenberg Jul 03 '14 at 15:17