Improving NumPy dot performance by removing array copies

Question

Given a matrix QT:

% ipython
Python 2.7.3
In [3]: QT.dtype
Out[3]: dtype('float64')

In [4]: QT.__class__
Out[4]: numpy.ndarray

In [5]: QT.flags
Out[5]:
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False

I need the result of:

QT.T * QT

Problem: Whenever I try to compute this matrix product, the memory overflows and the code stops running. This happens because of the matrix copy NumPy makes behind the scenes.

Tried solutions:

First:

Q = numpy.array(QT.T, order='C')
numpy.dot(Q, QT)

Second:

QT = numpy.array(QT, order='F')
Q = numpy.array(QT.T, order='F')
numpy.dot(Q, QT)

Third:

QT = numpy.matrix(QT)
QT = QT.copy('F')
Q = numpy.matrix(QT.T)
Q = Q.copy('F')
Q.dot(QT)

However, none of them solves the problem.
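Here is a quick way to see what these attempts actually allocate, using a small hypothetical stand-in for QT. The transpose on its own is a view. Each `np.array(..., order=...)` call builds a full new copy, so these attempts add memory instead of saving it. `np.shares_memory` requires NumPy 1.11 or later, which is newer than the asker's setup.

```python
import numpy as np

# Hypothetical small stand-in for the asker's QT (C-contiguous float64).
QT = np.random.rand(6, 4)

# The transpose is a view: it reuses QT's buffer, no new data is allocated.
QT_view = QT.T
print(QT_view.base is QT)                  # True
print(np.shares_memory(QT_view, QT))       # True

# First attempt: order='C' forces a real copy of the transposed data.
Q_c = np.array(QT.T, order='C')
print(np.shares_memory(Q_c, QT), Q_c.nbytes == QT.nbytes)   # False True

# Second attempt: two Fortran-ordered copies, each as large as QT.
QT_f = np.array(QT, order='F')
Q_f = np.array(QT_f.T, order='F')
print(np.shares_memory(QT_f, QT), np.shares_memory(Q_f, QT_f))  # False False
```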

Question

How can I compute QT.T * QT without running out of memory?

References

http://numpy-discussion.10968.n7.nabble.com/inplace-matrix-multiplication-td21817.html

Is there an "enhanced" numpy/scipy dot method?

Numpy dot product very slow using ints

http://www.scipy.org/PerformanceTips

1) `QT.T * QT` is not the same as `np.dot(QT.T,QT)` for `ndarray` types. 2) What copy are you talking about? `QT.T` should be a view into `QT` so no copy is done there — mgilson, May 21 '13 at 02:02
np.dot is the same as QT.T*QT. According to the documentation (http://docs.scipy.org/doc/numpy/reference/generated/numpy.dot.html): "For 2-D arrays it is equivalent to matrix multiplication" — tfcstack, May 21 '13 at 02:28
About the copies: numpy.dot is copying the matrix parameters when running the code. From http://www.scipy.org/PerformanceTips: "Although C is only 40 by 40, inspecting the memory usage during the operation of dot will indicate that a copy is being made". — tfcstack, May 21 '13 at 02:29
I believe what you mean to say is that `np.dot(QT.T, QT)` is the code expression for the math expression `QT.T * QT` where `*` is matrix multiplication. `QT.T * QT` is also valid Python code, but it does not perform a matrix multiplication. — Bi Rico, May 21 '13 at 02:59
QT.T * QT is not valid code for an ndarray. Sorry, I did not make myself clear: by QT.T * QT, I mean matrix multiplication. — tfcstack, May 21 '13 at 04:46
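A small sketch makes the distinction in these comments concrete. The 3 x 2 array `A` is a hypothetical stand-in. For an ndarray, `*` is elementwise. `np.dot`, or `@` on Python 3.5+ with NumPy 1.10+, is the matrix product. Only `np.matrix` overloads `*` as matrix multiplication, and current NumPy no longer recommends that class.

```python
import numpy as np

A = np.arange(6.0).reshape(3, 2)    # hypothetical 3x2 stand-in for QT

# np.dot is the matrix product: (2x3) . (3x2) -> (2x2).
gram = np.dot(A.T, A)
print(gram.shape)                   # (2, 2)

# For ndarray, * is elementwise and needs broadcast-compatible shapes,
# so (2, 3) * (3, 2) raises ValueError.
try:
    A.T * A
except ValueError as exc:
    print("elementwise * failed:", exc)

# For np.matrix, * does mean matrix multiplication.
M = np.matrix(A)
print(np.allclose(M.T * M, gram))   # True
```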

score 2 · Answer 1 · answered May 21 '13 at 02:18


Have you tried:

import numpy as np

n = QT.shape[1]   # QT is 2-D, so the product QT.T . QT is n x n
result = np.zeros((n, n), dtype=QT.dtype)
np.dot(QT.T, QT, out=result)

Try running the above and see which line, if any, breaks.
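For reference, here is a self-contained version of the snippet above, run on a small random matrix. The matrix is a hypothetical stand-in, since the real QT isn't shown. It demonstrates that `out=` accepts a preallocated C-contiguous array of the right shape and dtype. It does not make the n x n result itself any smaller.

```python
import numpy as np

# Hypothetical stand-in for QT: 1000 rows, 50 columns, C-contiguous float64.
QT = np.random.rand(1000, 50)

n = QT.shape[1]                             # QT.T . QT is n x n
result = np.zeros((n, n), dtype=QT.dtype)   # preallocated, C-contiguous

# out= must match the shape and dtype np.dot would return and be
# C-contiguous; the product is then written into result's buffer.
np.dot(QT.T, QT, out=result)

print(result.shape)                             # (50, 50)
print(np.allclose(result, np.dot(QT.T, QT)))    # True
```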


Bi Rico


This line: result = np.zeros(shape, dtype=QT.dtype) was consuming 8GB of RAM before the code crashed with a memory overflow. – tfcstack May 21 '13 at 02:37

If you're unable to complete the line `result = np.zeros(shape, dtype=QT.dtype)`, it means your system does not have enough memory to hold the result you're asking for. Clearly there is no copying going on in the first 2 lines of code, so that cannot be the issue. You might want to look into alternative methods for dealing with large matrices, i.e. sparse matrices or memory-mapped arrays. Hope that helps – Bi Rico May 21 '13 at 03:02
Sparse matrices are impossible here, since Q is the result of an SVD, so it is 100% dense. I was looking for something like an in-place operation. – tfcstack May 21 '13 at 04:45

@thalesfc: if you cannot fit into memory a matrix full of zeros of the same size as the result of the matrix product, what you are trying to do is impossible (and whether NumPy makes copies or not is irrelevant). Either store the matrix on disk (memmaps), or rethink what you are actually trying to do. It sounds like you are approaching whatever problem you are trying to solve in the wrong way. – pv. May 21 '13 at 21:02
@thalesfc SVD yields either square or diagonal matrices. Your matrix is not square, as then it wouldn't lead to memory errors, so it has to be diagonal. Or am I missing something? – alko Jan 07 '14 at 19:54
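The comments above come down to simple arithmetic. For an m x n QT, QT.T · QT is n x n, so storing it as float64 takes 8·n² bytes however NumPy handles copies. The sketch below shows that calculation. The helper name and the example shape are hypothetical, since the asker's actual dimensions aren't given.

```python
import numpy as np

def gram_result_bytes(shape, dtype=np.float64):
    """Bytes needed just to store QT.T . QT for a 2-D QT of the given shape.

    Hypothetical helper: the result is n x n where n = shape[1]; the number
    of rows does not affect the size of the result.
    """
    _, n = shape
    return n * n * np.dtype(dtype).itemsize

# Example with a hypothetical shape: 32,768 columns -> 8 GiB for the result alone.
print(gram_result_bytes((100, 32768)) / 2**30)   # 8.0
```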

ali_m · Accepted Answer · 2014-01-07T19:48:42.863


If the result won't all fit into core memory, you can put it in a memory-mapped array so that the overflow will be written to your hard disk:

n = QT.shape[1]   # QT is 2-D, so the product QT.T . QT is n x n
result = np.memmap('result.dat', dtype=QT.dtype, mode='w+', shape=(n, n))
np.dot(QT.T, QT, out=result)
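Here is the memmap approach end to end on a small hypothetical QT. The temporary-directory path is an assumption. For the real problem, the file needs about 8·n² bytes of free disk space. `np.memmap` is an ndarray subclass, and a freshly created one is C-contiguous, so `np.dot` accepts it as `out=`.

```python
import os
import tempfile

import numpy as np

QT = np.random.rand(500, 40)        # hypothetical small stand-in for QT
n = QT.shape[1]

# Hypothetical file location; in practice pick a disk with room for 8*n*n bytes.
path = os.path.join(tempfile.mkdtemp(), 'result.dat')

# mode='w+' creates (or overwrites) the file and maps it read/write.
result = np.memmap(path, dtype=QT.dtype, mode='w+', shape=(n, n))
np.dot(QT.T, QT, out=result)
result.flush()                      # write any modified pages to the file
del result                          # drop our reference to the mapping

# Re-open read-only to check what was written.
check = np.memmap(path, dtype=QT.dtype, mode='r', shape=(n, n))
print(np.allclose(check, np.dot(QT.T, QT)))   # True
```

This only moves the result out of RAM. Speed will then depend on how much of the file the operating system can keep cached.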

You may also want to take a look at this algorithm for performing out-of-core SVD on very large arrays.

edited Jan 07 '14 at 19:48

answered Jan 07 '14 at 19:43

ali_m

