
Huge matrix multiplications can run for hours and consume a lot of memory. There are already some questions on Stack Overflow asking how to reduce the memory consumption, for instance "Python NUMPY HUGE Matrices multiplication".

One suggestion is to perform the multiplication row by row, so that only two rows are loaded at the same time; I find that pretty straightforward, although very slow. Another suggestion is to use sparse representations, which is where my mathematical knowledge reaches its limits.
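To make that second suggestion concrete, here is a minimal sketch (assuming SciPy is available; the toy data is illustrative) of what a sparse representation looks like: only the non-zero entries are stored, and products against it never materialise the zeros.

import numpy as np
from scipy import sparse

# A mostly-zero matrix, stored densely and in CSR (compressed sparse row) form.
dense = np.zeros((5, 3))
dense[0, 0] = 1.0
dense[4, 1] = 1.0

compressed = sparse.csr_matrix(dense)   # keeps only the 2 non-zero entries
print(compressed.nnz)                   # -> 2

# Multiplying against a vector works just like with a dense array,
# without ever building a dense intermediate:
v = np.array([0.5, 0.25, 0.125])
print(compressed @ v)                   # row 0 gives 0.5, row 4 gives 0.25, the rest 0.0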

Although this is a big topic, I haven't found any simple example implementations yet, so I am asking this question on the basis of a reprex (a minimal reproducible example); it will probably help some people in the future. In this example, I have a vector and a matrix that I would like to multiply. After that, I take the sum of every row in the resulting matrix. My example looks like this:

myV = [6.29586100e-05, 5.04149100e-04, 1.44845100e-05]
myM = [[1,0,0],[0,0,1],[0,0,1],[0,0,0],[0,1,1]]
result = myfun(myV, myM)

Here, myfun is defined as follows:

import numpy as np

def myfun(V, M):

    #convert lists into numpy matrices
    npV = np.matrix(V)
    npM = np.matrix(M)

    #multiply the vector into every row of the matrix (elementwise, with broadcasting)
    results = np.multiply(npV, npM)

    #print for understanding
    print('npV'); print(npV)
    print('npM'); print(npM)
    print('results'); print(results)

    #sum probabilities of morph given every word
    sum_results = np.sum(results, axis=1).tolist()
    print('sum_results'); print(sum_results)

    #flatten the nested list back into a plain list
    list_results = [value for sublist in sum_results for value in sublist]

    return list_results

For illustration, I printed what happens:

npV
[[  6.29586100e-05   5.04149100e-04   1.44845100e-05]]

npM
[[1 0 0]
 [0 0 1]
 [0 0 1]
 [0 0 0]
 [0 1 1]]

results
[[  6.29586100e-05   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   0.00000000e+00   1.44845100e-05]
 [  0.00000000e+00   0.00000000e+00   1.44845100e-05]
 [  0.00000000e+00   0.00000000e+00   0.00000000e+00]
 [  0.00000000e+00   5.04149100e-04   1.44845100e-05]]

sum_results
[[6.295861e-05], [1.448451e-05], [1.448451e-05], [0.0], [0.00051863361]]

However, if this simple calculation is applied to a huge vector and matrix, it consumes a lot of memory. How could the code be improved so that it consumes less memory without increasing the run time too drastically?
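Not a full answer, but one low-memory rewrite worth sketching (the function name below is mine, not from any library): multiplying the vector into every row and then summing each row is mathematically the matrix-vector product M @ V, so the full (rows x cols) intermediate from np.multiply never has to exist. The sketch accepts either a dense array or a scipy.sparse matrix for M.

import numpy as np
from scipy import sparse

def myfun_low_memory(V, M):
    """Same row sums as myfun, but without the (rows x cols) temporary.

    sum_j V[j] * M[i, j] is exactly the i-th entry of M @ V, so only the
    length-`rows` result vector is ever allocated.
    """
    v = np.asarray(V, dtype=np.float64)
    if not sparse.issparse(M):
        M = np.asarray(M, dtype=np.float64)
    return np.asarray(M @ v).ravel().tolist()

myV = [6.29586100e-05, 5.04149100e-04, 1.44845100e-05]
myM = [[1, 0, 0], [0, 0, 1], [0, 0, 1], [0, 0, 0], [0, 1, 1]]
print(myfun_low_memory(myV, myM))   # same values as sum_results above, flattened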

  • How much memory is a lot for you? How big is the matrix? Matrix multiplication is probably done using FFT anyway for large enough matrices, and doing it row-by-row or column-by-column may be slower, depending on the memory access pattern of the FFT. A good implementation will be cache-conscious, though. In any case, anything you write directly in Python will be slow (NumPy is quite fast because its kernels are not written in Python). – Kuba hasn't forgotten Monica Feb 28 '20 at 17:03
  • I want to process several files in parallel on several cores. Because of that, the top-limit would be 6 to 8 GB, whereby less is preferable as other users work on the same server and if they consume a lot of memory, the script would break earlier. The vector has length 700, the matrix 700x3000. – hyhno01 Feb 28 '20 at 17:09
  • Cut up the matrices into smaller blocks, and work on pairs of those smaller blocks. IIRC matrix multiplication is recursive: It works even if the elements themselves are matrices, so multiplying a matrix-of-submatrices with matrix-of-submatrices is the same as multiplying two larger matrices, each composed of submatrix blocks. This recursion can be done until (IIRC) 4 submatrices can fit in L1, and then FFT-IFFT multiply will be very fast (may even saturate all execution units in the core) - but you have to benchmark it of course. There should be code that does all that for you, though... – Kuba hasn't forgotten Monica Feb 28 '20 at 18:09
  • Thx Reinstate Monica. Unfortunately, you use a lot of abbreviations I don't know. However, I figured out that FFT and IFFT is matlab terminology, right? Ok, so I'll figure out how to implement that stuff... – hyhno01 Mar 01 '20 at 10:40
  • Sorry about that. FFT and IFFT refer to the Fourier transform, and it can be used to multiply matrices quickly, but it’s best to find a library that does it. L1 refers to the cache in the CPU (Level 1 cache). IIRC is “if I recall correctly”. – Kuba hasn't forgotten Monica Mar 02 '20 at 03:28
  • What is the sparsity of the matrix? In which format (maybe already some sparse format) is the matrix, and which dtype? The temporary array is not needed at all... – max9111 Mar 03 '20 at 09:32
  • As can be seen in my example, the matrix (```npM```) tends to be very sparse. It's a numpy matrix. Sidenote: I managed to solve the problem by requesting more memory on our cluster – hyhno01 Mar 03 '20 at 12:38
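A rough sketch of the block/chunk idea discussed in the comments above (block_rows and the file name are illustrative, not taken from the discussion): work on one block of rows at a time, so at most a (block_rows x cols) slice is in memory; with a memory-mapped file, untouched rows are not even read from disk.

import numpy as np

def row_sums_in_blocks(V, M, block_rows=512):
    """Row sums of V * M, computed one block of rows at a time.

    Peak memory is bounded by one (block_rows x cols) slice instead of
    the full (rows x cols) intermediate.
    """
    v = np.asarray(V, dtype=np.float64)
    out = np.empty(M.shape[0], dtype=np.float64)
    for start in range(0, M.shape[0], block_rows):
        block = np.asarray(M[start:start + block_rows], dtype=np.float64)
        out[start:start + block_rows] = block @ v   # row sums of v * block
    return out.tolist()

# M can be a memory-mapped array so that blocks are only read when touched, e.g.:
# M = np.load('big_matrix.npy', mmap_mode='r')   # 'big_matrix.npy' is a hypothetical file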
