
I have a large multidimensional matrix where one index actually contains the real and imaginary parts of a number.

Here is the code I would like to optimize:

import numpy as np

big_matrix = np.random.random((8,160,23,3,23,80)) # 1240M

tmp1 = np.zeros((8,80,23,3,23,80)) # 620M
tmp2 = np.zeros((8,80,23,3,23,80)) # 620M

for ii in np.arange(80):
  tmp1[:,ii,:,:,:,:] = big_matrix[:,2*ii,:,:,:,:]
  tmp2[:,ii,:,:,:,:] = big_matrix[:,2*ii+1,:,:,:,:]

final_matrix = np.vectorize(complex)(tmp1,tmp2) # 1240M

a = np.sum(final_matrix)

The theoretical memory size of big_matrix should be (8*160*23*3*23*80)*8/(1024**2) = 1240 MB, so I was expecting a total memory consumption of about 3.7 GB. Instead, my memory consumption went up to 11 GB, and I do not understand why. How can I optimize my program so that it does the same thing at a lower memory cost?

Thank you,

Sam.


1 Answer


As I understand it, numpy.vectorize is essentially a Python loop and therefore very inefficient. The high memory consumption you see is likely caused by it.

The way you're splitting this array is very regular, so just slice it:

tmp1 = big_matrix[:,  ::2, ...]
tmp2 = big_matrix[:, 1::2, ...]

This creates "views" of the original array, so it requires no additional memory.
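On a smaller stand-in array (the shape below is chosen only for illustration), you can confirm that basic slicing produces views rather than copies:

```python
import numpy as np

# Small stand-in for big_matrix; the shape is arbitrary for the demo.
big = np.random.random((2, 8, 3))

even = big[:, ::2, :]   # elements at even positions along axis 1
odd = big[:, 1::2, :]   # elements at odd positions along axis 1

# Both slices share memory with the original array: no copy is made.
print(np.shares_memory(big, even))  # True
print(np.shares_memory(big, odd))   # True
```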

Looking at the answers here, a simple way to construct the complex array is:

final_matrix = tmp1 + 1j * tmp2

Or, more memory-efficiently:

final_matrix = 1j * tmp2
final_matrix += tmp1
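A small sketch (with illustrative arrays `a` and `b` standing in for tmp1 and tmp2) showing that the in-place version produces the same result while allocating only one full-size complex temporary:

```python
import numpy as np

a = np.random.random((4, 5))  # stands in for tmp1 (real parts)
b = np.random.random((4, 5))  # stands in for tmp2 (imaginary parts)

# One complex-sized allocation, then an in-place add of the real parts.
final = 1j * b
final += a

# Same values as building the complex array in one expression.
assert np.allclose(final, a + 1j * b)
```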

If you're only interested in the overall total, you could also separately sum the real and imaginary parts and combine them in the end.
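That idea can be sketched as follows (again on a small stand-in array; no complex array is ever materialized):

```python
import numpy as np

big = np.random.random((2, 8, 3))  # stand-in for big_matrix

# Sum real and imaginary parts separately, combine only the two scalars.
total = big[:, ::2, :].sum() + 1j * big[:, 1::2, :].sum()

# Matches summing the assembled complex array.
ref = np.sum(big[:, ::2, :] + 1j * big[:, 1::2, :])
assert np.isclose(total, ref)
```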

  • Thank you! With these modifications I go from 11G to around 2G of memory usage, which is what I wanted! – sponce Oct 06 '14 at 11:21