Task: Fast real-to-complex FFT computation of a large array.
The shape of array a
is (103430 x 1 x 100 x 900) where the dimensions are (time, a dummy dim, longitude, latitude), so let's say (~100000 x 1 x 100 x 900). The FFT should be computed over axes 0,2,3.
Numpy.fft.fftn(a,axes=(0,2,3))
takes too long (~6 hours), therefore I want to use pyfftw. I tried using pyfftw.interfaces.numpy_fft.fftn(a,axes=(0,2,3))
.
Problem: Memory consumption of input array corresponds to approx. 13 % of our machine RAM, thus including the output array it should become ~ 40 % (output is complex). However during calculation the memory usage rises close to 100 % until the command quits with a memory error in PyCharm.
I have created a smaller version of a random number array (10000,1,100,900) with ~1.3 % memory consumption. The memory usage temporarily goes up to ~10.6 % if an FFT is performed only over axis 3 and to ~ 13 % if performed over axes 0,2,3 as mentioned before.
I assume that intermediate array copies cause this high memory usage. I have googled for the pyfftw documentation and tried setting the auto_align_input
and auto_contiguous
options to False as well as overwrite_input
to True, without success. I also tried creating the FFTW object myself and playing around with the params.
MATLAB, also using FFTW, performs the task for (100000,1,100,900) in a few seconds, with the maximum memory consumption being the necessary ~40 %. Apparently, intermediate copies of the array, the likely reason for the additional memory used when running pyfftw, are from an algorithm point of view not necessary, as the MATLAB example shows.
Question: Is there any way to enforce absolutely no additional memory consumption in pyfftw? If so, how? Which parameters?
P.S: Two possible workarounds would be to
- save the array from python to fits, load it into MATLAB, perform the FFT, save it from MATLAB to fits, and load it back into python
- do repeated single-dimension FFTs via for-loops; the memory overhead would then not be relevant and the results can be inserted into an output array as part of the loop
However, I would like to avoid those. There should be a way to perform a single 3D-FFT that does not consume the entire RAM the machine has (512 GB).
Update: I ran the following commands:
a = np.random.rand(10000,1,100,900)
run_fftw = pyfftw.builders.fftn(a, axes=(3,), auto_contiguous=False, auto_align_input=False, avoid_copy=True)
b = run_fftw()
It turns out that the memory being used is actually that of a + b + a complex-broadcasted copy of a inside run_fftw(). This can be reduced to the memory of complex-broadcasted a + b if a is already broadcasted to complex before defining run_fftw or by deleting a after having created run_fftw.
Since the problem is quasi-solved right now (thanks to @HenryGomersall), the only curiosity question now is whether there's a real-to-complex scheme callable via fftn that gives all frequencies, negative ones included, and does not internally broadcast the input array to complex.
I understand though that for this case rfftn could be used. This throws away the (redundant) negative frequencies, though.
I had tried something like:
`a = np.random.rand(10000,1,100,900)`
`run_fftw = pyfftw.builders.fftn(a, axes=(3,), auto_contiguous=False, auto_align=False, avoid_copy=True)`
`b = run_fftw()`
However this takes in total a memory of the a + b + the output (somehow in the FFTW object). The last one should somehow be avoidable. I'll still read a bit further, maybe I'll find something there. About the alignment, this should not be a problem since as you mentioned it could be done beforehand. – bproxauf Aug 09 '17 at 07:45