2

Task: Fast real-to-complex FFT computation of a large array.

The shape of array a is (103430 x 1 x 100 x 900) where the dimensions are (time, a dummy dim, longitude, latitude), so let's say (~100000 x 1 x 100 x 900). The FFT should be computed over axes 0,2,3.

Numpy.fft.fftn(a,axes=(0,2,3)) takes too long (~6 hours), therefore I want to use pyfftw. I tried using pyfftw.interfaces.numpy_fft.fftn(a,axes=(0,2,3)).

Problem: Memory consumption of input array corresponds to approx. 13 % of our machine RAM, thus including the output array it should become ~ 40 % (output is complex). However during calculation the memory usage rises close to 100 % until the command quits with a memory error in PyCharm.

I have created a smaller version of a random number array (10000,1,100,900) with ~1.3 % memory consumption. The memory usage temporarily goes up to ~10.6 % if an FFT is performed only over axis 3 and to ~ 13 % if performed over axes 0,2,3 as mentioned before.

I assume that intermediate array copies cause this high memory usage. I have googled for the pyfftw documentation and tried setting the auto_align_input and auto_contiguous options to False as well as overwrite_input to True, without success. I also tried creating the FFTW object myself and playing around with the params.

MATLAB, also using FFTW, performs the task for (100000,1,100,900) in a few seconds, with the maximum memory consumption being the necessary ~40 %. Apparently, intermediate copies of the array, the likely reason for the additional memory used when running pyfftw, are from an algorithm point of view not necessary, as the MATLAB example shows.

Question: Is there any way to enforce absolutely no additional memory consumption in pyfftw? If so, how? Which parameters?


P.S: Two possible workarounds would be to

  1. save the array from python to fits, load it into MATLAB, perform the FFT, save it from MATLAB to fits, and load it back into python
  2. do repeated single-dimension FFTs via for-loops; the memory overhead would then not be relevant and the results can be inserted into an output array as part of the loop

However, I would like to avoid those. There should be a way to perform a single 3D-FFT that does not consume the entire RAM the machine has (512 GB).


Update: I ran the following commands:

a = np.random.rand(10000,1,100,900)
run_fftw = pyfftw.builders.fftn(a, axes=(3,), auto_contiguous=False, auto_align_input=False, avoid_copy=True)
b = run_fftw()

It turns out that the memory being used is actually that of a + b + a complex-broadcasted copy of a inside run_fftw(). This can be reduced to the memory of complex-broadcasted a + b if a is already broadcasted to complex before defining run_fftw or by deleting a after having created run_fftw.

Since the problem is quasi-solved right now (thanks to @HenryGomersall), the only curiosity question now is whether there's a real-to-complex scheme callable via fftn that gives all frequencies, negative ones included, and does not internally broadcast the input array to complex.

I understand though that for this case rfftn could be used. This throws away the (redundant) negative frequencies, though.

bproxauf
  • 1,076
  • 12
  • 23
  • 2
    You can have very tight control over memory usage with pyFFTW. In this situation, it's easiest to use the FFTW object directly, and access the internal arrays directly. Take a look at the [tutorial](http://pyfftw.readthedocs.io/en/latest/source/tutorial.html#the-workhorse-pyfftw-fftw-class) and the [API docs](http://pyfftw.readthedocs.io/en/latest/source/pyfftw/pyfftw.html). It's still worth making sure your alignment is correct, so your arrays can be created with the alignment tools. You can also look at the [MPI]( https://github.com/pyFFTW/pyFFTW/pull/67) WIP. – Henry Gomersall Aug 09 '17 at 07:10
  • @HenryGomersall: thanks for your quick reply:

    I had tried something like:

    `a = np.random.rand(10000,1,100,900)`
    `run_fftw = pyfftw.builders.fftn(a, axes=(3,), auto_contiguous=False, auto_align=False, avoid_copy=True)`
    `b = run_fftw()`
    However this takes in total a memory of the a + b + the output (somehow in the FFTW object). The last one should somehow be avoidable. I'll still read a bit further, maybe I'll find something there. About the alignment, this should not be a problem since as you mentioned it could be done beforehand.
    – bproxauf Aug 09 '17 at 07:45
  • Sorry also for the horrible formatting, I am not familiar with how that works for the comments. – bproxauf Aug 09 '17 at 07:52
  • 2
    You can get the memory requirements down to a single array, using an in-place transform (just set the input and output array to be the same). You'll need to do this manually with the FFTW object. The builders will always create an extra output, so again, if you want precise control over the internal arrays, you'll need to use the FFTW object. Don't be afraid of using the FFTW object, its not that tricky to use, you just need to think a bit about your input and output array shapes and dtypes and alignment. It's all in the docs :) – Henry Gomersall Aug 09 '17 at 09:52
  • Where is the copy being created? Can you show me the line? You shouldn't get extra copies during the run stage that you're not expecting. As far as I know FFTW itself will not create extra arrays. – Henry Gomersall Aug 14 '17 at 14:40

0 Answers0