3

Is there an efficient way to use FFTW / CUFFT (they have similar APIs) to perform an fft over a given axis of a multi-dimensional array?

Let's say I have a 3D array of shape (2, 3, 4). The strides are (12, 4, 1), meaning that in order to move one unit along the last axis, we move by 1 unit in the flat array, while to move along the first axis by one unit, we have to step over 3 * 4 = 12 units. (The array is a numpy ndarray which can also have other strides when axes are transposed, but I'd be happy with an answer that addresses just this particular 3D case with the given strides)

Now let's say I want to compute a 1D fft along the middle axis. CUFFT exposes the following function:

cufftResult cufftPlanMany(
    cufftHandle *plan,        // Plan to be initialized
    int rank,                 // Rank = 1 for 1D fft
    int *n,                   // shape of the fft = 3
    int *inembed,
    int istride,
    int idist,
    int *onembed,
    int ostride,
    int odist,
    cufftType type,           // e.g. 64 bit float to 128 bit complex
    int batch                 // Could use batch = 2 for the first axis
);

I think we need the nembed, stride, dist parameters to do the transform. They are documented here: http://docs.nvidia.com/cuda/cufft/index.html#advanced-data-layout

The dumentation states that for a 1D fft, the element in batch b at position x will be taken from: input[b * idist + x * istride]

However, element at position [b][x][z] is stored in:

input[b * 12 + x * 4 + z]

So it's not clear how to make CUFFT loop over the third (z) axis.

If I set:

  • idist and odist to 3*4=12 (so that incrementing b moves us along the first axis) and,
  • istride and ostride to 4 (so that incrementing x moves along the second axis, which is the axis along which we want to fft),
  • batch = 2
  • inembed and onembed to 3 (but according to the documentation these are ignored for a 1D transform)

then it computes the correct fft for each of 2 batches where the last axis index is 0, but leaves the sub-arrays for which the last index is 1, 2, or 3 untouched.

This seems like a common use-case, but I can't seem to figure out how to do this with the given parameters without making multiple calls (which is expensive on the GPU) or making a copy with different memory layout.

John von N.
  • 195
  • 1
  • 1
  • 6
  • You are correct that advanced data layout is designed to handle this. If you read the documentation at the link you provided, you'll see that the various stride parameters are 3-dimensional for a 3D transform. It's not clear what you're having trouble with. – Robert Crovella Oct 07 '15 at 16:01
  • What I want to do is a 1D fft along one axis of a 3D array, not a 3D fft. Unless I misunderstood, that means the n and nembed parameters are length 1. What would be the settings for n, nembed, stride and dist to do a 1D fft over the middle axis of a 3D array? Thanks! – John von N. Oct 07 '15 at 18:08
  • Then it's a 1D fft with a stride. Have you tried any settings? – Robert Crovella Oct 07 '15 at 18:20
  • I've made some edits to explain what I tried – John von N. Oct 07 '15 at 19:00
  • So you have figured out how to compute a 1D fft in a 3D data set along the dimension you want, and you're now asking how to do a batch of these? – Robert Crovella Oct 07 '15 at 19:06
  • Sort of. I'm already using the batch parameter to loop over the first axis of the array, but I can't seem to figure out how to also make cufft loop over the last (third) axis. If the array is indexed by [y][x][z], then I want to fft over the x axis for every combination of y=0,1 and z=0,1,2,3. I figured how to do an fft over x for every y=0,1 but z=0 only (using the settings listed in the question). – John von N. Oct 07 '15 at 19:11
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/91654/discussion-between-john-von-n-and-robert-crovella). – John von N. Oct 07 '15 at 19:14
  • 2
    I agree that the middle axis case [will be more difficult](https://software.intel.com/en-us/comment/1806208). I think it will probably require a loop, but each loop iteration could perform a batch of y-transforms in the z-dimension, so the loop extent would only have to be the size of the x dimension. The batch count would be the size of the z-dimension, for a set of dimensions `[z][y][x]`. – Robert Crovella Oct 07 '15 at 21:05

0 Answers0