I want to subdivide an image, of size [32,32] for example, to smaller tiles (e.g. [8,8]), and perform a batched 2D FFT on all of the tiles. Is it possible with cuFFT, perhaps using cufftPlanMany() and some combination of istride, idist, and inembed parameters?

The way I see it, I would need to reshape my input image to a size of [8,4,8,4], and then permute the middle two indices for a final shape of [8,8,4*4], and then I could run the standard 2D batched FFT. I could do this with a custom CUDA kernel that would involve copy-pasting, but I was wondering if cuFFT already has this functionality?

Here is my brute force kernel that transforms a [32,32] image into batched [8,8,4*4=16] format:

  // Launch with block = (KX, KY) and grid = (NX/KX, NY/KY),
  // e.g. block (8,8) and grid (4,4) for a [32,32] image.
  // The element type (cufftComplex) is an assumption for illustration.
  __global__ void tile_image(const cufftComplex *input, cufftComplex *output)
  {
    // Output tile coordinates
    int x = threadIdx.x;
    int y = threadIdx.y;

    // Output tile size, e.g. [8,8]
    int KX = blockDim.x;
    int KY = blockDim.y;

    // Full dataset width, e.g. 32 for a [32,32] image
    int NX = blockDim.x * gridDim.x;

    // Batch ID: tiles are numbered row by row
    int batch = blockIdx.x + blockIdx.y * gridDim.x;

    // Input data coordinates
    int nx = x + blockIdx.x * KX;
    int ny = y + blockIdx.y * KY;

    // Gather into the batched format, e.g. [8,8,4*4=16]
    output[x + y*KX + batch*KX*KY] = input[nx + ny*NX];
  }

On the other hand, cuFFT provides Advanced Data Layout in the form of

 input[ batch * idist + (y * inembed[1] + x) * istride ]
output[ batch * odist + (y * onembed[1] + x) * ostride ]
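As a quick sanity check on the output side, here is a minimal host-side sketch of that formula. `layout_index` and `packed_index` are hypothetical helpers written for illustration, not cuFFT functions; they follow the convention used in this question, where x is the fastest-varying index. With odist = KX*KY, onembed[1] = KX and ostride = 1, the layout formula reproduces the packed [8,8,16] output index exactly.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of cuFFT): element offset addressed by the
// 2D advanced-layout formula, with x as the fastest-varying index.
std::size_t layout_index(std::size_t batch, std::size_t x, std::size_t y,
                         std::size_t dist, std::size_t embed1, std::size_t stride)
{
    assert(stride > 0);
    return batch * dist + (y * embed1 + x) * stride;
}

// Offset written by the gather kernel for element (x, y) of tile `batch`.
std::size_t packed_index(std::size_t batch, std::size_t x, std::size_t y,
                         std::size_t KX, std::size_t KY)
{
    return x + y * KX + batch * KX * KY;
}
```

For example, `layout_index(b, x, y, KX*KY, KX, 1)` and `packed_index(b, x, y, KX, KY)` agree for every b, x and y, so only the input side is problematic.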

Let me try and map it to my kernel above. The best I can do is

output[x + y*KX + batch*KX*KY] =
 input[x + y*NX + batch*KX + blockIdx.y*(KY-1)*NX]

The problem is that I can't map the batch index without leaving an extra term that depends on either blockIdx.x or blockIdx.y. I think it boils down to the fact that turning the image into the batched format requires a permutation of indices, which cannot be expressed with the single stride and single batch distance (a linear transformation) that cuFFT's layout provides.
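One possible workaround, sketched here under assumptions rather than confirmed against cuFFT: batch only the tiles of one tile row per call, so the distance between tiles is constant (idist = KX, inembed[1] = NX, istride = 1, batch = NX/KX), and advance the input pointer by KY*NX elements for each tile row in a host loop. The helper names below are hypothetical. The code only checks the index arithmetic on the CPU; it does not call cuFFT.

```cpp
#include <cassert>
#include <cstddef>

// Offset read by the gather kernel for element (x, y) of tile (bx, by).
std::size_t gather_index(std::size_t x, std::size_t y, std::size_t bx, std::size_t by,
                         std::size_t KX, std::size_t KY, std::size_t NX)
{
    return (x + bx * KX) + (y + by * KY) * NX;
}

// True if one plan per tile row (idist = KX, inembed[1] = NX, istride = 1),
// with an assumed per-row pointer offset of by*KY*NX, addresses the same data.
bool row_plan_matches(std::size_t KX, std::size_t KY, std::size_t GX, std::size_t GY)
{
    assert(KX > 0 && KY > 0);
    const std::size_t NX = KX * GX;
    for (std::size_t by = 0; by < GY; ++by)
        for (std::size_t bx = 0; bx < GX; ++bx)
            for (std::size_t y = 0; y < KY; ++y)
                for (std::size_t x = 0; x < KX; ++x) {
                    const std::size_t base = by * KY * NX;        // row offset
                    const std::size_t idx = base + bx * KX + (y * NX + x);
                    if (idx != gather_index(x, y, bx, by, KX, KY, NX)) return false;
                }
    return true;
}

// True if a single plan over all GX*GY tiles with idist = KX would work.
bool single_plan_matches(std::size_t KX, std::size_t KY, std::size_t GX, std::size_t GY)
{
    assert(KX > 0 && KY > 0);
    const std::size_t NX = KX * GX;
    for (std::size_t by = 0; by < GY; ++by)
        for (std::size_t bx = 0; bx < GX; ++bx)
            for (std::size_t y = 0; y < KY; ++y)
                for (std::size_t x = 0; x < KX; ++x) {
                    const std::size_t batch = bx + by * GX;
                    const std::size_t idx = batch * KX + (y * NX + x);
                    if (idx != gather_index(x, y, bx, by, KX, KY, NX)) return false;
                }
    return true;
}
```

For the [32,32] image with [8,8] tiles, `row_plan_matches(8, 8, 4, 4)` holds while `single_plan_matches(8, 8, 4, 4)` does not, which is consistent with the leftover blockIdx.y term. The cost is NY/KY exec calls instead of one.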

As a bonus question, I would also like to have my 2D tiles overlap, similar to this question in 1D: Is it possible to overlap batched FFTs with CUDA's cuFFT library and cufftPlanMany?
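For the overlapping case, the same row-by-row idea would seem to carry over by letting the tile-to-tile distance be smaller than the tile width. The sketch below is only index arithmetic under assumptions: `HX` and `HY` are hypothetical hop sizes (in elements) between tile origins, the helper names are made up for illustration, and whether cuFFT accepts overlapping input in this way is assumed to hold only for out-of-place transforms, as the first comment below suggests.

```cpp
#include <cassert>
#include <cstddef>

// Number of tiles of size K that fit along a dimension of size N
// when consecutive tile origins are H elements apart.
std::size_t tile_count(std::size_t N, std::size_t K, std::size_t H)
{
    assert(H > 0 && K > 0 && N >= K);
    return (N - K) / H + 1;
}

// Input offset of element (x, y) in overlapping tile (bx, by):
// within one tile row, idist = HX, inembed[1] = NX, istride = 1;
// the per-row pointer offset (assumed host loop) is by*HY*NX.
std::size_t overlap_index(std::size_t x, std::size_t y, std::size_t bx, std::size_t by,
                          std::size_t HX, std::size_t HY, std::size_t NX)
{
    const std::size_t base = by * HY * NX;
    return base + bx * HX + (y * NX + x);
}
```

With a [32,32] image, [8,8] tiles and a hop of 4 in both directions, this gives 7 tiles per row and 7 tile rows; setting HX = KX and HY = KY recovers the non-overlapping layout.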

  • This should be possible as long as you don't want to do the transforms in-place. See https://docs.nvidia.com/cuda/cufft/index.html?highlight=in%20place#advanced-data-layout – talonmies Apr 13 '23 at 05:30
  • @talonmies I have now made an attempt with the Advanced Data Layout, please see the edited question above – Airidas Korolkovas Apr 13 '23 at 18:37
  • You don’t need to transform the data, just use the advanced data layout on the data as-is – talonmies Apr 13 '23 at 23:27
  • If the distance from the start of one FFT to the start of the next varies over the batch, then I don't think advanced data layout by itself will work. – Robert Crovella Apr 14 '23 at 14:29
