There are two standard solutions to your problem:
Streams: cuFFT supports CUDA streams via the cufftSetStream function. The pattern you want is to associate each FFT with its own stream. This may allow processing of multiple FFTs to overlap, and copies to and from the GPU can also be overlapped with computation at minimal performance cost.
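A minimal sketch of the streamed pattern (the function name `run_ffts` and the device pointers `d_data` are placeholders, and error checking is omitted for brevity; this needs a CUDA-capable GPU and linking against cuFFT to run):

```cpp
#include <cufft.h>
#include <cuda_runtime.h>

// Launch `count` independent 1D C2C FFTs, one per stream, so they may overlap.
void run_ffts(cufftComplex** d_data, const int* nx, int count)
{
    cudaStream_t* streams = new cudaStream_t[count];
    cufftHandle*  plans   = new cufftHandle[count];

    for (int i = 0; i < count; ++i) {
        cudaStreamCreate(&streams[i]);
        cufftPlan1d(&plans[i], nx[i], CUFFT_C2C, 1);
        cufftSetStream(plans[i], streams[i]);  // attach each plan to its stream
    }

    // cufftExec* launches are asynchronous with respect to the host, so
    // transforms in different streams are free to execute concurrently.
    for (int i = 0; i < count; ++i)
        cufftExecC2C(plans[i], d_data[i], d_data[i], CUFFT_FORWARD);

    cudaDeviceSynchronize();

    for (int i = 0; i < count; ++i) {
        cufftDestroy(plans[i]);
        cudaStreamDestroy(streams[i]);
    }
    delete[] streams;
    delete[] plans;
}
```

In a real pipeline you would also issue cudaMemcpyAsync on the same streams, so host-device transfers for one FFT overlap with computation of another.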
Batched: As you mention, batching is another solution. If all your FFTs are of fairly similar size (as in your example), you should be able to pad the smaller ones with data that doesn't alter, or doesn't significantly alter, the output, so that they are all the same size. You can then process them with a single batched call.
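For comparison, the batched version is a single plan and a single exec call (again a sketch: `run_batched` is a placeholder name, the signals are assumed padded to a common length `n` and stored contiguously one after another in `d_data`, and error checking is omitted):

```cpp
#include <cufft.h>

// Transform `batch` signals of length `n`, stored back to back in d_data,
// with one batched plan and one kernel launch sequence.
void run_batched(cufftComplex* d_data, int n, int batch)
{
    cufftHandle plan;
    cufftPlan1d(&plan, n, CUFFT_C2C, batch);  // one plan covers all `batch` FFTs
    cufftExecC2C(plan, d_data, d_data, CUFFT_FORWARD);
    cufftDestroy(plan);
}
```

If your layout is less regular (strides, embedded dimensions), cufftPlanMany gives finer control over how the batch is laid out in memory.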
I would have thought that in your case streams are the better solution. They let you transfer data to and/or from the device while computation is in progress, and you avoid the inefficiency of doing extra work on padded null data.