How to set cuFFT timeout?

Question

I am looking for a way to interrupt a cuFFT computation if it runs for too long. How can this be accomplished?

I was looking for some timeout setting in the API, but I found no such option. When googling, most hits talk about an unwanted timeout from the GPU driver, which is an entirely different topic.

I am computing rather big 3D FFTs (size is around [1k,1k,0.5k]). Usually it takes a few minutes to complete. However, sometimes it can take hours for some unknown reason. One of the more extreme cases I found in the logs is:

2023-02-17 21:41:02.174 FFT<C,R> size [960, 1125, 480]
2023-02-18 09:24:46.503 FFT<C,R> complete

In this particular case, the input complex array sits on the GPU, but the output real array is mapped from RAM. Otherwise it would probably not fit in the 8 GB of memory on the RTX 3060 Ti that it ran on. Despite the mapping, a few minutes is enough to complete the task in most cases.

Nothing like this is possible. The only option is to terminate the owning process. A mechanism like [this](https://stackoverflow.com/questions/56329377/reset-cuda-context-after-exception/56330491#56330491) might possibly be workable if it is important to have some sort of application continuity. CUFFT will sometimes use the output data area as scratchpad, so I'm not surprised that putting the output data area in host pinned memory could result in horrible slowdowns. Does FFTW, or any other FFT library for that matter, have a timeout? — Robert Crovella, Feb 23 '23 at 04:11
@RobertCrovella There is `fftw_set_timelimit(double seconds)`, but from the description it suggests that it affects the planning phase, picking one algorithm over another depending on this constraint. It will not interrupt an algorithm, when it is in the middle of doing something. — CygnusX1, Feb 23 '23 at 05:45
I think your only chance is to fork a subprocess for the FFT and kill that process on timeout. — Homer512, Feb 23 '23 at 08:07
It would have been interesting to see how managed memory with oversubscription holds up performance-wise when using it for both input and output (slower on average, but more consistent maybe?). But oversubscription is also not possible on Windows... — paleonix, Feb 23 '23 at 10:44
@CygnusX1 can you please follow the instructions on this page and open a bug so we can figure out what is happening? https://forums.developer.nvidia.com/t/how-to-report-a-bug/67911 — Anis Ladram, Feb 23 '23 at 19:21
Forking isn't an option, but you can still [inherit handles](https://learn.microsoft.com/en-us/windows/win32/procthread/inheritance), which should be enough to refer to the same [anonymous file handle](https://learn.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createfilemappinga), which can be used to set up shared memory between parent and child. That is enough to pass the inputs and outputs, unless you want to use stdin and stdout pipes. — Homer512, Feb 23 '23 at 21:03
@AnisLadram Updated my CUDA to latest, did some other fixes. If/When the problem resurfaces, I will make a data packet and file a bug as you suggested. — CygnusX1, Feb 27 '23 at 07:45
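
The approach suggested in the comments (run the FFT in a separate process and terminate that process on timeout) could be sketched roughly as follows. This is only an illustration in Python, not something taken from the thread: `run_with_timeout` is a hypothetical helper, and the command passed to it stands in for a worker executable that would perform the cuFFT call and write its result somewhere the parent can read, such as a file or shared memory. It assumes that terminating the worker is enough for the driver to tear down the worker's CUDA context.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run `cmd` in a child process and kill it if it exceeds `timeout_s` seconds.

    Returns the child's exit code, or None if the child was killed on timeout.
    `cmd` is a placeholder for a hypothetical worker that runs the FFT.
    """
    try:
        # On timeout, subprocess.run kills the child and waits for it to exit
        # before raising TimeoutExpired, so no worker is left running.
        completed = subprocess.run(cmd, timeout=timeout_s)
        return completed.returncode
    except subprocess.TimeoutExpired:
        return None


# Example with a stand-in worker that just sleeps for a minute:
#   slow_worker = [sys.executable, "-c", "import time; time.sleep(60)"]
#   run_with_timeout(slow_worker, timeout_s=1.0)
```

`subprocess.run` starts a new process rather than forking, so the same pattern applies on Windows. Passing the input and output arrays between parent and worker (files, shared memory, or pipes) is left out of this sketch.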

0 Answers