I have a CUDA kernel that takes more than 20 seconds to run on my old Tesla card. I want to kill the kernel programmatically if it has been running for more than 20 seconds, and then run another kernel with lower precision in its place.

My OS is Windows 7 64-bit, the CUDA version is 5.0, and the GPU is a Tesla C1060.

Please help me kill the CUDA kernel without exiting the application.

tomix86
Sijo
  • Which part is your question about? The timing or the killing? – Carl Norum Jan 08 '13 at 04:56
  • I don't think it's possible. Also, I don't think it's necessary. Surely, you can estimate the run time of your kernel based on the problem size and input parameters and select the precision based on that? Or, you can time the kernels and dynamically adjust a switch point as the program learns more about how input parameters affect run time. – Roger Dahl Jan 08 '13 at 05:38
  • @RogerDahl: sounds like you're assuming the OP can solve the halting problem. – leftaroundabout Jan 08 '13 at 17:32
  • @leftaroundabout: Not at all. This is not a question of "if" the kernel will finish, just "when". – Roger Dahl Jan 08 '13 at 17:45
  • @RogerDahl: that doesn't make it any easier. The halting problem says you can't in general determine whether a program halts unless you have already run it and seen that it does. In the same way, you can't in general determine _when_ a program halts in any other way than `time`ing it. Of course very often it _is_ possible because most CUDA kernels actually use only a non-Turing-complete subset of the language, but the question rather suggests this is not the case here. – leftaroundabout Jan 08 '13 at 18:03
  • @leftaroundabout: The halting problem is about determining if a program will finish, not how long it will take. I have now found that there is a related concept, "Worst-case execution time" (WCET). So, you might be able to argue that I was assuming the OP could solve the WCET problem. But that would also be wrong because we're not talking about general cases, just the same kernel running with different sized input. What do you mean by "non-Turing-complete subset"? Kernels usually contain both conditionals and loops... – Roger Dahl Jan 08 '13 at 22:01
  • @RogerDahl determining WCET in general is [equivalent to halting problem](http://en.wikipedia.org/wiki/Worst-case_execution_time#Considerations_when_calculating_WCET), but anyway. What I meant by "not Turing complete" is what you probably mean by "same kernel with different sized input": that there aren't any loops that depend on the input in a _nontrivial_ way, e.g. only iterate from 0 to a constant. – But yes, we need the general case here, since we know nothing about what the OP's kernel does. The relation between input size and kernel runtime might not be trivial at all. – leftaroundabout Jan 08 '13 at 22:25
  • @leftaroundabout: Thanks for the info. I see what you mean now. – Roger Dahl Jan 09 '13 at 17:03

1 Answer

You can halt a running kernel from within the kernel itself by using an `assert` that fails, on a device of compute capability (CC) 2.0 or higher. I don't think this will serve the stated purpose, however, at least not conveniently.
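As a rough illustration of the in-kernel approach, here is a minimal sketch in which the kernel checks a cycle budget and trips an `assert` once it is exceeded. The names `budgeted_kernel`, `run_budgeted`, and `max_cycles`, and the use of `clock64()` to measure elapsed cycles, are my own assumptions and not something from the question. It needs a CC 2.0+ device, so it would not run on the Tesla C1060 (CC 1.3), and compiling with `-DNDEBUG` disables the assert.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: does placeholder work and trips an assert once a
// cycle budget is exceeded. Requires CC 2.0+ for both assert and clock64().
__global__ void budgeted_kernel(float *out, long long iterations,
                                long long max_cycles)
{
    const long long start = clock64();
    float acc = 0.0f;
    for (long long i = 0; i < iterations; ++i) {
        acc += sinf(static_cast<float>(i));     // stand-in for the real work
        if ((i & 1023) == 0) {                  // check the budget periodically
            assert(clock64() - start < max_cycles);
        }
    }
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Launches the kernel and returns the status reported at synchronization.
cudaError_t run_budgeted(long long iterations, long long max_cycles)
{
    float *d_out = NULL;
    cudaError_t err = cudaMalloc(&d_out, 32 * sizeof(float));
    if (err != cudaSuccess) return err;

    budgeted_kernel<<<1, 32>>>(d_out, iterations, max_cycles);
    err = cudaDeviceSynchronize();

    // After a device-side assert the existing allocations are no longer
    // usable, so only free the buffer on the success path.
    if (err == cudaSuccess) cudaFree(d_out);
    return err;
}

int main()
{
    // A very long loop with a small cycle budget, so the assert should fire.
    cudaError_t err = run_budgeted(1LL << 40, 100000000LL);
    if (err == cudaErrorAssert)
        printf("kernel stopped itself: %s\n", cudaGetErrorString(err));
    else
        printf("status: %s\n", cudaGetErrorString(err));
    return 0;
}
```

The inconvenience is visible here: the budget is expressed in GPU cycles rather than seconds, and once the assert fires the host sees `cudaErrorAssert` and the CUDA context can no longer be used for the fallback kernel.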

You can also halt operation of a device from the host side using `cudaDeviceReset()`. I haven't thought through this fully, but it should be possible on the host side to monitor a timer of some sort and, when the timeout period expires, reset the device if the results have not yet appeared. This type of reset is a bit of a crowbar, so you will need to completely restart operations on that device (including `cudaMalloc` calls, etc.) before you can run your desired operations again.
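The host-side idea could look roughly like the sketch below, which polls an event with `cudaEventQuery()` until a deadline and calls `cudaDeviceReset()` if the kernel has not finished. The names `slow_kernel` and `run_with_timeout` are hypothetical, the kernel body is a placeholder, and the `std::chrono` timing assumes a C++11 host compiler, which is newer than what CUDA 5.0 supported. I have not tested how the reset behaves on a Tesla C1060.

```cuda
#include <chrono>
#include <cstdio>
#include <thread>
#include <cuda_runtime.h>

// Hypothetical stand-in for the slow, high-precision kernel.
__global__ void slow_kernel(float *out, long long iterations)
{
    float acc = 0.0f;
    for (long long i = 0; i < iterations; ++i)
        acc += sinf(static_cast<float>(i));
    out[blockIdx.x * blockDim.x + threadIdx.x] = acc;
}

// Returns true if the kernel finished within `timeout`; otherwise resets
// the device and returns false.
bool run_with_timeout(long long iterations, std::chrono::milliseconds timeout)
{
    float *d_out = nullptr;
    cudaEvent_t done;
    if (cudaMalloc(&d_out, 32 * sizeof(float)) != cudaSuccess) return false;
    if (cudaEventCreate(&done) != cudaSuccess) { cudaFree(d_out); return false; }

    slow_kernel<<<1, 32>>>(d_out, iterations);  // launch is asynchronous
    cudaEventRecord(done);                      // completes after the kernel

    // Poll the event instead of blocking in cudaDeviceSynchronize().
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    cudaError_t status = cudaEventQuery(done);
    while (status == cudaErrorNotReady &&
           std::chrono::steady_clock::now() < deadline) {
        std::this_thread::sleep_for(std::chrono::milliseconds(10));
        status = cudaEventQuery(done);
    }

    if (status == cudaSuccess) {
        cudaEventDestroy(done);
        cudaFree(d_out);
        return true;
    }

    // Timed out (or failed): the "crowbar". Every allocation, event, and
    // stream on this device is gone afterwards and must be recreated.
    cudaDeviceReset();
    return false;
}

int main()
{
    if (!run_with_timeout(1LL << 40, std::chrono::seconds(20))) {
        printf("timed out; device was reset\n");
        // Assumption: the lower-precision kernel would be set up and
        // launched from here, after re-allocating its buffers.
    }
    return 0;
}
```

Polling with a short sleep keeps the host thread responsive while the kernel runs; a dedicated watchdog thread would be another option.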

Note that cudaDeviceReset() by itself is insufficient to restore a GPU to proper functional behavior. In order to accomplish that, the "owning" process must also terminate. See here.

Robert Crovella