
Is there a convenient way to use asserts within kernel invocations in device mode?

Johan
kokosing

2 Answers


CUDA now has a native `assert` function: call `assert(...)`. If its argument is zero, it halts kernel execution and returns an error (or triggers a breakpoint when running under the CUDA debugger).

Make sure to `#include <assert.h>`. Note that this requires compute capability 2.x or higher and is not supported on Mac OS X. For more details, see the CUDA C Programming Guide, Section B.16.

The programming guide also includes this example:

#include <assert.h>
__global__ void testAssert(void)
{
   int is_one = 1;
   int should_be_one = 0;
   // This will have no effect
   assert(is_one);
   // This will halt kernel execution
   assert(should_be_one);
}
int main(int argc, char* argv[])
{
   testAssert<<<1,1>>>();
   cudaDeviceSynchronize();
   return 0;
}
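As a hedged sketch of how the host side can detect that a device assert fired: the failure surfaces as `cudaErrorAssert` from the next synchronizing runtime call, e.g. `cudaDeviceSynchronize()`. (The kernel below follows the programming guide's example; the error-handling pattern is an assumption about typical usage, not part of the guide's listing.)

```cuda
#include <assert.h>
#include <cstdio>

__global__ void testAssert(void)
{
   int should_be_one = 0;
   // This will halt kernel execution
   assert(should_be_one);
}

int main(void)
{
   testAssert<<<1,1>>>();
   // The device-side assert failure is reported by the next
   // synchronizing call to the CUDA runtime.
   cudaError_t err = cudaDeviceSynchronize();
   if (err == cudaErrorAssert) {
      fprintf(stderr, "Device assert fired: %s\n",
              cudaGetErrorString(err));
   }
   return err == cudaSuccess ? 0 : 1;
}
```

Note that once an assert has fired, the CUDA context is left in an unusable state; further runtime calls will keep failing until the process exits.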
Kevin Holt
#define MYASSERT(condition) \
  if (!(condition)) { return; }

MYASSERT(condition);
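A minimal sketch of how an early-return macro like this might also record the failure for the host: write to a device-side flag that the host initialised to `true`. (The flag `d_ok`, the variant macro, and the kernel are illustrative assumptions, not part of the original answer.)

```cuda
// Variant of the answer's macro: flag the failure, then bail out.
// Assumes a 'd_ok' pointer is in scope at the point of use.
#define MYASSERT(condition) \
  if (!(condition)) { *d_ok = false; return; }

__global__ void checkedKernel(const int *data, int n, bool *d_ok)
{
   int i = blockIdx.x * blockDim.x + threadIdx.x;
   if (i >= n) return;
   MYASSERT(data[i] >= 0);  // early-return on failure, flag the host
   // ... normal work ...
}
```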

If you need something fancier, you can use `cuPrintf()`, which is available from the CUDA site for registered developers.

Mr Fooz
shoosh
  • There shouldn't be a semi-colon at the end of the macro definition - generally the use of the macro will have that semi-colon. Also, consider implementing it like the following to avoid it greedily attaching to any `else` keyword it might happen to immediately precede: `if (condition) /* do nothing */; else return` – Michael Burr Jan 17 '10 at 09:10
  • If you have a __syncthreads() at any point after this, you should ensure that all threads reach the same decision otherwise you may have deadlock. In addition, you could set a boolean flag (e.g. `bool success` initialised to true by the host) in global memory to indicate the event. It doesn't matter that multiple threads will write `false` to the flag since they are always writing the same value and hence the race is irrelevant. – Tom Jan 17 '10 at 20:09
  • @Tom That is not true. Global memory was not designed for that kind of usage, so the result of several threads concurrently writing into the same global memory position is unexpected behavior. – Auron Feb 29 '12 at 10:26
  • @Auron it is correct to say that if multiple threads write to the same global memory location, which one wins is undefined. However if all threads write the same value then it doesn't matter which wins since the end result is the same. Like I said, it's a race but the race is irrelevant. – Tom Mar 06 '12 at 09:04
  • @Tom This has already been discussed here: http://stackoverflow.com/questions/5953955/concurrent-writes-in-the-same-global-memory-location – Auron Mar 06 '12 at 14:28
  • @Auron the linked answer is incomplete, I'll add a comment there. If multiple threads in a warp write to the same location (non-atomic) then which thread performs the final write is undefined, but the location will be updated. This is exactly the same as a data race on any other architecture, which one wins is undefined, whether all values will be written sequentially is undefined, but the location will end up with _one_ of the values. See the CUDA programming guide section 4.1 for more information. – Tom Mar 06 '12 at 16:19
  • @Tom Nice! Thank you very much for clarifying this issue :) – Auron Mar 07 '12 at 11:23