Is there any way to fire an event from within a CUDA device kernel, for benchmarking purposes, similar to the cudaEvents used in host (CPU) code?

For example, suppose I would like to measure the time from kernel start until the first thread begins its computation, and the time from when the last thread leaves the computation until control returns to the CPU.

Can I do that?

einpoklum
AGer

2 Answers

The device runtime API (used with dynamic parallelism) does have limited stream and event support, but event timing is not supported.

So no, you can't do that.

talonmies

An ugly workaround would be to write to some managed-memory location from the kernel and have a host-side thread poll it, firing the event (or taking a timestamp) when the value changes.
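
A rough sketch of how that polling could look follows. It is an illustration under assumptions, not a tested benchmark harness: the kernel name work, the flag layout, and the launch dimensions are all made up, and polling managed memory from the host while a kernel is running is only attempted when the device reports concurrent managed access. The host takes steady_clock timestamps where a host-side event could be recorded instead. Expect the measured intervals to include polling and page-migration latency.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: the first thread to arrive raises flags[0],
// the last thread to leave raises flags[1].
__global__ void work(volatile int *flags, unsigned int *counters, unsigned int total)
{
    if (atomicAdd(&counters[0], 1u) == 0u) {         // first thread to get here
        flags[0] = 1;
        __threadfence_system();                      // make the write visible to the host
    }

    // ... the actual computation would go here ...

    if (atomicAdd(&counters[1], 1u) == total - 1u) { // last thread to leave
        flags[1] = 1;
        __threadfence_system();
    }
}

int main()
{
    // Host polling during kernel execution needs concurrent managed access.
    int concurrent = 0;
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentManagedAccess, 0);
    if (!concurrent) {
        std::printf("Concurrent managed access not supported; skipping.\n");
        return 0;
    }

    int *flags = nullptr;              // managed: written by the GPU, polled by the CPU
    unsigned int *counters = nullptr;  // device-only arrival/departure counters
    cudaMallocManaged(&flags, 2 * sizeof(int));
    cudaMalloc(&counters, 2 * sizeof(unsigned int));
    flags[0] = flags[1] = 0;
    cudaMemset(counters, 0, 2 * sizeof(unsigned int));

    const unsigned int blocks = 64, threads = 256;   // arbitrary sizes for illustration
    using clk = std::chrono::steady_clock;

    auto t_launch = clk::now();
    work<<<blocks, threads>>>(flags, counters, blocks * threads);
    if (cudaGetLastError() != cudaSuccess) {         // do not spin forever on a failed launch
        std::printf("Kernel launch failed.\n");
        return 1;
    }

    volatile int *f = flags;
    while (f[0] == 0) { }              // busy-wait until the first thread reports in
    auto t_first = clk::now();         // a host-side event could be recorded here
    while (f[1] == 0) { }              // busy-wait until the last thread reports out
    auto t_last = clk::now();
    cudaDeviceSynchronize();
    auto t_return = clk::now();

    std::chrono::duration<double, std::micro> start_delay = t_first - t_launch;
    std::chrono::duration<double, std::micro> return_delay = t_return - t_last;
    std::printf("launch -> first thread: %.1f us\n", start_delay.count());
    std::printf("last thread -> return:  %.1f us\n", return_delay.count());

    cudaFree(flags);
    cudaFree(counters);
    return 0;
}
```

Here the main thread does the polling itself, since the kernel launch is asynchronous; a dedicated polling thread would work the same way.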

einpoklum