I have the following CUDA kernel, where a computationally expensive calculation is performed and its result is then used in two operations.

Occasionally, I would like to run myKernel without operation one. I know that branching is generally a bad idea in GPU code, but if all threads take the same branch, is there still a substantial inefficiency? That is, is the following a bad idea?

__global__ void myKernel(bool doOpOne, ...) {
    // usefulValue is computed

    if(doOpOne) {
        // perform operation one
    }
    // perform operation two
}
Robert Crovella
weemattisnot

1 Answer

It's not necessarily a bad idea. Based on what you have shown and your stipulations (every thread sees the same doOpOne value and therefore takes the same branch), there shouldn't be any significant warp divergence that I can see.

However, you might also consider templating the kernel on the flag, as described in a related question (I believe your question is very nearly a duplicate of that one; perhaps this one should be marked as such). For a simple case like this, with only two options, templating lets you create a separate kernel for each option that is optimized at compile time, so the use of doOpOne produces no runtime branch inside the kernel.
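A minimal sketch of the templating idea, under stated assumptions: the parameter list (`in`, `outOne`, `outTwo`, `n`), the arithmetic standing in for the expensive computation and the two operations, and the host helper `launchMyKernel` are all hypothetical placeholders, since the question elides the real ones. The point is only that `DoOpOne` becomes a compile-time constant, so the compiler can drop the branch from each instantiation.

```cuda
#include <cuda_runtime.h>

// DoOpOne is a template parameter, so each instantiation is compiled
// with the condition already known.
template <bool DoOpOne>
__global__ void myKernel(const float* in, float* outOne, float* outTwo, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    // Placeholder for the expensive computation (assumption).
    float usefulValue = in[i] * in[i];

    // In myKernel<false> this block can be removed entirely at compile time.
    if (DoOpOne) {
        outOne[i] = usefulValue + 1.0f;  // placeholder for operation one
    }
    outTwo[i] = usefulValue * 2.0f;      // placeholder for operation two
}

// Hypothetical host-side helper: the runtime flag is tested once per launch,
// on the host, to choose which instantiation to run.
void launchMyKernel(bool doOpOne, const float* in, float* outOne,
                    float* outTwo, int n) {
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (doOpOne) {
        myKernel<true><<<blocks, threads>>>(in, outOne, outTwo, n);
    } else {
        myKernel<false><<<blocks, threads>>>(in, outOne, outTwo, n);
    }
}
```

The runtime decision still exists, but it moves to the host and is made once per launch rather than once per thread. Whether this measurably beats the uniform in-kernel branch depends on the kernel, so it is worth profiling both versions.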

Robert Crovella