I have the following CUDA kernel, in which a computationally expensive value is calculated and then used in two operations.
Occasionally, I would like to run `myKernel` without `operationOne`. I know that branching in kernel code is generally a bad idea, but if all threads take the same branch, is there still a substantial inefficiency? That is, is the following a bad idea?
__global__ void myKernel(bool doOpOne, ...) {
    // usefulValue is computed
    if (doOpOne) {
        // perform operation one
    }
    // perform operation two
}
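To make the question concrete, here is a minimal, compilable sketch of the pattern. Everything beyond `myKernel` and `doOpOne` is an assumption made for illustration: the helper names `computeUsefulValue` and `runMyKernel`, the `float` buffers, and the arithmetic standing in for the expensive calculation and the two operations are all placeholders. The point is only that `doOpOne` is a kernel argument, so every thread in the launch evaluates the same condition.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Placeholder for the expensive calculation (hypothetical).
__device__ float computeUsefulValue(float x) {
    float acc = x;
    for (int k = 0; k < 64; ++k) {
        acc = acc * 0.5f + 1.0f;
    }
    return acc;
}

__global__ void myKernel(bool doOpOne, const float* in,
                         float* outOne, float* outTwo, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;

    float usefulValue = computeUsefulValue(in[i]);

    // doOpOne is a kernel argument: it has the same value for every thread.
    if (doOpOne) {
        outOne[i] = 2.0f * usefulValue;  // placeholder for operation one
    }
    outTwo[i] = usefulValue + 1.0f;      // placeholder for operation two
}

// Hypothetical host helper: launches the kernel once and copies both
// outputs back. outOne stays all zeros when doOpOne is false.
// Returns false if any CUDA call fails.
bool runMyKernel(bool doOpOne, const std::vector<float>& in,
                 std::vector<float>& outOne, std::vector<float>& outTwo) {
    const int n = static_cast<int>(in.size());
    outOne.assign(n, 0.0f);
    outTwo.assign(n, 0.0f);
    if (n == 0) return true;  // nothing to launch

    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    float *dIn = nullptr, *dOne = nullptr, *dTwo = nullptr;

    bool ok = cudaMalloc(&dIn, bytes) == cudaSuccess
           && cudaMalloc(&dOne, bytes) == cudaSuccess
           && cudaMalloc(&dTwo, bytes) == cudaSuccess
           && cudaMemcpy(dIn, in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess
           && cudaMemset(dOne, 0, bytes) == cudaSuccess;

    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        myKernel<<<blocks, threads>>>(doOpOne, dIn, dOne, dTwo, n);
        // The blocking cudaMemcpy calls also wait for the kernel to finish.
        ok = cudaGetLastError() == cudaSuccess
          && cudaMemcpy(outOne.data(), dOne, bytes, cudaMemcpyDeviceToHost) == cudaSuccess
          && cudaMemcpy(outTwo.data(), dTwo, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }

    // cudaFree on a null pointer is a no-op, so this is safe on partial failure.
    cudaFree(dIn);
    cudaFree(dOne);
    cudaFree(dTwo);
    return ok;
}
```

Calling `runMyKernel(true, in, outOne, outTwo)` and then `runMyKernel(false, in, outOne, outTwo)` on the same input exercises both cases. Timing the two launches (for example with `cudaEvent` timers or a profiler) would be one way to measure whether the flag check itself costs anything noticeable.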