36

I have 2 very similar kernel functions, in the sense that the code is nearly the same, but with a slight difference. Currently I have 2 options:

  • Write 2 different methods (but very similar ones)
  • Write a single kernel and put the code blocks that differ in an if/else statement

How much will an if statement affect my algorithm's performance?
I know that there is no branch divergence, since all threads in all blocks will take the same path: either the if or the else.
So will a single if statement decrease my performance if the kernel function is called many times?

einpoklum
lina

2 Answers

102

You have a third alternative, which is to use C++ templating and make the variable which is used in the if/switch statement a template parameter. Instantiate each version of the kernel you need, and then you have multiple kernels doing different things with no branch divergence or conditional evaluation to worry about, because the compiler will optimize away the dead code and the branching with it.

Perhaps something like this:

template<int action>
__global__ void kernel()
{
    switch(action) {
       case 1:
       // First code
       break;

       case 2:
       // Second code
       break;
    }
}

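// Explicit instantiations (repeat __global__ so they match the kernel declaration)
template __global__ void kernel<1>();
template __global__ void kernel<2>();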
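
Template arguments must be compile-time constants, so a value that is only known at run time has to be mapped onto the instantiated kernels by a branch on the host. The following is a minimal sketch of that idea, not part of the code above: the `out` parameter and the `launch_kernel` helper are hypothetical names introduced purely for illustration.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Same shape as the kernel above, with a hypothetical output parameter
// so that each instantiation has an observable effect.
template <int action>
__global__ void kernel(int *out)
{
    switch (action) {   // action is a compile-time constant in each instantiation
    case 1:
        *out = 10;      // stands in for "First code"
        break;
    case 2:
        *out = 20;      // stands in for "Second code"
        break;
    }
}

// Hypothetical host-side helper: picks an instantiation from a runtime value.
// Returns false, without launching anything, if no instantiation matches.
bool launch_kernel(int action, int *d_out)
{
    switch (action) {
    case 1: kernel<1><<<1, 1>>>(d_out); return true;
    case 2: kernel<2><<<1, 1>>>(d_out); return true;
    default:
        std::fprintf(stderr, "no kernel instantiated for action %d\n", action);
        return false;
    }
}
```

The only runtime branch is on the host, once per launch. Using `kernel<1>` and `kernel<2>` inside `launch_kernel` instantiates them implicitly, so this sketch needs no explicit instantiation lines. A caller would pass a device pointer obtained from `cudaMalloc` as `d_out`.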
talonmies
  • 7
    @talonmies... This answer is the most helpful one I have ever seen on SO. The number of CUDA kernels in my code is reduced about 4 times. :) – sgarizvi Dec 31 '12 at 17:28
  • 2
    templates are very useful for passing thread block size. It makes size of the block static and at the same time adaptive with respect to the specific GPU compute capabilities. (There is no way to use #define directive from C in the same manner.) See matrixmul.cu example. – Kamil Czerski Jul 16 '14 at 10:38
  • This still doesn't work with variables as the template argument, but you can get around that with complex if statements... `kernel<a><<<1,1>>>(1,2);` doesn't work, but `if (a == 1) { kernel<1><<<1,1>>>(1,2); }` works. – XapaJIaMnu Sep 14 '15 at 11:07
  • @XapaJIaMnu: To the best of my knowledge, template arguments can only be type parameters or constant expressions. What do you mean by "variables as the template arguments"? – talonmies Sep 14 '15 at 11:16
  • @talonmies, whoops I didn't know that. Basically I wanted to use a variable in a template and restrict that template to only fire if the variable is part of the pre-instantiated templates and throw a runtime error otherwise. However, after reading around, it seems that this is not possible with C++. In the example from my previous comment, the behaviour I want is for the code to produce a runtime exception if `a != 1` and `a != 2`. – XapaJIaMnu Sep 14 '15 at 12:30
  • I need to use `__global__` for the last two lines to make this work: `template __global__ void kernel<1>(); template __global__ void kernel<2>();` – Amir Oct 18 '22 at 14:52
4

It will slightly decrease your performance, especially if it's in an inner loop, since you're wasting an instruction issue slot every so often, but it's not nearly as much as if a warp were divergent.

If that cost matters, it may be worth moving the condition outside the loop. If the warp is truly divergent, though, think about how to remove the branching: e.g., instead of

if (i>0) {
    x = 3;
} else {
    x = y;
}

try

x = ((i>0)*3) | ((i<=0)*y);
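
For the arithmetic form to match the if/else, the two comparisons must be exact complements, so that one term is always zero and the other carries the result. A small sketch of that equivalence follows; the function names are made up for illustration. A plain ternary is included as well, since for selections this simple the compiler can often emit a select instruction instead of a branch, though that depends on the compiler and the surrounding code.

```cuda
// Branching version, as in the if/else above.
__host__ __device__ inline int select_branch(int i, int y)
{
    int x;
    if (i > 0) {
        x = 3;
    } else {
        x = y;
    }
    return x;
}

// Arithmetic version: (i > 0) and (i <= 0) are complementary, so exactly one
// of the two products is nonzero-capable and the other is always 0.
__host__ __device__ inline int select_arith(int i, int y)
{
    return ((i > 0) * 3) | ((i <= 0) * y);
}

// Ternary version: same result, usually the most readable form.
__host__ __device__ inline int select_ternary(int i, int y)
{
    return (i > 0) ? 3 : y;
}
```

All three functions return the same value for any `i` and `y`, so which one to prefer is a matter of readability and of what the generated code looks like for a particular kernel.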
Thomas Minor