
I found this very useful post and I'd like to clarify something about compiler optimizations. Let's say we have this function (the same as in the original post):

template<int action>
__global__ void kernel()
{
    switch(action) {
       case 1:
       // First code
       break;

       case 2:
       // Second code
       break;
    }
}

Would the compiler eliminate the unreachable code even if I called the function with a template argument that is not known at compile time, effectively creating two separate functions? For example:

kernel<argv[1][0]>();
stuhlo

2 Answers


Short answer: no.

Templates are instantiated and generated purely at compile time, so you can't use the values in argv, since they are not known at compile time.

Makes me wonder why you did not just give it a try and throw that code at a compiler: it would have told you that template arguments must be compile-time constants.
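To make the compile-time requirement concrete, here is a minimal host-side C++ sketch. The names `kernel_name`, `kFixedAction` and `call_with_constant` are hypothetical stand-ins and not part of the original question; the `__global__` qualifier and the kernel launch are left out so the snippet compiles with a plain C++ compiler.

```cpp
#include <string>

// Hypothetical stand-in for the kernel: reports which branch was selected.
template <int action>
std::string kernel_name() {
    switch (action) {
        case 1:  return "first";
        case 2:  return "second";
        default: return "none";
    }
}

// A constexpr value is a compile-time constant, so it is a valid template argument.
constexpr int kFixedAction = 2;

std::string call_with_constant() {
    return kernel_name<kFixedAction>();
}

// A runtime value is not a valid template argument. Uncommenting the function
// below produces a compile error, because `action` is not a constant expression:
//
//   std::string call_with_runtime(int action) { return kernel_name<action>(); }
```

The same rule is what rejects `kernel<argv[1][0]>()`: the contents of `argv` only exist once the program runs.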

Update: Since you told us in the comments that it's not primarily about performance, but about readability, I'd recommend using switch/case:

template <char c> void kernel() {
  //...
  switch(c) { /* ... */ }
}

switch (argv[1][0]) {
  case 'a': 
    kernel<'a'>();
    break;
  case 'b': 
    kernel<'b'>();
    break;
  //...
}

Since the value you have to base the decision on (i.e. argv[1][0]) is only known at runtime, you have to use runtime decision mechanisms. Of those, switch/case is among the fastest, especially if there are not too many different cases (but more than two) and especially if there are no gaps between the case values (i.e. 'a', 'b', 'c' instead of 1, 55, 2048). The compiler can then produce very fast jump tables.
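For reference, a complete version of the dispatch above might look like the following sketch. It is plain host-side C++; the return values and the helper `dispatch_from_args` are assumptions added for illustration, mainly to show one way of guarding against a missing or unknown argument.

```cpp
#include <string>

// Hypothetical kernel stand-in: inside each instantiation `c` is a
// compile-time constant, so the switch can be resolved per instantiation.
template <char c>
std::string kernel() {
    switch (c) {
        case 'a': return "ran a";
        case 'b': return "ran b";
        default:  return "ran other";
    }
}

// Runtime dispatch: maps a runtime character to a compile-time instantiation.
// Returns an empty string for characters with no matching instantiation.
std::string dispatch(char selector) {
    switch (selector) {
        case 'a': return kernel<'a'>();
        case 'b': return kernel<'b'>();
        default:  return "";
    }
}

// Hypothetical helper: takes the selector from argv and guards against a
// missing or empty first argument.
std::string dispatch_from_args(int argc, const char* const* argv) {
    if (argc < 2 || argv[1][0] == '\0') return "";
    return dispatch(argv[1][0]);
}
```

Only the outer `switch` in `dispatch` depends on a runtime value; the one inside `kernel` depends on the template parameter alone.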

Arne Mertz
  • I was thinking about reorganizing my code and wanted to know whether doing it this way would cause any decrease in performance. But thank you for the answer, even to a dummy question. – stuhlo Feb 28 '13 at 15:15
  • In most cases there's only one who can tell you reliably about an actual performance impact of code changes: The profiler. – Arne Mertz Feb 28 '13 at 15:25

Being new to templates, I had to study some basics first. I finally came up with a solution to my problem. If I want to call functions with template parameters that depend on command-line arguments, I should do it like this:

if(argv[1][0] == '1')
    kernel<1><<< ... >>>();

if(argv[1][0] == '2')
    kernel<2><<< ... >>>();

I also checked the PTX file of this program and found that in this case the compiler does perform the optimization, producing two different kernel functions without a switch statement.
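The PTX itself is not reproduced here, but the underlying idea can be illustrated with a host-side C++ sketch (C++14 or later is assumed because of the `switch` inside a `constexpr` function). The names `selected_branch`, `BranchFn`, `first_fn` and `second_fn` are hypothetical. The sketch only shows that the branch is fixed per instantiation and that each instantiation is a separate function; it says nothing about what nvcc emits for a particular kernel.

```cpp
// Hypothetical stand-in for the kernel body: returns a marker for the branch
// selected by the template parameter.
template <int action>
constexpr int selected_branch() {
    switch (action) {
        case 1:  return 100;  // stands for "first code"
        case 2:  return 200;  // stands for "second code"
        default: return 0;
    }
}

// These checks are evaluated at compile time, which is only possible because
// `action` is a constant inside each instantiation.
static_assert(selected_branch<1>() == 100, "instantiation 1 selects case 1");
static_assert(selected_branch<2>() == 200, "instantiation 2 selects case 2");

// Each instantiation is a separate function with its own address.
using BranchFn = int (*)();
constexpr BranchFn first_fn  = &selected_branch<1>;
constexpr BranchFn second_fn = &selected_branch<2>;

bool instantiations_are_distinct() {
    return first_fn != second_fn;
}
```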

stuhlo
  • To avoid repeated if-statements you can put function pointers to the various template instantiations into an array, then simply invoke the desired kernel via the corresponding function pointer in the array, indexed by the appropriate argv[]. – njuffa Feb 28 '13 at 21:49
  • @njuffa: Yes, that looks smart. It was just an example; `switch` is another way as well, but I like your approach. – stuhlo Feb 28 '13 at 21:59
  • Using switch/case is far more readable than storing function pointers in arrays and, afaik, is unlikely to produce more overhead. But again, I'd leave the decision to the profiler, i.e. implement both versions and see which one is faster. That is, of course, only if performance really matters in that corner of the program. If it does not, save yourself some time and effort and just implement it the way you find most readable and maintainable. – Arne Mertz Feb 28 '13 at 22:16
  • @ArneMertz Making my code more readable and maintainable is exactly what I wanted to do, and I was curious whether this change would cause any decrease in performance. The PTX output of nvcc told me 'no'. – stuhlo Mar 01 '13 at 10:11
  • Ok, from the question I assumed that performance was your primary concern. I updated my answer. – Arne Mertz Mar 01 '13 at 10:25
  • @ArneMertz I am especially concerned about the performance of GPU code (the kernel function) because it can be called many times, so I wanted to know whether the compiler would optimize this function even when I call it with template parameters that depend on command-line arguments. Considering the methods for doing so mentioned above by me, you and njuffa, it now seems very natural to me that the compiler performs these optimizations, but I wasn't aware of it before. – stuhlo Mar 01 '13 at 11:09
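The function-pointer table suggested in the comments could be sketched roughly as follows. This is a host-side C++ analog with hypothetical names (`run_kernel`, `KernelFn`, `kKernelTable`, `lookup_kernel`); the launch configuration of a real CUDA kernel is omitted, and the mapping from '1' and '2' to table indices is an assumption taken from the if-statements in the answer above.

```cpp
#include <array>
#include <cstddef>

// Hypothetical host-side stand-in for the kernel; returns a marker per branch.
template <int action>
int run_kernel() {
    switch (action) {
        case 1:  return 100;
        case 2:  return 200;
        default: return 0;
    }
}

using KernelFn = int (*)();

// Table of instantiations: index 0 corresponds to '1', index 1 to '2'.
const std::array<KernelFn, 2> kKernelTable = {{ &run_kernel<1>, &run_kernel<2> }};

// Looks up the instantiation for a selector character such as argv[1][0].
// Returns nullptr when the character is outside the table's range.
KernelFn lookup_kernel(char selector) {
    if (selector < '1') return nullptr;
    const std::size_t index = static_cast<std::size_t>(selector - '1');
    return index < kKernelTable.size() ? kKernelTable[index] : nullptr;
}
```

A caller would check the returned pointer before invoking it, for example `if (KernelFn fn = lookup_kernel(argv[1][0])) fn();`. Whether this reads better than a `switch` is the judgment call discussed in the comments above.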