4

I have a critical loop in my code with this shape :

int myloop(int a, .....){

   /* some stuff */

   // Critical loop
   while(...){
       /* Some Stuff */
       if(a == 1){
          // .....
       }
       else if(a == 2){
          // .....
       }
       else if(a == 3){
          // .....
       }
       else{
          // ....
       }
   }
}

As the loop never touches the value of "a" the branch taken will never change, but as this loop is really heavy it will require to test the value of "a" many times, which is totally unnecessary. The best thing is probably to duplicate the loop, so that the "if" can be tested before the loop begins, but this would mean copying a lot of stuff common to both situations and will result in a very ugly code...

Is there any way to ask GCC/G++ to duplicate this code when it compiles it ? Or any other trick to avoid testing the value so many times ?

Thank you for your help !

Nathann

Nathann Cohen
  • 302
  • 1
  • 8
  • 1
    Branch prediction should take care of this pretty well on any modern CPU. Have you profiled the code to see whether there really is a problem ? – Paul R Apr 02 '11 at 16:08
  • 2
    possible duplicate of [Does gcc optimize my cycle with condition?](http://stackoverflow.com/questions/5522913/does-gcc-optimize-my-cycle-with-condition) – Jon Apr 02 '11 at 16:08
  • Did you check your asm code, what gets really generated? GCC can generate jumptables for switches (and I would assume that he treats your if as such). – flolo Apr 02 '11 at 16:11
  • What is the range of the variable "a"? If it is small and fairly continuous, you can accomplish the same logic in your while loop by defining an array of function pointers and using "a" as the index to the array. This will convert all of your conditional tests into a single hash lookup. Also, what type of logic is inside of your conditional tests? If it is similar in nature for each test, there may be a possibility to consolidate it using a different structure as well. – KP Taylor Apr 02 '11 at 16:15

10 Answers10

6

First of all, you can use a switch statement here:

switch(a) {

   case 0:
     // handle a==0
     break;

   case 1:
     // handle a==1
     break;

   default:
     // handle all other cases
}

This may enable the compiler to produce quicker code, i.e. do a single computed jump rather than multiple checks against a.

this would mean copying a lot of stuff common to both situations

Refactor! What about putting the shared code into a separate function, probably declare it inline, and hope that the compiler follows the hint? Function inlining is a good way to let the compiler do code duplication (the other two ways being templates and the preprocessor, both are clearly inappropriate here).

inline void sharedStuff() {...}

int myloop(int a, .....){

   /* some stuff */

   if (a==1) {

      while(...){

         // code specific to a==1

         // do shared stuff
         sharedStuff();
      }

   }
   else if ...
}

Of course it depends on what you do in your loop, but you should get the basic principle.

Last but not least: profile. Check if the loop really is the performance bottleneck. Have a look at the generated machine code. Chances are good that the compiler does already use most of the proposed optimizations.

Alexander Gessler
  • 45,603
  • 7
  • 82
  • 122
2

You can use switch statement:

while(...)
{
   switch(a)
   {
   case 1:
      // do what you want
      break;
   case 2:
      // do what you want
      break;
   case x:
      // do what you want
      break;
   default:
      //if not matching any previous..
   }
}
MByD
  • 135,866
  • 28
  • 264
  • 277
  • It looks like a switch is slower there, as it does not seem to bypass branch prediction which is very efficient in this situation ! – Nathann Cohen Apr 02 '11 at 23:12
  • @Nathann - I think Alexander's answer (the last option) is better than mine if you don't care about space. – MByD Apr 03 '11 at 00:10
2

One common way of doing this is as follows:

// make function implementation inline...
static inline int myloop_inline(int a, .....){

   /* some stuff */

   // Critical loop
   while(...){
       /* Some Stuff */
       if(a == 1){
          // .....
       }
       else if(a == 2){
          // .....
       }
       else if(a == 3){
          // .....
       }
       else{
          // ....
       }
   }
}

// wrapper function which calls myloop_inline with hard-coded values of a
int myloop(int a, .....){
{
    switch (a)
    {

    // expand inline function for all performance-critical values of a...

    case 1:
        myloop_inline(1);
        break;

    case 2:
        myloop_inline(2);
        break;

    case 3:
        myloop_inline(3);
        break;

    ...

    // for any values of a for which the function is not performance-critical
    // we can just use a default case...

    default:
        myloop_inline(a);
        break;

    }
}

Note that because a is passed as a literal constant when myloop_inline() is called form myloop() the compiler can optimise away all irrelevant tests and dead code when it expands the inline function.

You may want to take steps to ensure that myloop_inline() actually gets inlined, this includes compiling with optimisation enabled of course (e.g. -O3), and in the case of e.g. gcc you might want to add __attribute__ ((always_inline)).

Paul R
  • 208,748
  • 37
  • 389
  • 560
  • very interesting. I'll have to check this out later. It's totally dependent on your compiler though so may not be the most portable solution – James Apr 02 '11 at 16:16
  • @James: this works with pretty much any decent optimising compiler - I've certainly used it with gcc, Intel's ICC, IBM's xlc, even MSVC, and a couple of others. – Paul R Apr 02 '11 at 16:17
  • it's my experience with ARM's RVCT compiler that it can completely ignore your desire to inline a function, even if you use __force_inline – James Apr 03 '11 at 14:27
  • @James: if your compiler is just too truculent then as a last resort you can convert the inline function to a macro - it's not pretty but it works... – Paul R Apr 03 '11 at 14:39
1

I'd suggest to pass "a" as a template parameter, ie

template< int a > 
int myloop(.....) {
  if( a==1 ) { ... }
}

Like that it would be optimized away properly.

However, you won't be able to pass a variable as template parameter, so somewhere else you'd have to put that switch.

switch(a) { 
  case 1: myloop<1>(...); break;
  ...
}
Shelwien
  • 2,160
  • 15
  • 17
1

How about defining a function that does whatever happens inside the while loop and define it inline? Then move your while loop inside each if and call the function there. That will do exactly what you want.

AVH
  • 11,349
  • 4
  • 34
  • 43
  • 1
    As a note: just because something is defined as `inline`, doesn't mean it will be inlined. `inline` is just a hint for the compiler. – Femaref Apr 02 '11 at 16:11
1

How about making each a separate function, and then have a pointer to the function?

void func_a() {
    // ...
}

void func_b() {
    // ...
}

int myloop(int a, ...) {
    void (*func)();
    if (a == 1) {
        func = &func_a;
    } else if (a == 2) {
        func = &func_b;
    } ...

    while (1) {
        /* Some stuff */
        func();
    }
}
Gustav Larsson
  • 8,199
  • 3
  • 31
  • 51
0

you could also consider moving the loop inside the conditions something like the following. This sacrifices code size for runtime efficiency.

if (a == 0) {
    while (...) {
       /* some stuff */
       // 0 stuff
    }
} else if (a == 2) {
    while (...) {
       /* some stuff */
       // 2 stuff
    }
} else if (a == 3) {
 ....
}
Will
  • 4,585
  • 1
  • 26
  • 48
0
int myloop(int a, ...){
    if(a == 1){
      myloop1(a, ...);
    }
    else if(a == 2){
      myloop2(a, ...);
    }
    else if(a == 3){
      myloop3(a, ...);
    }
    else{
      myloopelse(a, ...);
    }
}

myloop#(int a, ...){
    SomeStuffAboveLoop(...)
    while(...){
        SomeStuffInsideLoop(...)
        //Do what you want for the appropriate #
    }
}

You could change the if block to a switch as well, which many other answers show.

QuinnG
  • 6,346
  • 2
  • 39
  • 47
0

Are the operations taken under each condition at all similar? In these situations I usually use the if statements to set the value of key objects or pointers to objects and then use those values in the remaining code... basically, maximize your use of indirection.

dim x as object
dim y as string
If (a==1) then
  x=foo.object
  y=bar.string
elseif (a==2)
  x=yuk.object
  y=yo.string
end if
while ...
 dofunction(x,y)
end while

But, if your operations are wildly different, then it's already kind of ugly, and you might as well create multiple loops to save the clock cycles...

RichO
  • 723
  • 4
  • 7
0

You can use an array with pointer to functions, if your conditions are mostly adjacent.

typedef void ( *case_func )( type1 param1, type2 param2 );

void case_func1( type1 param1, type2 param2 );
void case_func2( type1 param1, type2 param2 );
void case_func3( type1 param1, type2 param2 );
void case_func5( type1 param1, type2 param2 );

case_func    func_array[] =
{
    &case_func1,
    &case_func2,
    &case_func3,
    NULL,        // Just a place holder, for non-contiguous values.
    &case_func5
};

int myloop(int a, .....){

   /* some stuff */

   // Critical loop
   while(...)
   {
       if ( func_array[ a ] && a < sizeof( func_array ) )
           func_array[ a ]( .... );
   }
}

If you can be assure that you'll have no holes in your array and a will never be outside array bounds you can omit if ( func_array[ a ] && a < sizeof( func_array ) ), reducing the code to no comparison at all.

Anyway, this approach can be a bit clearer, and may be worth even if it gives no big performance gain.

j4x
  • 3,595
  • 3
  • 33
  • 64