-2

I'm trying to carry out some loop optimization as described here: Optimizing a Loop vs Code Duplication

I have the additional complication that some code inside the loop only needs to be executed depending on a combination of run-time-known variables external to the loop (which can be replaced with template parameters for optimization, as discussed in the link above) and a run-time-known variable that is only known inside the loop.

Here is the completely un-optimized version of the code:

for (int i = 0; i < 100000, i++){
  if (external_condition_1 || (external_condition_2 && internal_condition[i])){
     run_some_code;
  }
  else{
     run_some_other_code;
  }
  run_lots_of_other_code;
}

This is my attempt at wrapping the loop in a templated function as suggested in the question linked above to optimize performance and avoid code duplication by writing multiple versions of the loop:

template<bool external_condition_1, external_condition_2>myloop(){
for (int i = 0; i < 100000, i++){
  if (external_condition_1 || (external_condition_2 && internal_condition[i]){
     run_some_code;
  }
  else{
     run_some_other_code;
  }
  run_lots_of_other_code;
}

My question is: how can the code be written to avoid branching and code duplication?

Note that the code is sufficiently complex that the function probably can't be inlined, and compiler optimization also likely wouldn't sort this out in general.

DanB
  • 83
  • 6
  • "*compiler optimization also likely wouldn't sort this out in general*" You should probably profile before making such assumptions. – 0x5453 Feb 08 '19 at 21:56
  • Fair enough, but I'm trying to learn how to do this for a general case rather than having to test for multiple uses, compiler settings, etc. – DanB Feb 08 '19 at 21:58
  • I have no idea what is being asked here. A messed up syntax notwithstanding, the code as shown would do what OP wants to do - the only run-time branch remained would be the one on internal_condition. – SergeyA Feb 08 '19 at 22:16
  • @SergeyA There are cases where ideally there should be no run-time branching, i.e. if either `external_condition_1` or `external_condition_2` is false. In my understanding, the code as shown would not handle those cases correctly (i.e. there would still be unnecessary branching). – DanB Feb 08 '19 at 22:29
  • @DanB, no, there will be no run-time branching on template boolean paramenter. – SergeyA Feb 11 '19 at 14:31

2 Answers2

1

My question is: how can the code be written to avoid branching and code duplication?

Well, you already wrote your template to avoid code duplication, right? So let's look at what branching is left. To do this, we should look at each function that is generated from your template (there are four of them). We should also apply the expected compiler optimizations based upon the template parameters.

First up, set condition 1 to true. This should produce two functions that are essentially (using a bit of pseudo-syntax) the following:

myloop<true, bool external_condition_2>() {
    for (int i = 0; i < 100000, i++){
        // if ( true || whatever ) <-- optimized out
        run_some_code;
        run_lots_of_other_code;
    }
}

No branching there. Good. Moving on to the first condition being false and the second condition being true.

myloop<false, true>(){
    for (int i = 0; i < 100000, i++){
        if ( internal_condition[i] ){ // simplified from (false || (true && i_c[i]))
            run_some_code;
        }
        else{
            run_some_other_code;
        }
        run_lots_of_other_code;
    }
}

OK, there is some branching going on here. However, each i needs to be analyzed to see which code should execute. I think there is nothing more that can be done here without more information about internal_condition. I'll give some thoughts on that later, but let's move on to the fourth function for now.

myloop<false, false>() {
    for (int i = 0; i < 100000, i++){
        // if ( false || (false && whatever) ) <-- optimized out
        run_some_other_code;
        run_lots_of_other_code;
    }
}

No branching here. You already have done a good job avoiding branching and code duplication.


OK, let's go back to myloop<false,true>, where there is branching. The branching is largely unavoidable simply because of how your situation is set up. You are going to iterate many times. Some iterations you want to do one thing while other iterations should do another. To get around this, you would need to re-envision your setup so that you can do the same thing each iteration. (The optimization you are working from is based upon doing the same thing each iteration, even though it might be a different thing the next time the loop starts.)

The simplest, yet unlikely, scenario would be where internal_condition[i] is equivalent to something like i < 5000. It would also be convenient if you could do all of the "some code" before any of the "lots of other code". Then you could loop from 0 to 4999, running "some code" each iteration. Then loop from 5000 to 99999, running "other code". Then a third loop to run "lots of other code".

Any solution I can think of would involve adapting your situation to make it more like the unlikely simple scenario. Can you calculate how many times internal_condition[i] is true? Can you iterate that many times and map your (new) loop control variable to the appropriate value of i (the old loop control variable)? (Or maybe the exact value of i is not important?) Then do a second loop to cover the remaining cases? In some scenarios, this might be trivial. In others, far from it.

There might be other tricks that could be done, but they depend on more details about what you are doing, what you need to do, and what you think you need to do but don't really. (It's possible that the required level of detail would overwhelm StackOverflow.) Is the order important? Is the exact value of i important?

In the end, I would opt for profiling the code. Profile the code without code duplication but with branching. Profile the code with minimal branching but with code duplication. Is there a measurable change? If so, think about how you can re-arrange your internal condition so that i can cover large ranges without changing the value of the internal condition. Then divide your loop into smaller pieces.

JaMiT
  • 14,422
  • 4
  • 15
  • 31
0

In C++17, to guaranty no extra branches evaluation, you might do:

template <bool external_condition_1, bool external_condition_2>
void myloop()
{
    for (int i = 0; i < 100000, i++){
        if constexpr (external_condition_1) {
            run_some_code;
        } else if constexpr (external_condition_2){
            if (internal_condition[i]) {
                 run_some_code;
            } else {
                 run_some_other_code;
            }
        } else {
            run_some_other_code;
        }
        run_lots_of_other_code;
    }
}
Jarod42
  • 203,559
  • 14
  • 181
  • 302
  • Thanks, but that does duplicate both `run_some_code` and `run_some_other_code` (these are somewhat lengthy blocks of code rather than function calls). – DanB Feb 08 '19 at 22:04
  • Make them function (or lambda) to factorize code then. – Jarod42 Feb 08 '19 at 22:08
  • Even in C++03 with any optimization turned on boolean template parameters will not produce any run-time switching. constexpr if is not an optimization technique, it is needed for correctness. – SergeyA Feb 08 '19 at 22:15
  • @Jarod42 I avoided showing this to keep things simple, but there are other variables within the loop that would need to passed/returned from functions, so I'd prefer to avoid this. – DanB Feb 08 '19 at 22:20
  • @DustinNieffenegger Regardless, the point of my question is whether this can be done to avoid both code duplication and run-time branching. Simply putting the duplicated code in a function, then duplicating the function call is not the solution I was hoping for. – DanB Feb 08 '19 at 22:59