(c++) any way to optimize this loop? can't use function pointer as its inside a class

Question

Trying to optimize the fun_a1() function. The variable j does not change in the scope of fun_a1(). So, checking j==1 or 2 or 3 for each 'i' iteration is obviously a waste of CPU cycles. But if I try to bring the condition evaluation outside the loop, I have to write redundant loops for each condition. In C, I can solve this easily by using a function pointer. However, C++ will not allow pointers to non-static functions. I found a few links describing the mysterious "pointer-to-member". (example 1, example 2) But it's still not clear how do I use it from inside the object itself e.g from inside fun_a()? Or can it be optimized in any other ways?

class A{
    void fun_b(int i);

    void fun_c(int i);

    void fun_d(int i);

    void fun_f(int i);

    void fun_a1(int j){

        for(int i=0; i<1000; i++){

                 if(j==1) fun_b(i);
            else if(j==2) fun_c(i);
            else if(j==3) fun_d(i);

            fun_f(i);           
        }

    }


    void fun_a2(int j){

        if(j==1){           
            for(int i=0; i<1000; i++) { 
                fun_b(i); 
                fun_f(i); 
            }
        }
        else if(j==2){          
            for(int i=0; i<1000; i++) { 
                fun_c(i);
                fun_f(i);
            }            
        }
        else if(j==3){
            for(int i=0; i<1000; i++) { 
                fun_d(i);           
                fun_f(i);
            }           
        }       
    }   
};

You don't. Your compiler should be able to figure this out on its own. For example, Visual C++ 2012 will hoist the function selection out of the loop and generate multiple loops, effectively transforming `fun_a1` into `fun_a2` automatically. — James McNellis, Feb 10 '13 at 04:32
This is almost certainly a case of useless "optimization". The actual function call costs more CPU cycles than the maximum of two "wasted" comparisons. If the compiler inlines the called functions, it gets worse as you totally kill branch prediction with an array of function pointers, in contrast to your current solution. — us2012, Feb 10 '13 at 04:35
He didn't say why he was optimizing it. If I were him, I'd be doing this optimization because I profiled and found `fun_a1` to be a hotspot. In that case, I should be trying to squeeze performance out of the function; if it means I have to use function pointers, then maybe it's worth it for a heavily-used routine. — nneonneo, Feb 10 '13 at 04:37
Next, if you have, say, 20 options for `j` instead of just 3, the compiler might decide not to specialize because the size cost would be too extreme. Most times you let the compiler optimize; sometimes, you have to help. Profiling is how you decide when you have to do the latter. — nneonneo, Feb 10 '13 at 04:39
for common sense it looks a waste of comparison of 'j' on each iteration, where 'j' will never change in that scope! also the actual loop is something like for(int i=0; i — Anon Y., Feb 10 '13 at 04:53
Common sense says that the compiler is much smarter than you might think. — James McNellis, Feb 10 '13 at 04:57

score 7 · Answer 1 · answered Feb 10 '13 at 04:32

7

Here's how you'd use a pointer to member function:

void (A::*fun)(int);
if(j == 1) fun = &A::fun_b;
else if(j == 2) fun = &A::fun_c;
else if(j == 3) fun = &A::fun_d;

for(int i=0; i<1000; i++) {
    (this->*fun)(i);
    fun_f(i);
}

answered Feb 10 '13 at 04:32

nneonneo

171,345
36
312
383

where does this "void (A::*fun)(int);" code go? in the class body or inside the function fun_a()!? – Anon Y. Feb 10 '13 at 04:44
In the function. This whole snippet is the function body. – nneonneo Feb 10 '13 at 04:45
Thanks! Then it is exactly what I was looking for! Just another quick question, will (this->*fun) linger around outside the function scope like a member or member function? or will it just be like a local veritable? It will be a perfect solution if it vanishes at the end of function scope like a local variable. – Anon Y. Feb 10 '13 at 05:00
`void (A::*fun)(int);` is a variable declaration (of member function pointer type `void (A::*)(int)`), so it obeys the usual variable lifetime rules. It does not create a `A::fun` member function. If you put the declaration in the function body (as a local variable), it will go away at the end of the function. `(this->*fun)(i)` is just how you use member function pointers. – nneonneo Feb 10 '13 at 05:04
Calling a function pointer, if it isn't optimized out, is a serious performance hit. – Yakk - Adam Nevraumont Feb 10 '13 at 06:16
@Yakk: Huh? Did you profile it? On some platforms, you can *only* call subroutines through registers, so calling a function pointer is no different. And it is entirely plausible that the regular function call would be a register-based call anyway because of dynamic linking. – nneonneo Feb 10 '13 at 06:17
You cannot unconditionally say it is a performance hit. It is just another option. For truly performance critical code, the only way to be sure is to make several versions of the code and profile. – nneonneo Feb 10 '13 at 15:46
In theory, sure. In practice, indirection is a huge killer of performance. The OP was asking for a way to optimize code, and I would challenge you to find a compiler, target platform, and optimization level where your answer was faster than the OP's "pre-optimization" code. – Yakk - Adam Nevraumont Feb 11 '13 at 00:05

score 1 · Accepted Answer · answered Feb 10 '13 at 06:22

Using function pointers, if the compiler doesn't remove them, is a serious performance hit.

A raw if on an unchanged local variable will be probably optimized out of the loop: that isn't a very fancy optimization.

However, if you want to make this explicit, the answer isn't function or method pointers. It is lambdas and functors.

template<typename Functor>
void fun_a2_internal(Functor f) {
  for(int i = 0; i < 1000; ++i) {
    f(i);
  }
}

void fun_a2(int j) {
  if (j==1)
    fun_a2_internal([&](int i){ fun_b(i); fun_f(i); });
  else if (j==2)
    fun_a2_internal([&](int i){ fun_c(i); fun_f(i); });
  else if (j==3)
    fun_a2_internal([&](int i){ fun_d(i); fun_f(i); });
}

here we write a fun_a2_internal whose job it is to do a loop, and do some task in the loop.

Our fun_a2 passes that task in as a functor via a lambda.

This has the effect that the compiler gets to know the details of the body of the loop when compiling the loop, because the functor's operator() is a non-virtual one, and thus the same for every instance.

In general, if your answer to an optimization problem is "use function pointers" (or member pointers), you have the wrong answer.

This technique is why C++'s std::sort is faster than C's qsort.

I guess you are right. however, would you kindly hint why the use of "pointer to member" is such a performance hit!? Its ok if you cant be bothered. I will try to search it out myself. As for "optimizing", I will rather choose the original fun_a2() over this solution, in this case. — Anon Y., Feb 10 '13 at 07:09
Yes, the original `fun_a2` will almost certainly be best. Function pointers, if the compiler doesn't figure out what you are doing and then ignore what you did with them, are a performance hit because optimization in this era of processors consists of making it easy for the CPU to figure out what is going to run next, so it can pre-fetch and sometimes "pre-execute" the code before we reach it (that, and algorithmic and cache-hit efforts, etc). Eliminating `if`s at the cost of function pointers is ineffective. — Yakk - Adam Nevraumont, Feb 10 '13 at 14:58

score -2 · Answer 3 · answered Feb 10 '13 at 04:32

-2

"optimize" is vague and should never be asked unless

1, there is a particular problem being asked by, say professor based on concepts of efficiency covered
2, there is a particular optimization to be done.

What you are asking is to optimize this "switch" statement

Yes, switch statement is more clean for this situation, but the more efficient way to do this is to make an array of function pointers to void functions containing these loops, then do

doFunction[j]();

for example

typedef void (*myfunc)(void);

myfunc options[10];

main()
{
    options[0] = (void*)loop1;
    options[1] = (void*)loop2;
    options[2] = (void*)loop3;
    options[3] = (void*)loop4;  
}

PS: i++ is slower than ++i unless optimized by compiler.

answered Feb 10 '13 at 04:32

Dmytro

5,068
4
39
50

Maybe but my solution reduces time before a function gets called to O(1). since options[j](); does not multiplex if statements. – Dmytro Feb 10 '13 at 04:37
apologies for use of the word "optimize" in vague. also, C++ wont allow function pointer to non-static member function. I think the answer by 'nneonneo' is the way to go. thanks for your reply though. – Anon Y. Feb 10 '13 at 04:39
if you are multiplexing, you should not be selecting member functions anyway, they should be free static functions in a namespace. I am still pretty confident that they can be function casted without any problems. Whichever way works. – Dmytro Feb 10 '13 at 04:45
If you wrote this code with `typedef void(A::*myfunc)(int);` it would be nicer. – nneonneo Feb 10 '13 at 04:47
I was merely demonstrating the concept, I ignored the class for this question because the core of the question does not really require one, and it's easier to explain by omitting it. also, the loops he has do not require any information. I am not suggesting to multiplex the functions, but to multiplex the loops in the functions. Then if necessary, multiplex the selection of functions used in funa1. – Dmytro Feb 10 '13 at 04:54
An array of function pointers, especially if it is non-constant and declared at global scope, will be a hindrance to inlining. If the optimizer cannot determine with certainty what the target of a function pointer is, it cannot inline the call. There will be no performance difference for `i++` vs. `++i` in this case because `i` is of type `int` (and the value of the expression is unused). – James McNellis Feb 10 '13 at 04:57
i++ and ++i are different in core time efficiency, and i++ could possibly use an extra int variable in it. This difference in miniscule though, if any. Also there is no point talking about inlining because these optimizations are not the same among different compilers, and there is no real loss to grouping functions into an array, other than space efficiency. Also, of course it is a constant array of functions, there is a predefined amount of functions at any given time, why would it need to be dynamic? – Dmytro Feb 10 '13 at 05:00
@Dmitry: "core time efficiency"? My compiler produces the same instruction sequence so there **is no difference**. Your comment verges on nonsensical. – nneonneo Feb 10 '13 at 05:05

(c++) any way to optimize this loop? can't use function pointer as its inside a class

3 Answers3