
How can a user define the "behavior mode" that a template function doIt(a,b) reads from another user-defined variable/function op, AND have the if-else optimized away?

Here is a working MCVE.
My VS solution has 2 projects: pj1 is my library, and pj2 is the user's project.

A_pj1.h

#pragma once
template<int T>class BoolT{public:
    static bool op;
};
template<int T> bool BoolT<T>::op=true; //by default,  true=+, false=-
template<int i> int doIt(int a,int b){
    if(BoolT<i>::op){
        return a+b;
    }else{ return a-b;}
}

A_pj2_UserDefine.h

#pragma once
#include "A_pj1.h"
inline void A_pj2_UserDefine_Reg(){
    BoolT<2>::op=false; //override default value;
}

A_pj2_main.cpp

#include "A_pj2_UserDefine.h"
#include <iostream>
int main(){
    A_pj2_UserDefine_Reg();
    int s1=doIt<1>(3,2); //= 5 (correct)
    int s2=doIt<2>(3,2); //= 1 (correct)
    std::cout << s1<<" "<<" "<<s2<<std::endl;
    int asfasd=0;
}

(Edit) Here is the disassembly (optimized build):

int s2=doIt<2>(3,2);
    std::cout << s1<<" "<<" "<<s2<<std::endl;
00B61620  mov         ecx,dword ptr [_imp_?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A (0B6B0A8h)]  
    int s1=doIt<1>(3,2); //:
00B61626  xor         eax,eax  
00B61628  cmp         byte ptr [BoolT<1>::op (0B6E001h)],al  
    int s2=doIt<2>(3,2);
    std::cout << s1<<" "<<" "<<s2<<std::endl;
00B6162E  push        offset std::endl<char,std::char_traits<char> > (0B61530h)  
00B61633  push        1  
    int s1=doIt<1>(3,2); //:
00B61635  setne       al  
    A_pj2_UserDefine_Reg();
00B61638  mov         byte ptr [BoolT<2>::op (0B6E000h)],0  
    int s2=doIt<2>(3,2);
    std::cout << s1<<" "<<" "<<s2<<std::endl;
00B6163F  push        offset string " " (0B6B220h)  
00B61644  push        offset string " " (0B6B220h)  
    int s1=doIt<1>(3,2); //:
00B61649  lea         eax,[eax*4+1]  
    int s2=doIt<2>(3,2);
    std::cout << s1<<" "<<" "<<s2<<std::endl;
00B61650  push        eax  
00B61651  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0B6B0B4h)]  
00B61657  push        eax  
00B61658  call        std::operator<<<std::char_traits<char> > (0B61310h)  
00B6165D  add         esp,8  
00B61660  push        eax  
00B61661  call        std::operator<<<std::char_traits<char> > (0B61310h)  
00B61666  add         esp,8  
00B61669  mov         ecx,eax  
00B6166B  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0B6B0B4h)]  
00B61671  mov         ecx,eax  
00B61673  call        dword ptr [__imp_std::basic_ostream<char,std::char_traits<char> >::operator<< (0B6B0B8h)]  
    int asfasd=0;
}
00B61679  xor         eax,eax  
00B6167B  ret  

Problem

doIt(a,b) is called very often (more than 60,000 times per second).
I am curious whether the if(BoolT<i>::op) can be optimized out,
because if-conditions are not good for pipelined execution.

Here are some facts about my program that may help:

  • Writes to BoolT<2>::op only ever occur at the beginning of the program (e.g. in A_pj2_UserDefine_Reg()).
  • Reads of BoolT<2>::op (direct or indirect) almost always occur in project pj2 only.
  • The user can't edit pj1.
  • In the real case, besides A_pj2_main.cpp, many .cpp files call pj1's doIt<>(). Here is the include graph:

[Image: include graph]

My poor solution

B_pj1.h

#pragma once
template<int i> bool op(){return true;}
template<int i> int doIt(int a,int b){
    if(op<i>()){
        return a+b;
    }else{ return a-b;}
    // return a+b;
}

B_pj2_UserDefine.cpp

#include "B_pj1.h"
template<> bool op<2>(){return false;}

B_pj2_main.cpp

#include <iostream>
#include "B_pj1.h"
int main(){
    int s1=doIt<1>(3,2); //:
    int s2=doIt<2>(3,2);
    std::cout << s1<<" "<<" "<<s2<<std::endl;
    int asfasd=0;
}

This program is ill-formed (see "Storing C++ template function definitions in a .CPP file"). The linker reports:

Error LNK2005 "bool __cdecl op<2>(void)" (??$op@$01@@YA_NXZ) already defined in B_pj2_UserDefine.obj

cppBeginner
  • I think making your class polymorphic, with one derived class using true and one using false, should do the trick, because each would then have its own version of doIt() without any conditional. – najayaz Jan 14 '19 at 05:47
  • Just do it the "natural" way first, and build with optimizations enabled. Then *measure* and *profile* and *check the generated code*. Perhaps the compiler is already doing the "right thing"? – Some programmer dude Jan 14 '19 at 05:48
  • @G-man With that approach, wouldn't I suffer the v-table cost? – cppBeginner Jan 14 '19 at 05:49
  • @Some programmer dude Thanks, I am going to try it. – cppBeginner Jan 14 '19 at 05:50
  • The definition of `op<2>` through specialization is only known in the compilation unit `B_pj2_UserDefine.cpp`; every other compilation unit will use the unspecialized version of the template function for `2`, so linking fails because you have multiple (and even different) definitions of `op<2>`. You need to place the definition of `op<2>` in the header `B_pj1.h`. – t.niese Jan 14 '19 at 06:02
  • @t.niese Thanks. I can't move `op<2>` to `B_pj1.h` because it is user-defined. – cppBeginner Jan 14 '19 at 06:19
  • I'm not sure whether it will work and is standard-conforming, but you at least need to move the declaration `template<> bool op<2>();` into `B_pj1.h` so that the compiler knows this specialization exists and uses it instead of the general one. But even if it works, it is fishy from a code design perspective. – t.niese Jan 14 '19 at 06:24
  • @t.niese Yes, I agree with the principle/rule, but it would be like merging the 2 projects together, which I try to avoid. Thanks. :) – cppBeginner Jan 14 '19 at 06:29

2 Answers


Your current code is fine. If-conditions only hurt pipelined execution when the branch is hard to predict, i.e. when it goes a different way each time. When the flag is only written once, branch prediction does a very good job.

Anyway, the common solution to your question is C preprocessor macros. It's technically possible to solve with e.g. if constexpr, but in practice that would require the library to #include something from the user's project, and most library authors don't want to support such a use case.
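A minimal sketch of the macro approach, assuming a hypothetical macro name `PJ1_DOIT_SUBTRACTS` that pj2 defines before the pj1 header is seen (for example at the top of a project-wide header such as stdafx.h). Because the condition expands to a compile-time constant, an optimizing compiler can drop the dead branch. Both halves are shown in one block only for brevity; in practice they live in different projects.

```cpp
// ---- pj2 side (user), hypothetical configuration ----
// Must be defined before the pj1 header is included in every pj2 .cpp file,
// e.g. at the top of a project-wide header such as stdafx.h.
#define PJ1_DOIT_SUBTRACTS(i) ((i) == 2)

// ---- pj1 side (library header) ----
// Default when the user defines nothing: every doIt<i> adds.
#ifndef PJ1_DOIT_SUBTRACTS
#define PJ1_DOIT_SUBTRACTS(i) false
#endif

template<int i> constexpr int doIt(int a, int b) {
    // PJ1_DOIT_SUBTRACTS(i) expands to a constant expression, so no flag is
    // read at run time and an optimizing build can remove the unused branch.
    if (PJ1_DOIT_SUBTRACTS(i)) {
        return a - b;
    }
    return a + b;
}
```

The cost of this design is that the user configuration has to be visible in every translation unit that calls doIt<>, which is why a shared header or a project-wide setting is needed.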

If you're willing to make the library header-only, it's also possible to solve with templates, but it's more complex. I try to avoid templates across library boundaries when I can, due to e.g. compilation-time overhead.
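One possible header-only template variant, sketched under the assumption that pj1 exposes a customization point; `DoItOp` and `apply` are hypothetical names, not part of the question's code. Instead of a flag, the user specializes a class template that supplies the operation itself, so doIt<i> contains no if-else at all:

```cpp
// ---- pj1 library header (sketch; DoItOp is a hypothetical name) ----
// Primary template: the default behavior for every i is addition.
template<int i> struct DoItOp {
    static constexpr int apply(int a, int b) { return a + b; }
};

// doIt simply forwards to whatever DoItOp<i> is visible when it is instantiated.
// There is no branch: the operation is chosen at compile time.
template<int i> constexpr int doIt(int a, int b) {
    return DoItOp<i>::apply(a, b);
}

// ---- pj2 user header (hypothetical) ----
// Explicit specialization overriding the behavior for i == 2.
// It must be declared before doIt<2> is used in every pj2 .cpp file.
template<> struct DoItOp<2> {
    static constexpr int apply(int a, int b) { return a - b; }
};
```

In practice the specialization would go into a pj2 header that every pj2 .cpp file includes, so all translation units see the same definition.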

Soonts
  • I heard that branch predictors usually cache only the last one or two if-else outcomes, and that this cache is shared globally. I don't think it will work well for a case like `for(int n=0;n<10000;n++){ doIt<1>(); doIt<2>(); doIt<3>(); }`. Should I / can I localize branch prediction? – cppBeginner Jan 14 '19 at 06:27
  • @cppBeginner The details are unknown, but there are rumors that modern CPUs have 1024-4096 entries in their branch target buffers. If true, your example with 3 branches fits easily. – Soonts Jan 14 '19 at 06:34
  • About preprocessor macros: how can I make many/all files in a single project include a certain header (.h) that contains such a macro, without editing those .cpp files directly? Without such a technique, it would be tedious. Furthermore, I can't find a way with stdafx.h; the precompiled header seems to be solution-level, not project-level. – cppBeginner Jan 14 '19 at 06:35
  • If you really don't want to edit the .cpp files, define the macro on the command line / in the build system / in the project settings. But since you're using VS, I recommend enabling precompiled headers. The compiler will then force you to include stdafx.h everywhere. Once you have changed all .cpp files, stdafx.h becomes the place where it's easy to define globally available stuff. – Soonts Jan 14 '19 at 06:41
  • Thanks a lot for the macro and stdafx advice. +1, I will probably use it to optimize the final product. Hmm, it will take time to study and test... – cppBeginner Jan 14 '19 at 07:16
  • @cppBeginner About precompiled headers, the correct setup is as follows. At the project level, `Use (/Yu)`. On the single file stdafx.cpp, `Create (/Yc)`. The feature helps a lot with compilation times: the standard C++ library headers are huge and can easily be dozens of megabytes, and a precompiled header is essentially a memory dump of the parsed set of headers, so each source file doesn't have to parse them again. – Soonts Jan 14 '19 at 21:59
  • Thanks. It does what I need. The only downside is that I have to place every "userDefine" inside the stdafx.h of that project, which is quite far away from the related logic. Furthermore, if I later move `x.h` to another project (refactoring), I will have to move the related part of `stdafx.h` to the `stdafx.h` of the destination project too. – cppBeginner Jan 15 '19 at 01:54

I think the example is reduced too much to see the intent. So let me assume you have a good reason to do so.

First of all, make sure you are compiling with optimizations; on MSVC, that means adding /O2 (or similar) to the command line.

Secondly, measure. Branch predictors in CPUs are really efficient, you might not even notice.

That said, if the compiler sees if (false) and doesn't optimize it away, you can consider that a compiler bug. Constant folding, especially with SSA as the model, is so trivial that such branches should disappear quickly.

When in doubt, or when you want to force it at compile time, if constexpr is the solution. It requires C++17 (/std:c++17 on MSVC) and is available in MSVC 2017.
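A sketch of the if constexpr route, assuming the flag can become a compile-time constant (`static constexpr bool`) supplied through an explicit specialization instead of the run-time assignment in A_pj2_UserDefine_Reg(). That trade-off is an assumption: the mode can no longer change while the program runs, and the specialization must be visible wherever doIt<2> is used.

```cpp
// if constexpr needs a constant expression, so the question's mutable
// `static bool op` is replaced by a constexpr member (assumed acceptable).
template<int T> struct BoolT {
    static constexpr bool op = true;  // default: true = +, false = -
};

// User override, written as an explicit specialization instead of an assignment.
template<> struct BoolT<2> {
    static constexpr bool op = false;
};

template<int i> constexpr int doIt(int a, int b) {
    // Only the selected branch is instantiated for each i;
    // the discarded one generates no code at all.
    if constexpr (BoolT<i>::op) {
        return a + b;
    } else {
        return a - b;
    }
}
```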

Another alternative is making your function constexpr. That is not always possible, but it would look like this:

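// For compile-time evaluation, BoolT<i>::op must itself be a constant expression,
// e.g. `static constexpr bool op = true;` (the question's mutable `static bool op` cannot be read in a constant expression).
template<int i> constexpr int doIt(int a,int b){
    if(BoolT<i>::op){
        return a+b;
    }else{ return a-b;}
}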

And you can call it like this:

constexpr int s1=doIt<1>(3,2);
int s2=doIt<2>(3,i);

With this, the compiler is required to compute the value of s1 at compile time, while it can still reuse the same function to compute s2 at run time (note the variable i being passed).

Regarding your linker error: C++ has the ODR (one definition rule). You can get around the error by adding inline; however, the implementation must then be identical everywhere! I wouldn't recommend putting the implementation in a .cpp file, as that is a recipe for UB.
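A sketch of how the question's B_pj1.h approach could avoid LNK2005, assuming the specialization moves from B_pj2_UserDefine.cpp into a pj2 header (hypothetical) that every pj2 .cpp file includes before calling doIt<2>. The functions are made constexpr here so the result can be checked at compile time; constexpr implies inline, which is exactly what the ODR fix needs.

```cpp
// ---- B_pj1.h (library), as in the question but constexpr ----
template<int i> constexpr bool op() { return true; }

template<int i> constexpr int doIt(int a, int b) {
    return op<i>() ? a + b : a - b;
}

// ---- hypothetical pj2 header, included by every pj2 .cpp before doIt<2> is used ----
// An explicit specialization of a function template is NOT implicitly inline,
// so a plain definition in a header triggers "already defined" link errors.
// Declaring it constexpr (or inline) makes it inline, so identical copies in
// several translation units are allowed.
template<> constexpr bool op<2>() { return false; }
```

Every translation unit then sees the same single definition of op<2>, which is the "same implementation everywhere" condition above.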

JVApen