
Dear StackOverflowers,

I have a simple piece of code that I am compiling with Microsoft Visual C++ 2012:

int add(int x, int y)
{
    return x + y;
}

typedef int (*func_t)(int, int);

class A
{
public:
    const static func_t FP;
};

const func_t A::FP = &add;

int main()
{
    int x = 3;
    int y = 2;
    int z = A::FP(x, y);
    return 0;
}

The compiler generates the following code:

int main()
{
000000013FBA2430  sub         rsp,28h  
int x = 3;
int y = 2;
int z = A::FP(x, y);
000000013FBA2434  mov         edx,2  
000000013FBA2439  lea         ecx,[rdx+1]  
000000013FBA243C  call        qword ptr [A::FP (013FBA45C0h)]  
return 0;
000000013FBA2442  xor         eax,eax
}

I compiled this with 'Full Optimization' (/Ox) and with Inline Function Expansion set to 'Any Suitable' (/Ob2).

I was wondering why the compiler doesn't inline this call, especially since the pointer is const. Does anyone have an idea why it is not inlined, and whether it's possible to make the compiler inline it?

Christian

EDIT: I am running some tests now, and MSVC also fails to inline the call through the function pointer when:

- I move the const pointer out of the class and make it global.

- I move the const pointer out of the class and make it local in main.

- I make the pointer non-const and move it into main as a local.

- I make the return type void and give the function no parameters.

I'm starting to believe Microsoft Visual Studio cannot inline calls through function pointers at all...

  • @Otávio Décio I don't have GCC, so sorry, I don't know! –  May 09 '13 at 13:24
  • Since you're using a proprietary compiler, it's unlikely that anyone who knows the answer will be able to tell you without dire legal consequences. – Mike Seymour May 09 '13 at 13:32
  • @Mike Seymour I know what you mean. I don't really expect a technical explanation of the compiler itself. I just hope I could somehow modify my code to make the compiler more likely to inline it. –  May 09 '13 at 13:34
  • For what it's worth, GCC removes the whole thing since there are no side effects. If I add `volatile` to the three variables to force it to do the calculation, then it inlines the function call. – Mike Seymour May 09 '13 at 13:34
  • @Mike Seymour According to Wikipedia the 'volatile' keyword is used to prevent the compiler from doing any optimizations on them. Wouldn't it be weird to apply it to something you actually want to be optimized? Or am I seeing this wrong? –  May 09 '13 at 13:39
  • @ChristianVeenman: I was using it to prevent the optimisation of removing the entire body of `main` due to it having no side effects, by forcing it to read `x` and `y`, use the read values in the calculation, and write the result to `z`. Once I did that, the generated code contained an `add` instruction (the body of `add()`), and no function call: the function had been inlined. – Mike Seymour May 09 '13 at 13:44
  • @Mike Seymour Sounds logical to me! I tried doing that too, but I got the same result (it might differ a little, but the call instruction was still there). –  May 09 '13 at 13:51
  • GCC 4.7.2 (with -O3) optimizes everything out, simply returning 0. If you change main to return `z`, it simply loads the constant `5` into a register and returns that (so the function call is indeed inlined in GCC). I can't comment on why it might not be inlined in MSVC. – Cameron May 09 '13 at 16:42
  • @Cameron You guys might persuade me to compile with GCC. But I am still hoping someone has a good answer. –  May 09 '13 at 16:51
  • Your use case needs two optimizations: (1) Conversion of an indirect call using a constant pointer into a direct call. (2) Inlining a direct call. MSVC definitely does the second, yet the inlining is what all the technical language in your question discusses. I suggest a rewrite that focuses on the first. Note that optimization (1) is similar to but not identical to "devirtualization". Yet another interesting case is when the function pointer is a non-type template argument. – Ben Voigt May 09 '13 at 18:01
  • @Ben Voigt Mmm that sounds interesting! Do you think MSVC is able to do the first type of optimization? When I have more time I'll try to test the template version! –  May 09 '13 at 19:34
  • @ChristianVeenman: I'd guess that 99% of real code using function pointers results in code where it's nigh impossible to know what the pointer points at at compile time, and is thus nigh impossible to inline. If I were a compiler writer, I'd probably ignore that case entirely too. – Mooing Duck May 09 '13 at 20:08
  • I did a couple small tests. With a function pointer used as a non-type template argument with very simple code, MSVC2010 is able to optimize it away and put `5` into a register as a constant. With more complex code (real code that I actually use) that implements functors using non-type function pointer templated objects, MSVC does not optimize away the function call (even the wrapper call is not inlined), but GCC once again is able to optimize everything out entirely (I did find a small template bug in GCC but that's another story). – Cameron May 09 '13 at 20:38
  • @Cameron Thanks a lot for the tests! (+1!) I guess I am going to try to use Makefiles with Visual Studio and use GCC or try the Intel Compiler plugin! –  May 09 '13 at 21:44
  • @MooingDuck Logical deduction about the compiler writer! (+1 too!) –  May 09 '13 at 21:45
  • @Christian: No problem. (The GCC bug [turned out to be a fairly harmless bug in MSVC](http://stackoverflow.com/questions/16471161/is-this-a-bug-in-gcc), by the way.) If you really need the function to be inlined, go ahead and use GCC. But in general, MSVC is actually fairly good at producing fast code (as is GCC) -- the best option is to write cross-platform code as much as possible that works reasonably fast under many compilers and architectures. Chances are things like this won't be bottlenecks -- and if they are, do you really want to be limited to using a single compiler? – Cameron May 09 '13 at 22:01
  • @Cameron Hey Cameron, this is the idea I had in mind and why I asked this question here: http://stackoverflow.com/questions/16478514/using-simd-in-a-game-engine-math-library-by-using-function-pointers-a-good-ide –  May 10 '13 at 08:56
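A minimal sketch of the `volatile` experiment Mike Seymour describes, assuming the question's `add`, `func_t` and `A` are unchanged. The name `run_fp_test` is made up and stands in for `main`. Only the three locals are `volatile`, so the optimizer must really read `x` and `y` and really store `z` instead of deleting the computation; `A::FP` itself is not `volatile`, so the call through it remains a candidate for being resolved and inlined. Whether a given compiler actually does that is the whole question.

```cpp
// Question's code, unchanged.
int add(int x, int y)
{
    return x + y;
}

typedef int (*func_t)(int, int);

class A
{
public:
    const static func_t FP;
};

const func_t A::FP = &add;

// Hypothetical stand-in for main(): volatile locals keep the computation
// alive, while the call through A::FP is left to the optimizer.
int run_fp_test()
{
    volatile int x = 3;            // forces a real load
    volatile int y = 2;            // forces a real load
    volatile int z = A::FP(x, y);  // forces a real store
    return z;
}
```

Inspecting the generated code for `run_fp_test` then shows whether an `add` instruction or a `call` through `A::FP` remains.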

4 Answers


The problem isn't with inlining, which the compiler does at every opportunity. The problem is that Visual C++ doesn't seem to realize that the pointer variable is actually a compile-time constant.

Test-case:

// function_pointer_resolution.cpp : Defines the entry point for the console application.
//

extern void show_int( int );

extern "C" typedef int binary_int_func( int, int );

extern "C" binary_int_func sum;
extern "C" binary_int_func* const sum_ptr = sum;

inline int call( binary_int_func* binary, int a, int b ) { return (*binary)(a, b); }

template< binary_int_func* binary >
inline int callt( int a, int b ) { return (*binary)(a, b); }

int main( void )
{
    show_int( sum(1, 2) );
    show_int( call(&sum, 3, 4) );
    show_int( callt<&sum>(5, 6) );
    show_int( (*sum_ptr)(1, 7) );
    show_int( call(sum_ptr, 3, 8) );
//  show_int( callt<sum_ptr>(5, 9) );
    return 0;
}

// sum.cpp
extern "C" int sum( int x, int y )
{
    return x + y;
}

// show_int.cpp
#include <iostream>

void show_int( int n )
{
    std::cout << n << std::endl;
}

The functions are separated into multiple compilation units to give better control over inlining. Specifically, I don't want show_int inlined, since it makes the assembly code messy.

The first whiff of trouble is that valid code (the commented line) is rejected by Visual C++. G++ has no problem with it, but Visual C++ complains "expected compile-time constant expression". This is actually a good predictor of all future behavior.
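For comparison, here is roughly how the commented-out call can be written so that it is accepted under C++17 and later, where a pointer-typed non-type template argument may be any converted constant expression. This is a sketch under two assumptions not present in the test case above: the pointer is declared `constexpr` instead of `const`, and `sum` is defined in the same file for brevity. It says nothing about what Visual C++ 2010 accepts, and `call_via_template` is a made-up name.

```cpp
// Requires C++17 or later.
int sum(int x, int y) { return x + y; }

typedef int binary_int_func(int, int);

// constexpr (not merely const), so its value is usable in constant expressions.
constexpr binary_int_func* sum_ptr = sum;

template <binary_int_func* binary>
inline int callt(int a, int b) { return (*binary)(a, b); }

// The equivalent of the commented-out line: the pointer variable itself
// is used as the template argument.
int call_via_template() { return callt<sum_ptr>(5, 9); }
```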

With optimization enabled and normal compilation semantics (no cross-module inlining), the compiler generates:

_main   PROC                        ; COMDAT

; 18   :    show_int( sum(1, 2) );

    push    2
    push    1
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 19   :    show_int( call(&sum, 3, 4) );

    push    4
    push    3
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 20   :    show_int( callt<&sum>(5, 6) );

    push    6
    push    5
    call    _sum
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 21   :    show_int( (*sum_ptr)(1, 7) );

    push    7
    push    1
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 22   :    show_int( call(sum_ptr, 3, 8) );

    push    8
    push    3
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int
    add esp, 60                 ; 0000003cH

; 23   :    //show_int( callt<sum_ptr>(5, 9) );
; 24   :    return 0;

    xor eax, eax

; 25   : }

    ret 0
_main   ENDP

There's already a huge difference between using sum_ptr and not using it. Statements using sum_ptr generate an indirect function call (call DWORD PTR _sum_ptr), while all other statements generate a direct function call (call _sum), even when the source code used a function pointer.

If we now enable inlining by compiling function_pointer_resolution.cpp and sum.cpp with /GL and linking with /LTCG, we find that the compiler inlines all direct calls. Indirect calls stay as-is.

_main   PROC                        ; COMDAT

; 18   :    show_int( sum(1, 2) );

    push    3
    call    ?show_int@@YAXH@Z           ; show_int

; 19   :    show_int( call(&sum, 3, 4) );

    push    7
    call    ?show_int@@YAXH@Z           ; show_int

; 20   :    show_int( callt<&sum>(5, 6) );

    push    11                  ; 0000000bH
    call    ?show_int@@YAXH@Z           ; show_int

; 21   :    show_int( (*sum_ptr)(1, 7) );

    push    7
    push    1
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int

; 22   :    show_int( call(sum_ptr, 3, 8) );

    push    8
    push    3
    call    DWORD PTR _sum_ptr
    push    eax
    call    ?show_int@@YAXH@Z           ; show_int
    add esp, 36                 ; 00000024H

; 23   :    //show_int( callt<sum_ptr>(5, 9) );
; 24   :    return 0;

    xor eax, eax

; 25   : }

    ret 0
_main   ENDP

Bottom line: Yes, the compiler does inline calls made through a compile-time constant function pointer, as long as that function pointer is not read from a variable. This use of a function pointer got optimized:

call(&sum, 3, 4);

but this did not:

(*sum_ptr)(1, 7);
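Applied to the question's class `A`, this suggests keeping the call target in a template argument rather than in a variable, analogous to `callt<&sum>` above. A minimal C++03-compatible sketch; `Op`, `AddOp` and `use_add_op` are hypothetical names, and the sketch was not checked against MSVC (Cameron's comments on the question suggest the outcome can change once the code gets more complex).

```cpp
int add(int x, int y)
{
    return x + y;
}

typedef int (*func_t)(int, int);

// The target is part of the type, not a value stored in a static member,
// so F(x, y) is a direct call that the optimizer is free to inline.
template <func_t F>
struct Op
{
    static int apply(int x, int y) { return F(x, y); }
};

typedef Op<&add> AddOp;  // plays the role of A::FP

int use_add_op(int x, int y)
{
    return AddOp::apply(x, y);
}
```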

All tests run with Visual C++ 2010 Service Pack 1, compiling for x86, hosted on x64.

Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 16.00.40219.01 for 80x86

Ben Voigt
  • But isn't a function pointer technically a variable holding the address of a function? And can I conclude that MSVC is not able to see that those are compile-time constants? –  May 10 '13 at 08:33
  • @ChristianVeenman: Definitely MSVC is not able to see that `sum_ptr` is a compile-time constant, because it says as much if you uncomment the line `callt<sum_ptr>(5, 9)` – Ben Voigt May 10 '13 at 14:09
  • Thanks for the explanation! Also for the test runs! –  May 10 '13 at 15:08

I think that you're right in this conclusion: "... cannot inline function pointers at all".

This very simple example also breaks optimization:

static inline
int add(int x, int y)
{
    return x + y;
}

int main()
{
    int x = 3;
    int y = 2;
    auto q = add;
    int z = q(x, y);
    return z;
}

Your sample is even more complex for the compiler, so it is not surprising.

qehgt
  • Nope, there are some cases where function pointers do get optimized. – Ben Voigt May 09 '13 at 22:51
  • @BenVoigt I agree they sometimes get optimized, but according to my tests (and Cameron's) they never get inlined. Or am I missing a specific test? –  May 10 '13 at 08:27
  • @ChristianVeenman: Yes, the second and third test cases in my answer got inlined. – Ben Voigt May 10 '13 at 14:08

You can try __forceinline. Nobody is going to be able to tell you exactly why it isn't inlined. Common sense tells me that it should be, however. /O2 should favor code speed over code size (i.e. inline more)... Strange.

RandyGaul
  • I tested it by adding __forceinline to the add function, but the call instruction was still there! I also tried both the "Favor for speed" and "Favor for small size" options, but neither removed the call! Thanks anyway! –  May 09 '13 at 16:51
  • If worst comes to worst you can use a macro instead of a static function :) – RandyGaul May 09 '13 at 16:51
  • Hehe, thanks, but I would probably rather move to GCC (or use makefiles within Visual Studio)! –  May 09 '13 at 17:07
  • The compiler can't inline an indirect call no matter how much you command it to. First it needs to remove the indirection. – Ben Voigt May 09 '13 at 18:02
  • @BenVoigt It should remove the indirection due to the indirection being a compile-time constant. – RandyGaul May 09 '13 at 18:06
  • @Randy: But it didn't (look at the disassembly in the question). – Ben Voigt May 09 '13 at 18:07
  • Sure, it didn't, but it should; hence the question being asked in the first place. – RandyGaul May 09 '13 at 18:07
  • @BenVoigt I agree with Randy here. People say modern compilers are getting smarter and can simplify things that are known at compile time. Because the function pointer cannot change in response to input to the program (and is const), it would be logical for a compiler to inline the indirect call. –  May 09 '13 at 19:36
  • @ChristianVeenman: It would be logical for the compiler to first recognize that the indirect call always goes to the same target, and convert to a direct call. And then inline the direct call. But inlining of an indirect call is never possible. No amount of changing inlining settings will have any effect as long as the compiler is generating an indirect call. – Ben Voigt May 09 '13 at 19:45
  • @BenVoigt I understand what you mean. You're right that it can't inline an indirect call. But what my question meant was whether there is a way to make the compiler convert the indirect call to a direct one and then inline it. But thanks for the correction! –  May 09 '13 at 21:39
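Ben Voigt's two-step description can be illustrated by doing step (1) by hand. The sketch below is only an illustration and not something MSVC does for you: the hypothetical helper `call_promoted` compares the pointer against one known target and, on a match, makes a direct call to `add` that the optimizer may inline; any other target (here a made-up `sub`) still goes through the indirect call. When compilers do this automatically it is often called indirect call promotion, and it is usually driven by profile data.

```cpp
int add(int x, int y)
{
    return x + y;
}

// Hypothetical second target, only to exercise the fallback path.
int sub(int x, int y)
{
    return x - y;
}

typedef int (*func_t)(int, int);

// Step (1) by hand: if the pointer holds the expected target, replace the
// indirect call with a direct one; step (2), inlining, is then possible.
inline int call_promoted(func_t fp, int x, int y)
{
    if (fp == &add)
        return add(x, y);  // direct call: eligible for inlining
    return fp(x, y);       // any other target: still an indirect call
}
```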

This is not a real answer, but a possible workaround: STL (Stephan T. Lavavej) from Microsoft once mentioned that lambdas are more easily inlined than function pointers, so you could try that.
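A sketch of why that can help, assuming C++11 lambdas (which Visual C++ 2010 already supports): each lambda has its own unique closure type, so a template that takes the callable by type gets the call target from the type itself, whereas a function pointer argument carries the target only as a runtime value. `apply_op`, `via_lambda` and `via_pointer` are made-up names; whether the pointer version still gets inlined is up to the compiler, as this thread shows.

```cpp
// Generic caller: instantiated once per callable type.
template <typename BinaryOp>
int apply_op(BinaryOp op, int x, int y)
{
    return op(x, y);
}

int add(int x, int y) { return x + y; }

int via_lambda(int x, int y)
{
    // BinaryOp is the lambda's unique closure type: the target is known
    // from the instantiation itself.
    return apply_op([](int a, int b) { return a + b; }, x, y);
}

int via_pointer(int x, int y)
{
    // BinaryOp deduces to int (*)(int, int): the target is a runtime value
    // that the optimizer first has to prove constant.
    return apply_op(&add, x, y);
}
```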

As a bit of trivia, Bjarne often mentions that std::sort is faster than qsort because qsort takes a function pointer, but as other people have noted, GCC has no problem inlining them... so maybe Bjarne should try GCC :P
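The difference Bjarne points at can be sketched as follows. The function names are made up, and this only shows where the comparator lives; it is not a benchmark and makes no claim about timings on any particular compiler. `qsort` only ever receives the comparator as a function pointer value, while `std::sort` is instantiated for the comparator's type, so with a lambda the comparison becomes part of that instantiation.

```cpp
#include <algorithm>
#include <cstdlib>
#include <vector>

// qsort comparator: reached only through a function pointer.
int compare_ints(const void* a, const void* b)
{
    int lhs = *static_cast<const int*>(a);
    int rhs = *static_cast<const int*>(b);
    return (lhs > rhs) - (lhs < rhs);  // avoids overflow of lhs - rhs
}

void sort_with_qsort(std::vector<int>& v)
{
    if (!v.empty())
        std::qsort(&v[0], v.size(), sizeof(int), compare_ints);
}

// std::sort is instantiated for the lambda's closure type, so the
// comparison is visible to the optimizer at every call site.
void sort_with_std_sort(std::vector<int>& v)
{
    std::sort(v.begin(), v.end(), [](int a, int b) { return a < b; });
}
```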

NoSenseEtAl
  • Thanks! :P Bjarne definitely should. :P Hehe, too bad the Visual Studio IDE is just way better than any GCC one (in my opinion :P). –  Feb 25 '14 at 22:56