
I have some template code that implements pretty heavy computations, but I only need it for floats and doubles. The goal is that the template instantiation is only done once in one compilation unit and not repeated for every file.

I tried to follow the ideas from related Stack Overflow posts and similar duplicate questions. I came up with the following test to illustrate the issue:

A.h

#pragma once
#include <cmath>
template<typename T>
struct A
{
    static T foo(T a, T b)
    {
        //do some heavy computations
        T v1 = pow(a, b);
        return pow(v1, b);
    }
};

//explicit template instantiations, the declaration
extern template struct A<float>;
extern template struct A<double>;

A.cpp

#include "A.h"
//explicit template instantiations, the definition
template struct A<float>;
template struct A<double>;

Main.cpp

#include "A.h"
int main()
{
    //use A
    float result = A<float>::foo(0, 0);
    return (int)result; //return it so that it doesn't get optimized away
}

When I now look at the generated .obj file (dumpbin /DISASM), I get the following output:

A.obj

Dump of file A.obj

File Type: COFF OBJECT

?foo@?$A@M@@SAMMM@Z (public: static float __cdecl A<float>::foo(float,float)):
  0000000000000000: F3 0F 11 4C 24 10  movss       dword ptr [rsp+10h],xmm1
  0000000000000006: F3 0F 11 44 24 08  movss       dword ptr [rsp+8],xmm0
  000000000000000C: 55                 push        rbp
  000000000000000D: 57                 push        rdi
  000000000000000E: 48 81 EC 18 01 00  sub         rsp,118h
                    00
  0000000000000015: 48 8D 6C 24 30     lea         rbp,[rsp+30h]
  000000000000001A: 48 8B FC           mov         rdi,rsp
  000000000000001D: B9 46 00 00 00     mov         ecx,46h
  0000000000000022: B8 CC CC CC CC     mov         eax,0CCCCCCCCh
  0000000000000027: F3 AB              rep stos    dword ptr [rdi]
  0000000000000029: F3 0F 10 8D 08 01  movss       xmm1,dword ptr [rbp+108h]
                    00 00
  0000000000000031: F3 0F 10 85 00 01  movss       xmm0,dword ptr [rbp+100h]
                    00 00
  0000000000000039: E8 00 00 00 00     call        ?pow@@YAMMM@Z
  000000000000003E: F3 0F 11 45 04     movss       dword ptr [rbp+4],xmm0
  0000000000000043: F3 0F 10 8D 08 01  movss       xmm1,dword ptr [rbp+108h]
                    00 00
  000000000000004B: F3 0F 10 45 04     movss       xmm0,dword ptr [rbp+4]
  0000000000000050: E8 00 00 00 00     call        ?pow@@YAMMM@Z
  0000000000000055: 48 8D A5 E8 00 00  lea         rsp,[rbp+0E8h]
                    00
  000000000000005C: 5F                 pop         rdi
  000000000000005D: 5D                 pop         rbp
  000000000000005E: C3                 ret

?foo@?$A@N@@SANNN@Z (public: static double __cdecl A<double>::foo(double,double)):
  0000000000000000: F2 0F 11 4C 24 10  movsd       mmword ptr [rsp+10h],xmm1
  0000000000000006: F2 0F 11 44 24 08  movsd       mmword ptr [rsp+8],xmm0
  000000000000000C: 55                 push        rbp
  000000000000000D: 57                 push        rdi
  000000000000000E: 48 81 EC 18 01 00  sub         rsp,118h
                    00
  0000000000000015: 48 8D 6C 24 30     lea         rbp,[rsp+30h]
  000000000000001A: 48 8B FC           mov         rdi,rsp
  000000000000001D: B9 46 00 00 00     mov         ecx,46h
  0000000000000022: B8 CC CC CC CC     mov         eax,0CCCCCCCCh
  0000000000000027: F3 AB              rep stos    dword ptr [rdi]
  0000000000000029: F2 0F 10 8D 08 01  movsd       xmm1,mmword ptr [rbp+108h]
                    00 00
  0000000000000031: F2 0F 10 85 00 01  movsd       xmm0,mmword ptr [rbp+100h]
                    00 00
  0000000000000039: E8 00 00 00 00     call        pow
  000000000000003E: F2 0F 11 45 08     movsd       mmword ptr [rbp+8],xmm0
  0000000000000043: F2 0F 10 8D 08 01  movsd       xmm1,mmword ptr [rbp+108h]
                    00 00
  000000000000004B: F2 0F 10 45 08     movsd       xmm0,mmword ptr [rbp+8]
  0000000000000050: E8 00 00 00 00     call        pow
  0000000000000055: 48 8D A5 E8 00 00  lea         rsp,[rbp+0E8h]
                    00
  000000000000005C: 5F                 pop         rdi
  000000000000005D: 5D                 pop         rbp
  000000000000005E: C3                 ret
....

Main.obj

Dump of file Main.obj

File Type: COFF OBJECT

?foo@?$A@M@@SAMMM@Z (public: static float __cdecl A<float>::foo(float,float)):
  0000000000000000: F3 0F 11 4C 24 10  movss       dword ptr [rsp+10h],xmm1
  0000000000000006: F3 0F 11 44 24 08  movss       dword ptr [rsp+8],xmm0
  000000000000000C: 55                 push        rbp
  000000000000000D: 57                 push        rdi
  000000000000000E: 48 81 EC 18 01 00  sub         rsp,118h
                    00
  0000000000000015: 48 8D 6C 24 30     lea         rbp,[rsp+30h]
  000000000000001A: 48 8B FC           mov         rdi,rsp
  000000000000001D: B9 46 00 00 00     mov         ecx,46h
  0000000000000022: B8 CC CC CC CC     mov         eax,0CCCCCCCCh
  0000000000000027: F3 AB              rep stos    dword ptr [rdi]
  0000000000000029: F3 0F 10 8D 08 01  movss       xmm1,dword ptr [rbp+108h]
                    00 00
  0000000000000031: F3 0F 10 85 00 01  movss       xmm0,dword ptr [rbp+100h]
                    00 00
  0000000000000039: E8 00 00 00 00     call        ?pow@@YAMMM@Z
  000000000000003E: F3 0F 11 45 04     movss       dword ptr [rbp+4],xmm0
  0000000000000043: F3 0F 10 8D 08 01  movss       xmm1,dword ptr [rbp+108h]
                    00 00
  000000000000004B: F3 0F 10 45 04     movss       xmm0,dword ptr [rbp+4]
  0000000000000050: E8 00 00 00 00     call        ?pow@@YAMMM@Z
  0000000000000055: 48 8D A5 E8 00 00  lea         rsp,[rbp+0E8h]
                    00
  000000000000005C: 5F                 pop         rdi
  000000000000005D: 5D                 pop         rbp
  000000000000005E: C3                 ret
....

A<float>::foo and A<double>::foo are instantiated in A.obj as expected. But the code for A<float>::foo is emitted again in Main.obj, as if the extern template declaration were ignored.

How can I tell the compiler (Visual Studio 2017, Release mode) to NOT inline the method, but to use the version from A.obj?

Shaman

1 Answer


You can do that with __declspec(noinline).

But the inlined version will likely be faster. If you are worried about binary size, your .exe file will contain only a single instance of that function: the copy in A.obj is unused, and the linker discards it during its dead-code elimination step.

Update: Put this in your A.h:

static __declspec( noinline ) T foo( T a, T b )
{
    //do some heavy computations
    T v1 = pow( a, b );
    return pow( v1, b );
}

I built this with Visual C++ 2017 15.6.7 in Release mode, for both 32-bit and 64-bit targets. On both platforms, Main.cpp compiles to this:

; Line 5
    call    ?foo@?$A@M@@SAMMM@Z         ; A<float>::foo
; Line 6
    cvttss2si eax, xmm0

However, if you are doing this to reduce compilation time, I am not sure noinline is going to help. Instead, remove the function body from A.h (leave only the declaration) and move it into A.cpp. Ideally, also remove the Eigen headers from A.h (or leave the bare minimum that defines the data structures) and include them in A.cpp instead.
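A minimal sketch of that layout, written as a single file with comments marking where A.h, A.cpp and Main.cpp would be split. It reuses the `pow` body from the question. The `demo` function is a hypothetical caller added only for illustration, and a real project would put each section in its own file.

```cpp
// ---- A.h: declaration only, this is all that other files see ----
template <typename T>
struct A
{
    // No body here, so a file that includes A.h has nothing to
    // instantiate or inline; it can only emit a call.
    static T foo(T a, T b);
};

// Explicit instantiation declarations: the definitions live in A.cpp.
extern template struct A<float>;
extern template struct A<double>;

// ---- A.cpp: the only place that sees the heavy code ----
#include <cmath>   // heavy headers (Eigen in the real code) go here

template <typename T>
T A<T>::foo(T a, T b)
{
    // stand-in for the heavy computation
    T v1 = std::pow(a, b);
    return std::pow(v1, b);
}

// Explicit instantiation definitions: compiled once, in this file only.
template struct A<float>;
template struct A<double>;

// ---- Main.cpp: hypothetical caller ----
#include <cassert>

void demo()
{
    float r = A<float>::foo(2.0f, 2.0f);   // (2^2)^2
    assert(r > 15.9f && r < 16.1f);
}
```

With this split, using `A<int>` from another file would fail at link time, which is usually acceptable when only `float` and `double` are needed.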

Soonts
  • Sadly, the code is still inlined in Main.obj. I'm more concerned about compilation times. In reality, `foo` contains some heavy matrix operations and linear solvers using Eigen, and compiling it takes up to 2 minutes. I don't want to repeat that for every file where I use that helper. – Shaman May 09 '18 at 07:46
  • Moving the function body from A.h into A.cpp did the trick. Thanks! – Shaman May 12 '18 at 07:18