I was comparing the speed of an exported function in a DLL against a normal function. How is it possible that the exported function in the DLL is a lot faster?

100000000 function calls in a DLL cost: 0.572682 seconds
100000000 normal function calls cost: 2.75258 seconds

This is the function in the DLL:

extern "C" __declspec (dllexport) int example()
{
    return 1;
}

This is the normal function:

int example()
{
    return 1;
}

This is how I test it:

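#include <windows.h>
#include <iostream>
using namespace std;

// Signature of the exported function: int example()
typedef int (*testFunction)();
testFunction dllFunction;

int example();  // the normal function shown above, defined in this file

int main()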
{
    LARGE_INTEGER frequention;
    LARGE_INTEGER dllCallStart,dllCallStop;
    LARGE_INTEGER normalStart,normalStop;
    int resultCalculation;

    //Initialize the Timer
    ::QueryPerformanceFrequency(&frequention);
    double frequency = frequention.QuadPart;
    double secondsElapsedDll = 0;
    double secondsElapsedNormal = 0;

    //Load the Dll
    HINSTANCE hDll = LoadLibraryA("example.dll");

    if(!hDll)
    {
        cout << "Dll error!" << endl;
        return 1;  // non-zero exit code signals failure
    }

    dllFunction = (testFunction)GetProcAddress(hDll, "example");

    if( !dllFunction )
    {
        cout << "Dll function error!" << endl;
        return 1;  // non-zero exit code signals failure
    }

    //Dll
    resultCalculation = 0;
    ::QueryPerformanceCounter(&dllCallStart);
    for(int i = 0; i < 100000000; i++)
        resultCalculation += dllFunction();
    ::QueryPerformanceCounter(&dllCallStop);

    Sleep(100);

    //Normal
    resultCalculation = 0;
    ::QueryPerformanceCounter(&normalStart);
    for(int i = 0; i < 100000000; i++)
        resultCalculation += example();
    ::QueryPerformanceCounter(&normalStop);

    //Calculate the result time
    secondsElapsedDll = ((dllCallStop.QuadPart - dllCallStart.QuadPart) / frequency);
    secondsElapsedNormal = ((normalStop.QuadPart - normalStart.QuadPart) / frequency);

    //Output
    cout << "Dll: " << secondsElapsedDll << endl; //0.572682
    cout << "Normal: " << secondsElapsedNormal << endl; //2.75258

    FreeLibrary(hDll);  // release the DLL loaded above
    return 0;
}

I am only testing the function call speed; getting the address can be done at start-up, so the performance lost there doesn't matter.

Laurence
  • What's the result if you get rid of `extern "C"`? – stijn Jan 11 '13 at 08:50
  • I don't believe that, because *this* normal function will be inlined but the function from the DLL cannot be inlined. So I think you must be doing something inappropriately. Are you using a release build? – Nawaz Jan 11 '13 at 08:52
  • How did you compile both projects? Certain optimization options might decide not to call the function at all, which is faster. – Yochai Timmer Jan 11 '13 at 08:52
  • Just for kicks, change that `int resultCalculation;` to `volatile int resultCalculation;` and rerun your test. Also, swap the order (run the local first, then the DLL). – WhozCraig Jan 11 '13 at 08:53
  • volatile int was the trick. Now the normal call is faster. Thanks. So optimization was the problem. – Laurence Jan 11 '13 at 08:56
  • Both projects are in release mode. – Laurence Jan 11 '13 at 08:57
  • @Laurence since everything else was the same, the only thing that made sense to me was the activity itself being tossed because the DLL is *more* expensive, not less. I'm somewhat shocked the local one was not tossed as well, but hacking in that `volatile` all but guarantees the only difference is the call expense itself. – WhozCraig Jan 11 '13 at 09:04
  • @WhozCraig When I was testing this, I was calculating how much performance this would cost. I didn't believe it either, so I asked here. But I still don't understand why the "optimizations" made this slower. What kind of optimizations does the compiler do? Volatile prevents the compiler from applying any optimizations, so optimizations must be the problem. – Laurence Jan 11 '13 at 09:14
  • @Laurence volatile doesn't guarantee throwing away all optimizations; only the ones that would violate the now-required observable behavior. In this case the side effect being optimized out was pretty significant, but since nothing was observing that variable, it could all be canned. As I said, I'm still surprised the local call was not. I've seen significantly larger pieces of code completely thrown out because ultimately there were many side effects, but nothing observed. I'd be curious to see what happens if you removed volatile but added a use of that variable at the end, via a `cout` or some such. – WhozCraig Jan 11 '13 at 09:26
  • @Laurence ...after each run, of course, so twice. – WhozCraig Jan 11 '13 at 09:28
  • @WhozCraig This is the result: http://pastebin.com/riZV4X0j now I see how much overhead this gives. – Laurence Jan 11 '13 at 09:28
  • @WhozCraig going to test that. – Laurence Jan 11 '13 at 09:29
  • @WhozCraig When I add a cout after the stop timer I get very nice results. The DLL calls last 0.235448 seconds; the normal function calls take 0 seconds. Both return the value 100000000. – Laurence Jan 11 '13 at 09:32
  • @Laurence after *both* stop timers, right? – WhozCraig Jan 11 '13 at 09:33
  • @WhozCraig Indeed: http://pastebin.com/BS5szsNG – Laurence Jan 11 '13 at 09:34
  • Maybe this has the same reason as this: http://stackoverflow.com/questions/11227809/why-is-processing-a-sorted-array-faster-than-an-unsorted-array – Laurence Jan 11 '13 at 09:36
  • @Laurence I wouldn't even hazard a guess. Mysticial would be much better equipped to tell you that. The guy is positively *sick* when it comes to CPU-related performance issues. Love that post, one of the best on SO. – WhozCraig Jan 11 '13 at 09:38
  • @WhozCraig That post is indeed very good. I guess the compiler knows that I want to do X++ 100000000 times, so it calculates this when I compile the code; that way it can give the result without calculating it at run time. But it is very strange that it takes so long when you don't do anything with the result. Isn't it possible to ask him to take a look at this case? – Laurence Jan 11 '13 at 09:45

1 Answer

For a very small function, the difference is in the way that a function returns/cleans up arguments.

However, that shouldn't make that much of a difference. I think the compiler realizes that your function doesn't do anything with resultCalculation and optimizes it away. Try using two different variables and printing their values afterwards.

Mats Petersson
  • Optimization was indeed the problem. When I change `int resultCalculation;` to `volatile int resultCalculation;`, the normal call is faster. (Thanks @WhozCraig for this.) – Laurence Jan 11 '13 at 09:02