6

I made a small test to check the performance of global function/functor/lambda as comparator parameters for std::sort function. Functor and lambda give the same performance. I was surprised to see, that global function, which appears to be the simplest callback, is much slower.

#include <stdafx.h>
#include <windows.h>
#include <iostream>
#include <stdlib.h>
#include <time.h>
#include <vector>
#include <string>
#include <sstream>
#include <algorithm>
using namespace std;

const int vector_size = 100000;

bool CompareFunction(const string& s1, const string& s2) 
{ 
    return s1[0] < s2[0];  // I know that is crashes on empty string, but this is not the point here
}

struct CompareFunctor 
{
    bool operator() (const string& s1, const string& s2) 
    { 
        return s1[0] < s2[0]; 
    }
} compareFunctor;

int main()
{
    srand ((unsigned int)time(NULL));
    vector<string> v(vector_size);

    for(size_t i = 0; i < vector_size; ++i)
    {
        ostringstream s;
        s << rand();
        v[i] = s.str().c_str();
    }

    LARGE_INTEGER freq;
    LARGE_INTEGER beginTime, endTime;
    QueryPerformanceFrequency(&freq);
    QueryPerformanceCounter(&beginTime);

    // One of three following lines should be uncommented
    sort(v.begin(), v.end(), CompareFunction);
    // sort(v.begin(), v.end(), compareFunctor);
    // sort(v.begin(), v.end(), [](const string& s1, const string& s2){return s1[0] < s2[0];});

    QueryPerformanceCounter(&endTime);
    float f = (endTime.QuadPart - beginTime.QuadPart) *  1000.0f/freq.QuadPart;      // time in ms
    cout << f << endl;

    return 0;
}

A bit of Windows-specific code is used for precise execution time measurement. Environment: Windows 7, Visual C++ 2010. Of course, Release configuration with default optimizations turned on. Execution time:

Global function 2.6 - 3.6 ms   (???)
Functor - 1.7 - 2.4 ms
Lambda - 1.7 - 2.4 ms

So, why the global function is slower? Some problem with VC++ compiler, or something else?

Alex F
  • 42,307
  • 41
  • 144
  • 212
  • 7
    Your timing method is suspect - put the call to `sort` into a loop and call it 100 or 1000 times. – Paul R Jan 29 '14 at 08:19
  • 4
    I believe you should be setting the seed manually, to make sure you're comparing performance on identical data. – Angew is no longer proud of SO Jan 29 '14 at 08:20
  • 4
    globals are not only slower, they slip into your fridge at night and nibble on your cheese! :)) – Paul Evans Jan 29 '14 at 08:21
  • 4
    It will be highly unlikely your global function will inline within the sorting algorithm where the comparator is repeatedly referenced. Conversely, lambda or functor can usually easily do so. Look at the asm. – WhozCraig Jan 29 '14 at 08:22
  • @WhozCraig You're probably right, but why is that? The compiler can see the body just as clearly as for the others, and on default Release optimisation level, VC++ is set to inline "any suitable." – Angew is no longer proud of SO Jan 29 '14 at 08:25
  • @WhozCraig, can you elaborate on that? I understand what you are saying but I can grasp the reasons. Why can't the function be inlined? – Tamás Szelei Jan 29 '14 at 08:27
  • 1
    Things are a bit weirder than they look. Using the global function from within the lambda also gives fast performance: `sort(v.begin(), v.end(), [](const string& s1, const string& s2){return CompareFunction(s1, s2);});` – ComicSansMS Jan 29 '14 at 08:29
  • @WhozCraig - this is what I tried to do, but couldn't find the reason, `std::sort` assembly code was too much complicated for me... Can you give some explanation about this point: `It will be highly unlikely your global function will inline` and post it as an answer? – Alex F Jan 29 '14 at 08:30
  • 4
    @AlexFarber do you have a copy of Scott Meyers Effective STL ? He does a much better job of explaining it than I could (obviously), but the short of it is this: that `operator()` is absorbed at compilation time and inlined all over the place. Passing a global function pointer is *generally* no better than invoking with a "callback". Ie. all the algorithm has is an *address* in a function pointer on which to generate a `call`. Regarding why it performs well when buried in a functor, because there it *can* be inlined. its not a pointer to some function; its *already* a `call`. – WhozCraig Jan 29 '14 at 08:33
  • 2
    @AlexFarber and yes, i'll post it as an answer if you feel it worthy, though I'm pretty sure its a dupe *somewhere* out there. Edit: [Found related article](http://stackoverflow.com/questions/4708105/performance-of-qsort-vs-stdsort), though it is a comparison between `qsort()` and `std::sort`, the logic still applies. Edit2: [And another (arguably better) direct citation](http://stackoverflow.com/questions/16369807/the-benefit-of-function-objects) – WhozCraig Jan 29 '14 at 08:36
  • Looking at the assembly this does not look like an inlining issue. The compiler does not seem to inline in either case. The [only mentionable difference](http://pastebin.com/GmQfD4X3) seems to be that the comparator function's address is loaded using `lea` while the functor uses a simple `movzx`. I'd suspect a caching/TLB issue slowing down the `lea`, as the difference disappears when shrinking the vector to 1000 elements, but that's just a quick guess. – ComicSansMS Jan 29 '14 at 08:55
  • As already mentioned, right now, your timing method produces completely worthless results, it's not measuring - it's guessing. I'm writing this because I feel it wasn't stressed enough in the comments. You should test all variants on same data and you must have many more samples! – user2802841 Jan 29 '14 at 08:57
  • @WhozCraig - thanks, I got it. Your explanation about `operator()` vs global function is the answer. – Alex F Jan 29 '14 at 08:58
  • @ComicSansMS - I put breakpoint in `CompareFunction` and started debugging in Release mode. Debugger stopped on this breakpoint, and `CompareFunction` was not inlined. Breakpoint inside the `CompareFunctor` is never hit. So, I think that WhozCraig`s explanation is correct, at least for my compiler. Thanks. – Alex F Jan 29 '14 at 09:22
  • @PaulR, Angew, user2802841 - applying your suggestions didn't change the final results significantly. Thanks. – Alex F Jan 29 '14 at 09:25
  • 1
    @AlexFarber Just tossed [this example](http://ideone.com/EFRmka) together. You may find it interesting. Enjoy. – WhozCraig Jan 29 '14 at 09:31

3 Answers3

3

Passing a global function is the most complex, not the simplest.

When you pass in a function you are in fact passing in a pointer to the function so the sort function can't easily inline the call to the function as it doesn't know at compile time what the pointer will point to. Sure, it may be able to figure out that the call through the function pointer calls the same function every time and inline it all, but that's difficult.

When you use a lambda or functor, the compiler knows exactly which function it needs to call when it is generating the code so it is very much likely to be able to inline it all.

jcoder
  • 29,554
  • 19
  • 87
  • 130
2

the lambda and functor versions are in lined effectively eliminating the pushing and popping of arguments for every compare.

Try using

inline bool CompareFunction(const string& s1, const string& s2) 
{ 
    return s1[0] < s2[0];  // I know that is crashes on empty string, but this is not the point here
}

and see if it makes a difference. Note that automatic inlining by compilers will vary a lot depending on the compiler, build version etc. I would be surprised that the compiler doesn't automatically inline your global function - unless you're actually compiling in debug mode - which you shouldn't be doing for a performance test case. To really test whether inlining is the issue, you should divide your test into two files and compile them separately

replace

bool CompareFunction(const string& s1, const string& s2){ 
    return s1[0] < s2[0];  // I know that is crashes on empty string, but this is not the point here
}

with

bool CompareFunction(const string& s1, const string& s2);

and put the definition in a separate file - say compare.cpp

While you're at it, you could frustrate inlining for functors as well by using:

struct CompareFunctor 
{
    bool operator() (const string& s1, const string& s2);
} compareFunctor;

and putting in a separate file

bool CompareFunctor::operator() (const string& s1, const string& s2)
{ 
    return s1[0] < s2[0]; 
}
Robert Ramey
  • 1,114
  • 8
  • 16
0

You should call the sort a few thousand times to get more precise results.

How fast this goes depends on the compiler's smarts. It might inline some operations (lambdas very probably, functors probably, non-inline globals unlikely). Also, if the comparison is inlined or not will depend on its complexity; and the results will differ.

I'd strongly advise against looking at such detailed "optimizations." Your time programming is much more expensive than the (very small) gain you'll get in run time. Concentrate on writing clean, understandable, simple code. Trying to understand "bummed for ultimate speed" code next week will just get you to go prematurely bald.

vonbrand
  • 11,412
  • 8
  • 32
  • 52