22

I've been writing C++11 code for quite some time now, and haven't done any benchmarking of it, only expecting things like vector operations to "just be faster" now with move semantics. So when actually benchmarking with GCC 4.7.2 and clang 3.0 (default compilers on Ubuntu 12.10 64-bit) I get very unsatisfying results. This is my test code:

EDIT: With regards to the (good) answers posted by @DeadMG and @ronag, I changed the element type from std::string to my::string which does not have a swap(), and made all inner strings larger (200-700 bytes) so that they shouldn't be the victims of SSO.

EDIT2: COW was the reason. Adapted code again by the great comments, changed the storage from std::string to std::vector<char> and leaving out copy/move onstructors (letting the compiler generate them instead). Without COW, the speed difference is actually huge.

EDIT3: Re-added the previous solution when compiled with -DCOW. This makes the internal storage a std::string rather than a std::vector<char> as requested by @chico.

#include <string>
#include <vector>
#include <fstream>
#include <iostream>
#include <algorithm>
#include <functional>

static std::size_t dec = 0;

namespace my { class string
{
public:
    string( ) { }
#ifdef COW
    string( const std::string& ref ) : str( ref ), val( dec % 2 ? - ++dec : ++dec ) {
#else
    string( const std::string& ref ) : val( dec % 2 ? - ++dec : ++dec ) {
        str.resize( ref.size( ) );
        std::copy( ref.begin( ), ref.end( ), str.begin( ) );
#endif
    }

    bool operator<( const string& other ) const { return val < other.val; }

private:
#ifdef COW
    std::string str;
#else
    std::vector< char > str;
#endif
    std::size_t val;
}; }


template< typename T >
void dup_vector( T& vec )
{
    T v = vec;
    for ( typename T::iterator i = v.begin( ); i != v.end( ); ++i )
#ifdef CPP11
        vec.push_back( std::move( *i ) );
#else
        vec.push_back( *i );
#endif
}

int main( )
{
    std::ifstream file;
    file.open( "/etc/passwd" );
    std::vector< my::string > lines;
    while ( ! file.eof( ) )
    {
        std::string s;
        std::getline( file, s );
        lines.push_back( s + s + s + s + s + s + s + s + s );
    }

    while ( lines.size( ) < ( 1000 * 1000 ) )
        dup_vector( lines );
    std::cout << lines.size( ) << " elements" << std::endl;

    std::sort( lines.begin( ), lines.end( ) );

    return 0;
}

What this does is read /etc/passwd into a vector of lines, then duplicating this vector onto itself over and over until we have at least 1 million entries. This is where the first optimization should be useful, not only the explicit std::move() you see in dup_vector(), but also the push_back per se should perform better when it needs to resize (create new + copy) the inner array.

Finally, the vector is sorted. This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.

I compile and run this two ways, one being as C++98, the next as C++11 (with -DCPP11 for the explicit move):

1> $ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
2> $ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
3> $ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
4> $ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out

With the following results (twice for each compilation):

GCC C++98
1> real 0m9.626s
1> real 0m9.709s

GCC C++11
2> real 0m10.163s
2> real 0m10.130s

So, it's slightly slower to run when compiled as C++11 code. Similar results goes for clang:

clang C++98
3> real 0m8.906s
3> real 0m8.750s

clang C++11
4> real 0m8.858s
4> real 0m9.053s

Can someone tell me why this is? Are the compilers optimizing so good even when compiling for pre-C++11, that they practically reach move semantic behaviour after all? If I add -O2, all code runs faster, but the results between the different standards are almost the same as above.

EDIT: New results with my::string and rather than std::string, and larger individual strings:

$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real    0m16.637s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m17.169s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real    0m16.222s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m15.652s

There are very small differences between C++98 and C+11 with move semantics. Slightly slower with C++11 with GCC and slightly faster with clang, but still very small differencies.

EDIT2: Now without std::string's COW, the performance improvement is huge:

$ rm -f a.out ; g++ --std=c++98 test.cpp ; time ./a.out
real    0m10.313s
$ rm -f a.out ; g++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m5.267s
$ rm -f a.out ; clang++ --std=c++98 test.cpp ; time ./a.out
real    0m10.218s
$ rm -f a.out ; clang++ --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m3.376s

With optimization, the difference is a lot bigger too:

$ rm -f a.out ; g++ -O2 --std=c++98 test.cpp ; time ./a.out
real    0m5.243s
$ rm -f a.out ; g++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m0.803s
$ rm -f a.out ; clang++ -O2 --std=c++98 test.cpp ; time ./a.out
real    0m5.248s
$ rm -f a.out ; clang++ -O2 --std=c++11 -DCPP11 test.cpp ; time ./a.out
real    0m0.785s

Above showing a factor of ~6-7 times faster with C++11.

Thanks for the great comments and answers. I hope this post will be useful and interesting to others too.

gustaf r
  • 1,224
  • 9
  • 14
  • 1
    have you enabled optimizations? (/O3 switch) – Andy Prowl Jan 12 '13 at 12:07
  • @chris, I'm not using it after. – gustaf r Jan 12 '13 at 12:09
  • @gustafr, Never mind, I thought you were moving your main vector into the local one for some reason. – chris Jan 12 '13 at 12:10
  • @AndyProwl, as I wrote, I tested with -O2. Tested now with -O3 too, same thing. – gustaf r Jan 12 '13 at 12:10
  • 3
    this doesn't make sense. how exactly do you expect move semantics to improve the speed of *copying* – Cheers and hth. - Alf Jan 12 '13 at 12:13
  • The algorithm is needlessly long. Try `std::move(v.begin(), v.end(), std::back_inserter(vec));`. (Or `std::copy` in C++03.) Also prepend a `reserve()` call maybe. – Kerrek SB Jan 12 '13 at 12:13
  • @KerrekSB, Are you sure that form of `std::move` exists? – chris Jan 12 '13 at 12:15
  • @Cheersandhth.-Alf, in many ways. When appending (copying) a vector onto itself as I do, the vector needs to resize. What happens is, it creates a new larger inner array, and copies the values of the smaller array into the larger array. In C++11, with move semantics, STL will **move** the string objects, hence not re-allocate all inner dynamically allocated string buffers. – gustaf r Jan 12 '13 at 12:15
  • @KerrekSB, that's a bit OT, and regardless, please elaborate on what you think causes the result. – gustaf r Jan 12 '13 at 12:17
  • I'm simply not reproducing your results. C++11 mode times are slightly better for me at `-O2` for two versions of g++ and one of clang. – Mat Jan 12 '13 at 12:18
  • @Mat, yes, with -O2 C++11 isn't "a little bit slower", it's now "a little bit faster", but only marginally, and cannot really cover for the unnecessary copies, I assume. – gustaf r Jan 12 '13 at 12:20
  • COW strings probably is the issue... – Mat Jan 12 '13 at 12:25
  • 2
    @gustaf: hm, OK, that's your expectation. there are at least three reasons why it's not happening: (1) building as C++98 does not necessarily use a separate C++98 standard library implementation, (2) with g++ you have reference counted strings (COW), which means that actual copying is avoided also for the C++98 case, and (3) not sure if this kicks in for g++ but I think it would for Visual C++, the small buffer optimization may cause the same code to be emitted for move and copy cases, if the strings are short enough. – Cheers and hth. - Alf Jan 12 '13 at 12:34
  • You should define such a move constructor using `= default`, or remember to declare it `noexcept`. Removing the copy and move constructors and using implicit declarations would be preferable. – Potatoswatter Jan 12 '13 at 13:20
  • (Leaving out `noexcept` can cause it to be ignored, such as when moving or expanding a `vector`.) – Potatoswatter Jan 12 '13 at 13:26

3 Answers3

15

This should definitely be faster when you don't need to copy temporary objects each time two elements are swapped.

std::string has a swap member, so sort will already use that, and it's internal implementation will already be move semantics, effectively. And you won't see a difference between copy and move for std::string as long as SSO is involved. In addition, some versions of GCC still have a non-C++11-permitted COW-based implementation, which also would not see much difference between copy and move.

Puppy
  • 144,682
  • 38
  • 256
  • 465
  • Good point! However, I changed my code now and wrote my own `string` class without a `swap` method, and the results are, although better, still very minor. Will adapt the original question – gustaf r Jan 12 '13 at 12:43
  • You still don't account for the fact that some versions of GCC use a COW-based `std::string` implementation, which may make copies not, infact, copies. Consider using `std::vector` as the underlying storage. – Puppy Jan 12 '13 at 13:22
  • COW was the reason, well done @DeadMG. I adapted the code to use `std::vector` and the result became huge. – gustaf r Jan 12 '13 at 14:09
  • @gustafr, I did't saw your previous code, the cause of cow, although I may suppose how it was, you could leave it commented for reference where things improved. – oblitum Jan 12 '13 at 14:11
2

This is probably due to the small string optimization, which can occur (depending on the compiler) for strings shorter than e.g 16 characters. I would guess that all the lines in the file are quite short, since they are passwords.

When small string optimization is active for a particular string then move is done as a copy.

You will need to have larger strings to see any speed improvements with move semantics.

ronag
  • 49,529
  • 25
  • 126
  • 221
  • Interesting, but I'm reading the source code for GCC's STL and I can't find any such optimization either in the move constructor or its swap() method, so I'm not sure this is the reason. – gustaf r Jan 12 '13 at 12:26
  • Depends on the version, I think. – Puppy Jan 12 '13 at 12:31
  • @ronag, no, you didn't read it properly, GCC has always used COW not SSO. It doesn't depend on the version. It's never used SSO. – Jonathan Wakely Jan 13 '13 at 15:20
2

I think that you'll need to profile the program. Maybe most of the time is spent in the lines T v = vec; and the std::sort(..) of a vector of 20 million strings!!! Nothing to do with move semantics.

rodrigo
  • 94,151
  • 12
  • 143
  • 190
  • +1 the OP should write code to test exactly those points where C++11 differs from c++98, ie where it benefits from moving – stijn Jan 12 '13 at 12:33
  • Your idea sounds reasonable at first glance, but remember I force a **move** of each element into the main vector, when using C++11. So this should also enjoy be a major improvement in 11. – gustaf r Jan 12 '13 at 12:33
  • @gustaf r: The sort takes 95% of the time in your code. So what you do in dup_vector is irrelevant. – Timo Jan 12 '13 at 12:35
  • And for the sort itself, it might well be that more time is spent comparing strings than actually swapping them. Particularly as many of them will be identical. – MvG Jan 12 '13 at 12:37
  • @Timo, I wanted to follow http://sscce.org/ and since both the vector copy as well as the sort should benefit from move semantics, it might be pedantic, but at least not opposing the purpose. – gustaf r Jan 12 '13 at 12:51