3

In this Stack Overflow question (which has unfortunately been put on hold since I began researching for this question), several people have mentioned that "in modern C++," due to move semantics, there's no need for the compiled code to copy the string for the operation string = string + s1, leaving the impression that with a modern C++ compiler, string = string + s1 can be just as efficient as string += s1. I find that claim dubious, but I'm working in the legacy world of C++03 and still don't know much about move semantics. (It's my job, and I don't pick our compiler.)

Cost of string += s1

With the operation string += s1, no new allocation is needed unless the expansion of string's buffer exceeds its previously allocated capacity, and assuming a reasonable implementation of the string class, no temporary object is ever created in the operation string += s1. Assuming the size of the result fits within that previously allocated capacity, the most costly part of string += s1 is the appending (copying) of s1's content to the end of string's original content using previously allocated but unused space. Also note that the cost of that copy operation is just the number of bytes of s1, not the combined total number of bytes.

Cost of string = string + s1 in legacy C++ (C++03 & earlier)

In legacy C++ (03 and earlier), string = string + s1 would, in my understanding, require at least one temporary allocation (for the evaluation of string + s1), and two full copies of the sum of the number of bytes in s1 and the original string (1. copy original content of string to the temporary and copy-append the bytes of s1 to the end of those bytes in the temporary, and then 2. copy all of the resulting bytes from the temporary back to string's buffer, including the bytes of the original content which were already there anyway). Obviously that's far more expensive than the cost of string += 1 described above, and it could make a significant difference especially if the append operation is performed many times in a loop (it's the Shlemiel the painter algorithm, except it's even worse than the inefficiency of strcat()!).

Cost of string = string + s1 in "modern" C++ (C++11 (or C++14?) & later)

It seems to me that the evaluation of the expression string + s1 will produce an rvalue, which can subsequently be provided as an rvalue reference to string's move assignment operator, eliminating the need to copy the result of string + s1 back into string. But that doesn't eliminate the need to create the original temporary object, and the associated copying for it, does it? In my thinking, the best that move semantics can do is eliminate one of the two full copy operations. One allocation and one copy operation (to create the temporary) is still required, right? If so, then move semantics only makes code like that "less bad", it doesn't stop it from being another instance of the Shlemiel the painter algorithm, and it's still worse than C's strcat()!

Please tell me I'm wrong. Also: Please don't speculate. That's not helpful. If your answer or comment begins with "I think...", then please don't submit it.

phonetagger
  • 7,701
  • 3
  • 31
  • 55
  • Because of the [as-if rule](https://en.cppreference.com/w/cpp/language/as_if) you *might* be wrong for certain implementations. – François Andrieux Jul 22 '19 at 18:01
  • 1
    Another thing to watch out for is SSO (Small String Optimization). Since SSO is allowed in C++11+, when you move a string you might make a copy since you can't move arrays. – NathanOliver Jul 22 '19 at 18:02
  • "But that doesn't eliminate the need to create the original temporary object, and the associated copying for it, does it?" `string + s1` will still need to get evaluated and create a temporary object (I think), but it might remove the need for a a copy into the assignment operator. I couldn't tell you for sure, though. That's why there's other people smarter than me here :). –  Jul 22 '19 at 18:12

3 Answers3

1

One allocation and one copy operation (to create the temporary) is still required, right? If so, then move semantics only makes code like that "less bad"

As far as I can tell, you're right. Move semantics doesn't get rid of all overhead from this example, only some.

If string doesn't have sufficient capacity, then there is no way of avoiding reallocation and copying of all characters. In case there is sufficient capacity however, for best performance, regardless of standard version, what you should do is:

string.append(s1); // this
string += s1;      // or this

If capacity is sufficient, no reallocation happens, and no characters of original string are copied. Same as strcat. Unlike strcat, the behaviour is well defined if he capacity is insufficient.


When doing in a loop, ideally the full capacity should be reserved before the loop. If the resulting length cannot be known, then there's a potential problem: Standard doesn't specify the capacity growth strategy of append as far as I know. If it does geometric growth, then trivial loop with append is good. If it only allocates exactly old length + append length with no space overhead, then trivial append loop would result in allocation in every iteration.

I tested libstdc++, and it does geometric growth. If you're paranoid, then to guarantee this, you technically need to explicitly check if reallocation would happen on each iteration, and reserve a geometrically growing amount of memory:

// paranoid mode: enabled
int growth_rate = 2;
for(auto& s1 : range) {
    auto new_size = string.size() + s1.size();
    auto new_cap = string.capacity();
    while (new_cap < new_size)
        new_cap *= growth_rate;
    string.reserve(new_cap);
    string += s1;
}

In this particular case however, std::ostringstream might be simpler.

eerorika
  • 232,697
  • 12
  • 197
  • 326
  • 1
    @NathanOliver That's why I said `ideally the full capacity should be reserved before the loop. If the resulting length cannot be known, then ...` Consider for example a case where the `range` is something that can be iterated exactly once. – eerorika Jul 22 '19 at 18:48
0

For the purpose of this answer I'm ignoring SSO (Small String Optimization) and assuming all string sizes will be large enough to require dynamic memory allocation. This lets us get a more apples to apples comparison.

Lets look at string += s1. When you do this the string will resize or not depending if there is enough free space, and then copy the contents of s1 into string. This means you could have up to O(N+M) character copies and an allocation. This is as good as you can get.

Now we'll examine C++03 string = string + s1. Here you would create a temporary string, preform an allocation for the buffer, and then if string does not have enough capacity you have another allocation, plus a copy from the temporary. This means you have at worst case 2 allocations and O(2(N+M)) character copies. This is no where near as good as the first case.

Lastly we have C++11+ string = string + s1. Again we have to create a temporary, do an allocation and copy string and s1 into it. Unlike C++03, we don't have to copy back into string since we have move semantics. This means at worst you have an allocation, O(N+M) character copies, plus some copying/swapping of the temporary string members with string. This is not as good as the first case, but it is a lot better then C++03.

What does all this mean? Use string += s1. It is the short form of string.append(s1) and offers the best performance as no temporary needs to be created.

NathanOliver
  • 171,901
  • 28
  • 288
  • 402
0
std::string s0 = "whatever length";
std::string s1 = "whatever length";

std::string s = s0+s1;

this requires zero temporaries.

s = s0+s1;

this requires 1 temporary, but there are no observable effects from that temporary, so compilers are free to eliminate its creation. This is challenging; I am unaware of a compiler that pulls it off in this case, but explicitly permitted by the standard.

s = s + s1;

same as previous case.

s += s1;

if the length of s and s1 are short enough, this involves no allocation.

std::string short_string = "short"; // so short that 2 copies will fit in SBO
std::string long_string = "some long string that won't fit in SBO";

s = short_string + short_string;

this can be done with zero memory allocations or copies.

s = s + short_string;

If s + short_string fits in the SBO, this can be done with zero memory allocations or copies. This is more challenging than the previous case.

s += short_string;

this is easy to optimize out the allocations if the sum is short, or if it fits in the buffer.


So there are a few things going on here.

First, C++ has given compilers explicit permission to eliminate new allocations. So if the compiler can understand the types well enough, it can see that the temporary new buffer is only used as a swap space, and it can work out that it doesn't need it under as-if rules.

The second thing going on is elision. Elision is what permits

std::string s = s0+s1;

the s0+s1 used to be a temporary object; now, it is a prvalue expression that must be merged with the std::string s object here under rules. There is no longer a separate object there.

prvalue expressions are sort of like portable construction instructions at this point. Certain operations can force them to materialize temporaries, but using them to construct a value of matching type does not materialize temporaries.

Prior to , even in , elision could occur here and the return value of s0+s1 could (and usually did) have its identity elided with std::string s; that is why the prvalue expression rules are often described as "guaranteed elision".


Now, something else of interest is:

SomeType s = s0 + s1 + s2 + s3 + s4 + s5;

if SomeType is cheap to move and operator+ is written right, this can involve zero wasted temporary buffers.

Suppose we have

SomeType operator+( SomeType lhs, SomeType const& rhs ) {
  lhs += rhs;
  return lhs;
}

and SomeType(SomeType&&) is so cheap it is free (as is ~SomeType on a moved-from SomeType).

Then

SomeType s = s0 + s1 + s2 + s3 + s4 + s5;

is the same as

SomeType s = s0;
s += s1;
s += s2;
s += s3;
s += s4;
s += s5;

down to the likely assembly code generated.

Now,

s = s0 + s1 + s2 + s3 + s4 + s5;

isn't quite as good as

s += s0; s += s1; s += s2; s += s3; s += s4; s += s5;

because the first cannot reuse storage already existing in s (assuming it is sufficient), while the second can.


TL;DR there are a few different s = s + s1; in some cases s += s1 is more efficient, and in others it is not.

In almost all cases the standard permits it to be equally efficient, but compilers are unlikely to do it. In some of the cases, it is both permitted to be equally efficient, and likely to be equally efficient.

Yakk - Adam Nevraumont
  • 262,606
  • 27
  • 330
  • 524