Performance difference between string concat by using str += "A" or str = str + "A"

Question

I want to know why str += "A" and str = str + "A" have different performances.

In practice,

string str = "cool"
for(int i = 0; i < 1000000; i++) str += "A" //  -> approximately 15000~20000 ms
for(int i = 0; i < 1000000; i++) str = str + "A"  // -> 10000000 ms

According to what I've looked for, str = str + "A" have to copy 'str' so each iteration concatenation needs O(N) so if you iterate N times, time complexity is O(N^2).

However, str += "A" is only concatenating "A" to 'str' so it doesn't need to do any copy. So if iterate N times, only O(N).

I think it's weird. When I inspected the assembly code, they were the same. I think compiler translates str += "A" to str = str + "A" so the result should be the same.

If you look at the actual results, I'm wrong, but can you tell where I'm wrong?

Here is my ref link.

Here's my code for the timings:

#include<bits/stdc++.h>
#include<chrono>
using namespace std;
using namespace chrono;
int main(void)
{
    ios::sync_with_stdio(0);
    cin.tie(0);     
    string str = "aaaa";
    system_clock::time_point start = system_clock::now();
    
    for(int i = 0; i < 100000; i++)
    {
        str = str +  "aaaa";
    }

    system_clock::time_point end = system_clock::now();
    microseconds microSec = duration_cast<microseconds>(end-start);
    cout << microSec.count() << " ms\n";
    
    
    return 0;
}

=========================== my test code. thx

umm. no just g++ -S test.cpp, but i don`t know default optimization is applied. — GyuMin Han, Aug 28 '21 at 08:54
Optimization is not enabled by default. Non-optimized code can't be compared for performance. But if the assembly is the same, then the performance should be the same, so make sure you show how you confirm this and your timings. — Ted Klein Bergman, Aug 28 '21 at 08:56
Without optimisations [gcc](https://gcc.godbolt.org/z/WWo6PshrM) doesn't seem to generate the same assembly, at least on godbolt — Lala5th, Aug 28 '21 at 08:57
Even for short and simple examples, [don't include ``](https://stackoverflow.com/questions/31816095/why-should-i-not-include-bits-stdc-h). — Some programmer dude, Aug 28 '21 at 09:05
@GyuMinHan I still don't see the same code in assembly. Most of it's just noise though (i.e. the benchmarking code is generated the same), but the for loop is clearly different. Also https://stackoverflow.com/Questions/31816095/Why-Should-I-Not-Include-Bits-Stdc-H. and https://stackoverflow.com/questions/1452721/why-is-using-namespace-std-considered-bad-practice. Also if you cast to `microseconds`, use `us`, not `ms`. It's confusing at best — Lala5th, Aug 28 '21 at 09:05
Also have a look here : https://stackoverflow.com/questions/18892281/most-optimized-way-of-concatenation-in-strings/18892355 — Pepijn Kramer, Aug 28 '21 at 10:07

Quimby · Answer 1 · 2021-08-28T11:28:03.070

4

i think compiler translates str += "A" to str = str + "A" so the result have to be same.

No, operator+= is a separate function. ~~It's not like in Python - x += y does not translate to x = x + y~~. That's a good thing because it allows the optimization which makes the performance difference you observed. I doubt the assembly is the same, it's two separate functions with different implementations.

operator+= - No complexity, but reasonably it will be amortized O(1) per the copied character.
operator+ O(n) at best obviously.

In x = x+y the right side is evaluated first and so the compiler must allocate a new string object that is then moved into x and the old string in x is thus wastefully discarded. Under as-if rule the compiler is free to replace x = x+y with x+=y but it's hard to tell when or even if it does that.

edited Aug 28 '21 at 11:28

answered Aug 28 '21 at 09:06

Quimby

17,735
4
35
55

1

In Python, `+=` and `+` are also two different methods. – Ted Klein Bergman Aug 28 '21 at 09:07
@TedKleinBergman Hmm, you seem to be [correct](https://stackoverflow.com/questions/15376509/when-is-i-x-different-from-i-i-x-in-python), I was confused that it uses `__add__` if you do not explicitly implement `__iadd__`. Well that's some bad Python I have wrote then... – Quimby Aug 28 '21 at 09:09
I think my assumption is wrong from the beginning. thank you!!! – GyuMin Han Aug 28 '21 at 09:40

Dorin Baba · Accepted Answer · 2021-08-28T11:36:59.263

In very short terms, str += "A" will result in less assembly being generated (x86-64 gcc 11.2):

lea     rax, [rbp-96]
mov     esi, OFFSET FLAT:.LC1
mov     rdi, rax
call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator+=(char const*)

While str = str + "A" will result in this:

lea     rax, [rbp-48]
lea     rcx, [rbp-96]
mov     edx, OFFSET FLAT:.LC1
mov     rsi, rcx
mov     rdi, rax
call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > std::operator+<char, std::char_traits<char>, std::allocator<char> >(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, char const*)
lea     rdx, [rbp-48]
lea     rax, [rbp-96]
mov     rsi, rdx
mov     rdi, rax
call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::operator=(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >&&)
lea     rax, [rbp-48]
mov     rdi, rax
call    std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() [complete object destructor]

I'm not an assembly guru, so don't take all I tell you here for granted because I'm not entirely sure that's how it works

The += case (provided by @prapin)

It will just copy bytes into the string buffer if there is enough place left. Otherwise, it will reallocate the buffer with some algorithm (typically doubling it)

But in the second case, there are multiple expensive operations:

a clone of str is created
a new object is created with the value of str and "A" appended,
the reference from the 1st point now is switched to the reference of the object from 2nd point
now we got a reference to an object having the value of str + "A". This now becames initial's str reference
the initial str object ("cool") is destroyed

Let me know in the comments if I'm mistaken

While this is true the biggest performance hit (maybe) wouldn't be the amount of assembly. When `str = str + "A"` is called a copy is created, which would (if I remember correctly) call an allocation function, that generally make a system call, which are very expensive computationally — Lala5th, Aug 28 '21 at 09:17
@Lala5th yes, basically this is what the assembly from the above tells us. I agree that the amount of assembly is not tightly coupled to performance, but in this case it's easy to spot that in 2nd case, there are more expensive operations than in the first :) — Dorin Baba, Aug 28 '21 at 09:21
thank you clearly understand. it`s intuitive. i don`t know why my assembly are the same.. — GyuMin Han, Aug 28 '21 at 09:27
@GyuMinHan I don't think you included all the relevant parts to the assembly, in your code. The setup of main shouldn't really change much. To see the difference you have to look at all of main (i.e. until `ret`) — Lala5th, Aug 28 '21 at 09:56
*A new object in created in memory*. No. The operator `+=` does **not** create a new object. It will just copy bytes into the string buffer if there is enough place left. Otherwise, it will reallocate the buffer with some algorithm (typically doubling it), so that on average, the complexity is O(n). — prapin, Aug 28 '21 at 11:30
@Lala5th OMG, Yeah you`re right. i found difference between two assembly codes. it looks very similar this img. 1 call vs 3 calls and more assembly lines! thank you!!! — GyuMin Han, Aug 28 '21 at 13:07

score 1 · Answer 3 · answered Aug 28 '21 at 09:06

Essentially the first one has to copy the whole string to a new location together with the string to append, and store that pointer in the original string. We mainly care about the copying around of stuff, since that, together with memory allocation, is what really takes time.

My best guess is that the second one is doing all the same steps as the first one, but also copying the temporary's contents again. This doesn't make much sense though, as str = str + "A"; should call the move assignment operator (operator=(string&&)) and any developer worth their salt (so definitely someone writing a standard library implementation) would use resource stealing (switching the pointers, since this is a move, and leaving the actual string contents where they are. This doesn't create ownership conflicts because the second object is being destroyed anyways, so it might as well delete the string to overwrite). Keep in mind that all of this disregards SSO (small string optimization), but it shouldn't matter too much if your strings get to 1M+ characters. Also, this disregards compiler optimizations, which, if any, will probably make much more of a difference.

TYeung · Answer 4 · 2021-08-28T09:32:29.677

+= and + are different operators and do not necessarily do the same thing.

Even x = x + y could generates different assembly code than x += y with no compiler optimization.

In this case, str += 'A' only appends 'A' to str but str = str + 'A' makes another temporary string to store str + 'A' and copies it to str which is slower.

However, if you have compiler optimisation on(e.g. O3 with g++), the compiler might optimise it out.

Performance difference between string concat by using str += "A" or str = str + "A"

4 Answers4