39

This really is a question just for my own interest I haven't been able to determine through the documentation.

I see on http://www.cplusplus.com/reference/string/string/ that append has complexity:

"Unspecified, but generally up to linear in the new string length."

while push_back() has complexity:

"Unspecified; Generally amortized constant, but up to linear in the new string length."

As a toy example, suppose I wanted to append the characters "foo" to a string. Would

myString.push_back('f');
myString.push_back('o');
myString.push_back('o');

and

myString.append("foo");

amount to exactly the same thing? Or is there any difference? You might figure that append would be more efficient because the compiler would know how much memory is required to extend the string the specified number of characters, while push_back may need to secure memory each call?

Memento Mori
  • 3,327
  • 2
  • 22
  • 29
  • Besides the obvious (the former are three function calls and latter just one), the former could cause three reallocations. Personally I would rather use `myString += "foo";` as I think it's more "natural", even though it's the same as the `append` call. – Some programmer dude Feb 26 '13 at 05:44
  • Key words: "up to". Though it's unfortunate that they didn't state the typical case for `append`, only the worst. – Benjamin Lindley Feb 26 '13 at 05:48
  • 1
    @JoachimPileborg: `push_back` must be amortized constant time, so the likelihood of three allocations for every implementation of which I am aware is zero. Most start with some reasonable size (greater than 1 or 2) and grow the underlying buffer geometrically (by a factor of 1.5 or 2 times per size increase). – Billy ONeal Feb 26 '13 at 05:52

4 Answers4

47

In C++03 (for which most of "cplusplus.com"'s documentation is written), the complexities were unspecified because library implementers were allowed to do Copy-On-Write or "rope-style" internal representations for strings. For instance, a COW implementation might require copying the entire string if a character is modified and there is sharing going on.

In C++11, COW and rope implementations are banned. You should expect constant amortized time per character added or linear amortized time in the number of characters added for appending to a string at the end. Implementers may still do relatively crazy things with strings (in comparison to, say std::vector), but most implementations are going to be limited to things like the "small string optimization".

In comparing push_back and append, push_back deprives the underlying implementation of potentially useful length information which it might use to preallocate space. On the other hand, append requires that an implementation walk over the input twice in order to find that length, so the performance gain or loss is going to depend on a number of unknowable factors such as the length of the string before you attempt the append. That said, the difference is probably extremely Extremely EXTREMELY small. Go with append for this -- it is far more readable.

Billy ONeal
  • 104,103
  • 58
  • 317
  • 552
  • 6
    However, if you just want to append 1 char, there is no `append(char)` overload. In that case `push_back()` can be used. – rustyx May 19 '16 at 12:31
  • @rustyx sure, but that would be a different question :) – Billy ONeal May 19 '16 at 13:54
  • Do you know if the C++11/14/17 standard(s) ever tightened the complexity requirements on `push_back` / `append` / `insert` for strings? If not, are you sure most existing implementations are so friendly? (I vaguely recall having problems along these lines in the past, but perhaps that was just a consequence of some CoW implementation.) – Nemo Aug 17 '16 at 17:38
  • @Nemo: I'm not sure what "tightening" you could ask for. They've always been amortized constant time as far as I know. Currently it's N4606 23.2.3 \[sequence.reqmts\] /16 http://eel.is/c++draft/sequence.reqmts#16 – Billy ONeal Aug 17 '16 at 20:41
  • @BillyONeal: I meant relative to C++03, where `push_back` etc. on strings had essentially no complexity guarantee (unlike e.g. vector). I see this is corrected for `push_back` on strings in C++11, and thank you for the reference. But as far as I can tell, the standard does not impose any complexity requirements on `insert(end(),...)` nor `append()` for strings. So from a "strictly conforming" point of view, the sequence of `push_back` calls could be faster than `append` by a factor of O(n). Unless I am missing something. – Nemo Aug 17 '16 at 21:29
  • @Nemo: `push_back` is defined to be exactly the same as append: http://eel.is/c++draft/string::append#24 – Billy ONeal Aug 18 '16 at 00:10
  • @BillyONeal: That says "equivalent to", and the whole point of such wording is to permit a distinct (i.e. faster) implementation. Otherwise, it would say "calls", as elsewhere. So you just made the case even stronger that the standard does *not* impose the amortized constant complexity guarantee for `string::append`. Also compare to the wording for (e.g.) `vector::insert`, which is what this guarantee would look like if it were present. Perhaps this is a deficiency in the spec, but the meaning as written is quite clear, IMO. – Nemo Aug 18 '16 at 14:50
  • @Nemo IMO if it's clear what users and implementers need to do the spec is fine. There is no sane worse than O(n) implementation of append. Effects equivalent to can be faster; but usually only by a constant factor. When comparing n O(1) amortized push_back calls with 1 O(n) append call, the only answer is to profile. – Billy ONeal Aug 18 '16 at 15:04
  • Just so I'm clear...in C++11, std::string::append will be O(k) where k is the size of the string your appending to the existing one, not O(n+k) where n is the size of the pre existing string? – Dominic Farolino Sep 22 '16 at 05:07
  • @DomFarolino it is O(k) unless reallocation is necessary, yes. Reallocation is geometric, so you should get amortized O(k) overall. – Billy ONeal Sep 22 '16 at 13:37
  • Ok, I'm assuming C++11 strings reallocate similarly to how std::vector has always reallocated. Thanks so much! – Dominic Farolino Sep 22 '16 at 14:16
6

I had the same doubt, so I made a small test to check this (g++ 4.8.5 with C++11 profile on Linux, Intel, 64 bit under VmWare Fusion).

And the result is interesting:

push :19
append :21
++++ :34

Could be possible this is because of the string length (big), but the operator + is very expensive compared with the push_back and the append.

Also it is interesting that when the operator only receives a character (not a string), it behaves very similar to the push_back.

For not to depend on pre-allocated variables, each cycle is defined in a different scope.

Note : the vCounter simply uses gettimeofday to compare the differences.

TimeCounter vCounter;

{
    string vTest;

    vCounter.start();
    for (int vIdx=0;vIdx<1000000;vIdx++) {
        vTest.push_back('a');
        vTest.push_back('b');
        vTest.push_back('c');
    }
    vCounter.stop();
    cout << "push :" << vCounter.elapsed() << endl;
}

{
    string vTest;

    vCounter.start();
    for (int vIdx=0;vIdx<1000000;vIdx++) {
        vTest.append("abc");
    }
    vCounter.stop();
    cout << "append :" << vCounter.elapsed() << endl;
}

{
    string vTest;

    vCounter.start();
    for (int vIdx=0;vIdx<1000000;vIdx++) {
        vTest += 'a';
        vTest += 'b';
        vTest += 'c';
    }
    vCounter.stop();
    cout << "++++ :" << vCounter.elapsed() << endl;
}
user2583872
  • 99
  • 1
  • 2
4

Add one more opinion here.

I personally consider it better to use push_back() when adding characters one by one from another string. For instance:

string FilterAlpha(const string& s) {
  string new_s;
  for (auto& it: s) {
    if (isalpha(it)) new_s.push_back(it);
  }
  return new_s;
}

If using append()here, I would replace push_back(it) with append(1,it), which is not that readable to me.

0

Yes, I would also expect append() to perform better for the reasons you gave, and in a situation where you need to append a string, using append() (or operator+=) is certainly preferable (not least also because the code is much more readable).

But what the Standard specifies is the complexity of the operation. And that is generally linear even for append(), because ultimately each character of the string being appended (and possible all characters, if reallocation occurs) needs to be copied (this is true even if memcpy or similar are used).

jogojapan
  • 68,383
  • 11
  • 101
  • 131
  • Each character of the *appended* string needs to be copied. If the underlying memory block need not be resized then the string's existing contents need not be copied. – Billy ONeal Feb 26 '13 at 05:47