
In the C++ Standard, std::string follows an exponential growth policy, so I assumed that during concatenation capacity() would only ever be increased, and only when necessary. However, when I ran test.cpp, I found that in the for-loop capacity() is shrunk back to length() on every second assignment.

Why does this behavior depend not on the length of the string, but on how often I change the string? Is it some kind of optimization?

The following code was tested with g++ -std=c++11.

test.cpp:

#include <iostream>
#include <string>
int main(int argc, char **argv) {
  std::string s = "";
  for (int i = 1; i <= 1000; i++) {
    //s += "*";
    s = s + "*";
    std::cout << s.length() << " " << s.capacity() << std::endl;
  }
  return 0;
}

The output looks like this:

1 1
2 2
3 4
4 4
5 8
6 6    // why is capacity shrunk?
7 12
8 8    // and again?
9 16
10 10  // and again?
11 20
12 12  // and again?
13 24
14 14  // and again?
15 28
16 16  // and again?
17 32
...
996 996
997 1992
998 998  // and again?
999 1996
1000 1000  // and again?
sleepsort
  • Now I guess I don't understand your question. Is it "why would copy assignment ever reduce capacity?" If so, isn't the answer pretty obvious? – David Schwartz Jun 25 '14 at 03:32
  • @DavidSchwartz I didn't know this before, but abarnert tells us that "When copy-assigning from one string to another, there's no reason to copy the capacity". My problem is that the copy assignment doesn't always reduce the capacity; it only does so every second time. – sleepsort Jun 25 '14 at 03:36
  • You could simply look at the source code for your compiler's implementation of `std::string` and see why it does what it does. – Remy Lebeau Jun 25 '14 at 04:01

2 Answers


When you do this:

s = s + "*";

You're doing two separate things: making a new temporary string, consisting of "*" concatenated onto the end of the contents of s, and then copy-assigning that new string to s.

It's not the + that's shrinking, it's the =. When copy-assigning from one string to another, there's no reason to copy the capacity, just the actual used bytes.

Your commented-out code:

s += "*";

… does only one thing: it appends "*" onto the end of s. So there's nowhere for the "optimization" to happen (and if it did happen, it would be a pessimization, defeating the entire purpose of the exponential growth).

abarnert
  • "When copy-assigning from one string to another, there's no reason to copy the capacity, just the actual used bytes." - that doesn't actually account for the observed behaviour. For C++11 it makes sense for a move assignment to swap buffers, so the assigned-to object takes on the capacity of the temporary; for C++03 the assignment *could* copy used bytes, but then you wouldn't expect the capacity to be reduced - that would only happen for implementations using reference counting that discard the current larger-capacity buffer (i.e. *not* doing your "copy [of] actual used bytes"). – Tony Delroy Jun 25 '14 at 05:24
  • Anyway - +1 for pointing out the crucial difference between `+` and `+=` and the importance of the temporary. Cheers. – Tony Delroy Jun 25 '14 at 05:26

The C++ standard doesn't actually specify what happens to capacity() when strings are moved, assigned, etc. This could be a defect. The only constraints are those derivable from the time complexity specified for each operation.

See here for similar discussion about vectors.

Community
M.M