8

Is it actually guaranteed anywhere that the following reduce-capacity trick will "work"?

int main() {
   std::string s = "lololololol";
   s = "";                        // capacity still non-zero

   string(s).swap(s);             // ?
}

It doesn't seem to "work" for me (in that the capacity remains non-zero), and I can't find anything in the standard that says anything more than that the "contents" must be swapped between the two [here, identical] objects.

Similarly, for sequence containers:

int main() {
   vector<int> v { 1,2,3,4,5 };
   v.clear();                   // capacity still non-zero

   vector<int>(v).swap(v);      // ?
}

As far as I'm aware, this "trick" is semi-widely used; perhaps this widespread adoption is misguided?

(Of course, in C++11 we have shrink_to_fit [albeit non-binding] instead, making this kind of moot.)

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
  • 3
    You might be looking at Small String Optimization. What is the capacity reported when you say 'in that the capacity remains non-zero'? Aw, of course that doesn't go for the vector – sehe Oct 19 '11 at 22:34
  • @sehe: 11 for the string. It just stays the same. (In my tests it _did_ work for the vector, though.) Looking for some standard rationale for what we might or might not be able to expect here in the general case. – Lightness Races in Orbit Oct 19 '11 at 22:38
  • 1
    If that is your exact code, you are swapping the string with itself. – UncleBens Oct 19 '11 at 22:40
  • @UncleBens: Yes, I am aware of that. It's an old shrink-to-fit technique. – Lightness Races in Orbit Oct 19 '11 at 22:41
  • 2
    @UncleBens: He's constructing a temporary and swapping with that. (The idea being that the copy-constructor will allocate minimally.) – GManNickG Oct 19 '11 at 22:43
  • @GMan: Looks pointless though, and `string().swap(s);` does work (with GCC) (http://ideone.com/GFEOO) – UncleBens Oct 19 '11 at 22:46
  • @UncleBens: Tomalak's way works with smaller-not-not-empty containers as well. – Mooing Duck Oct 19 '11 at 22:47
  • It would be cool if we could get [Herb Sutter](http://stackoverflow.com/users/297582/herb-sutter) himself to weigh in on this. – Fred Larson Oct 19 '11 at 22:56
  • 1
    I suspect that GCC's strings are copy-on-write, hence the implementation realizes you are not really making a copy and hence capacity is not reduced this way. See http://gcc.gnu.org/onlinedocs/libstdc++/manual/strings.html#strings.string.shrink – UncleBens Oct 19 '11 at 23:05
  • 2
    @UncleBens : `string().swap(s);` would cause `s` to become empty, which is clearly not the intent. – ildjarn Oct 19 '11 at 23:11
  • 'at' Fred: we could write a comment as a polite invitation to @HerbSutter to come watch this question. _Dixerat_ – sehe Oct 19 '11 at 23:54
  • @UncleBens: I believe I read that COW implementation of `std::string` is now explicitely prohibited by the (new) standard. Too many problems/inevitable performance surprises with existing libraries that had it (Dinkumware, anyone?) – sehe Oct 19 '11 at 23:56
  • @sehe: FYI, if your comment has a backtick in it then you can reply-notify multiple people. :) – Lightness Races in Orbit Oct 20 '11 at 00:00
  • @sehe: (FYI, turns out you'll be able to post the comment, but only the first person will _actually_ get notified >.<) – Lightness Races in Orbit Feb 21 '12 at 17:26
  • @LightnessRacesinOrbit Well perhaps someday it did work: Jeff: [_`If you wish to bypass this, be sure the comment contains a backtick`_](http://meta.math.stackexchange.com/a/2615) but it appears you are right – sehe Feb 21 '12 at 19:28

4 Answers4

6

I've always been taught that there is no guaranteed standard way to lower the capacity. All methods have been (and still are) implementation defined.

§ 23.2.1\8 says:

The expression a.swap(b), for containers a and b of a standard container type other than array, shall exchange the values of a and b without invoking any move, copy, or swap operations on the individual container elements...

This guarantees that the internal pointers of vectors must be swapped.
However, I cannot find anything that guarantee on the capacity of a newly created vector.

§ 21.4.2\1 says that one of the basic_string default constructor's post conditions is that capacity() returns an unspecified value.
§ 21.4.2\3 says that one of the basic_string copy constructor's post conditions is that capacity() returns a value at least as big as size().
§ 21.4.6.8\2 says that string::swap runs in constant time, which (effectively) requires that the internal pointers are swapped.

As far as I can tell, a conforming implementation could have string::max_size() { return 4;}, and swapping all internals from one buffer to another would therefore be constant time. (vector can't do that though)

Obviously, take this all with a grain of salt. I'm quoting from the C++ draft from Feb28,'11, and I can't find specifications for the vector's copy constructor. Also, not finding evidence for is not the same as finding evidence against.

Mooing Duck
  • 64,318
  • 19
  • 100
  • 158
4

On his errata page for "Effective STL," Scott Meyers notes:

When string implementations use reference counting, the swap trick using the copy constructor doesn't decrease the capacity, because the copy constructor isn't allocating any memory; it's just adjusting a reference count. A more reliable way to perform shrink-to-fit is to create the temporary string via the range constructor, e.g., string(s.begin(), s.end()).swap(s); This version of the swap trick is safer for vectors, too, because it eliminates any chance that the copy constructor will copy the other vector's excess capacity (which implementations are permitted to do).

As for the 'guarantee,' Meyers notes:

The language police require that I inform you that there's no guarantee that this technique will truly eliminate excess capacity. Implementers are free to give vectors and strings excess capacity if they want to, and sometimes they want to. [Effective STL, Item 17]

Gnawme
  • 2,321
  • 1
  • 15
  • 21
2

From http://www.gotw.ca/gotw/054.htm:

Some implementations may choose to round up the capacity slightly to their next larger internal "chunk size," with the result that the capacity actually ends up being slightly larger than the size.

Fred Larson
  • 60,987
  • 18
  • 112
  • 174
  • If you can find the place in the standard that backs up those words, that'd be great. I can't, hence the question! – Lightness Races in Orbit Oct 19 '11 at 22:39
  • 1
    Mmmm... Paging dr. Sutter ... Paging dr. Sutter (isn't Herb Sutter standard enough? - oh well) – sehe Oct 19 '11 at 22:44
  • I guess the key is whether `swap()` actually swaps internals, as Herb says, so that the result has the capacity of the temporary rather than the original. – Fred Larson Oct 19 '11 at 22:48
  • But this quote concerns swapping with **default** container, not a copy of itself. The comment about the latter is: "(Some implementations may choose to round up the capacity slightly to their next larger internal "chunk size," with the result that the capacity actually ends up being slightly larger than the size.)" – UncleBens Oct 19 '11 at 22:49
  • Well, yes, but apparently it still doesn't have to do with chunk sizes here (http://ideone.com/i3HLd) - 13 (or 14) is hardly a chunk size. – UncleBens Oct 19 '11 at 23:08
  • This is still not a quote from the standard. – Lightness Races in Orbit Oct 19 '11 at 23:52
0

How this works is probably entirely implementation-defined. Unlike containers such as vector, strings can have very different implementations.

If the string implementation uses small string optimization, then you can't lower capacity beyond a certain threshold. If the string implementation uses copy-on-write, then no write occurs and no real copy is made.

According to http://www.gotw.ca/gotw/054.htm, shrink-to-fit and clear-completely are different tricks. If the intention is to clear completely, then swapping with a default-constructed string can be expected to give better results.

UncleBens
  • 40,819
  • 6
  • 57
  • 90