
(Disclaimer: I don't know what the C++ standard might say about this. I know, I'm horrible.)

While operating on very large strings, I noticed that std::string is using copy-on-write. I managed to write the smallest loop that reproduces the observed behaviour; the following one, for instance, runs suspiciously fast:

#include <string>
using std::string;
int main(void) {
    string basestr(1024 * 1024 * 10, 'A');
    for (int i = 0; i < 100; i++) {
        string a_copy = basestr;
    }
}

When I added a write to the loop body (a_copy[1] = 'B';), an actual copy apparently took place, and the program ran in 0.3 s instead of a few milliseconds: 100 writes slowed it down by about 100 times.

But then it got weird. Some of my strings were never written to, only read from, yet this was not reflected in the execution time, which was almost exactly proportional to the number of operations on the strings. After some digging, I found that simply reading from a string still gave me that performance hit, which led me to assume that GNU STL strings use copy-on-read (?).

#include <string>
using std::string;
int main(void) {
    string basestr(1024 * 1024 * 10, 'A');
    for (int i = 0; i < 100; i++) {
        string a_copy = basestr;
        a_copy[99]; // this also ran in 0.3s!
    }
}

After revelling in my discovery for a while, I found out that reading (with operator[]) from the base string also makes the entire toy program take 0.3 s. I'm not 100% comfortable with this. Are STL strings really copy-on-read, or do they allow copy-on-write at all? I suspect that operator[] has some safeguard against someone keeping the reference it returns and writing through it later; is that really the case? If not, what is actually happening? If someone can point to a relevant section of the C++ standard, that would also be appreciated.

For reference, I'm using g++ (Ubuntu 4.4.3-4ubuntu5) 4.4.3, and the GNU STL.

Michael Foukarakis
  • As the answers below suggest this is probably more of a compiler question than a C++ standard question. Which compiler are you using? Have you tried different optimization settings? – Björn Pollex Nov 01 '10 at 08:17
  • More than the compiler I'd imagine that this has to do with the specific STL implementation that the OP is using. From a standards perspective I think Charles Bailey has already answered. – Raj Nov 01 '10 at 08:19
  • C++98/03 was intended to allow COW strings, but COW isn't required. Incidentally, std::string isn't part of the STL even though STL concepts were later applied to it. –  Nov 01 '10 at 08:23
  • @Ranju V: The execution times with various optimization levels (-O1 to 3, -Os) remain the same for each of the examples. – Michael Foukarakis Nov 01 '10 at 08:40
  • @Roger: That is very interesting! So, while it's not mandated by the standard, std::string may in fact use COW as it's allowed? – Michael Foukarakis Nov 01 '10 at 08:40
  • In C++, you can't have separate index operators for read and write (you can muddle through with proxy classes, but they give you rarer, uglier trouble). It's as if the compiler can't tell a read from a write when using `[]`. – peterchen Nov 01 '10 at 08:42
  • Just wanted to note that copy on write is probably going to fade away in C++0x with the introduction of move semantics (makes COW obsolete for many typical use cases) and concurrency (makes COW potentially very inefficient due to synchronization issues). – fredoverflow Nov 01 '10 at 09:15

3 Answers


C++ doesn't distinguish between operator[] for reading and operator[] for writing; it only distinguishes between operator[] on a const object and operator[] on a mutable (non-const) object. Since a_copy is mutable, the non-const operator[] is chosen, and because that operator returns a (mutable) reference, it forces the copy.

If efficiency is a concern, you could cast a_copy to a reference to const string to force the const version of operator[] to be used, which won't make a copy of the internal buffer. (Casting to a plain const string instead constructs a temporary copy of the string object.)

char f = static_cast<const string&>(a_copy)[99];
kennytm
  • I hadn't considered const-ness as a factor at all. Thanks for that. Efficiency is not quite my concern here, as much as the internals of GNU STL, I suppose. Know your tools and all that. :) – Michael Foukarakis Nov 01 '10 at 09:00
  • You should use `const_cast<>` (http://msdn.microsoft.com/en-us/library/bz6at95h(VS.80).aspx) for CV casting. – J-16 SDiZ Nov 04 '10 at 09:03
  • @J-16: No, no, you shouldn't. That cast is only useful for removing const, which is very rarely the correct thing to do. – Puppy Nov 04 '10 at 09:22
  • @DeadMG, no. See the (draft) standard: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2010/n3126.pdf [5.2.11][expr.const.cast], footnote (73): `const_cast is not limited to conversions that cast away a const-qualifier`. See 5.2.11#3 for this specific case. – J-16 SDiZ Nov 04 '10 at 09:32
  • @J-16: Except for the part where using the other casts can add const just fine and there's no reason to use const_cast over them in this scenario. The only reason to use const_cast is to cast away const. – Puppy Nov 04 '10 at 09:37
  • @DeadMG: J-16 is correct - your logic is flawed. The point is not that other casts can add const, the point is that those other casts can do even more than that, while const_cast<> can't. In casting, you generally try to use the least powerful casting operator applicable, so you get compiler warnings if the template argument implies any larger change than intended. – Tony Delroy Nov 05 '10 at 09:23
  • @Tony: Except that running the risk of removing const by accident is far more dangerous than getting an implicit conversion or somesuch. – Puppy Nov 05 '10 at 09:24
  • @DeadMG: 1) you can't remove const by accident when the thing isn't const to begin with. 2) adding and removing const is what const_cast<> is for - using it is the way the programmer confirms their intent, and safer than any less self-documenting and less-restrictive alternatives. – Tony Delroy Nov 08 '10 at 01:18

The C++ standard doesn't prohibit or mandate copy-on-write or any other implementation details for std::string. So long as the semantics and complexity requirements are met an implementation may choose whatever implementation strategy it likes.

Note that operator[] on a non-const string is effectively a "write" operation, as it returns a reference that can be used to modify the string at any point up to the next operation that mutates the string. No copies should be affected by such a modification.

Have you tried profiling one of these two?

const string a_copy = basestr;
a_copy[99]; // a_copy is const, so the const operator[] is selected

Or

string a_copy = basestr;
const std::string& a_copy_ref = a_copy;
a_copy_ref[99]; // const reference, so the const operator[] is selected
CB Bailey

Try this code:

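#include <cstddef>
#include <iomanip>
#include <iostream>
#include <string>

using namespace std;

// Print the raw bytes of an object in hex. For a std::string this shows the
// string object itself (e.g. its internal pointer), not the characters it owns.
template<typename T>
void dump(std::ostream & ostr, const T & val)
{
    const unsigned char * cp = reinterpret_cast<const unsigned char *>(&val);
    for(std::size_t i=0; i<sizeof(T); i++)
        ostr
            << setw(2) << setfill('0') << hex << static_cast<int>(cp[i]) << ' ';
    ostr << endl;
}

int main(void) {
    string a = "hello world";
    string b = a;      // copy: may share a's buffer
    dump(cout,a);
    dump(cout,b);

    char c = b[0];     // only a read, but through the non-const operator[]
    (void)c;

    dump(cout,a);
    dump(cout,b);
}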

On GCC, this is the output I get:

3c 10 51 00
3c 10 51 00
3c 10 51 00
5c 10 51 00

This would seem to indicate that, yes, they are copy-on-read in this case.

Benjamin Lindley
  • I'm using your sample test code under GCC 12 (Windows), and it looks like with recent GCC the variables `a` and `b` do not share the internal buffer of the std::string; see the result here: https://forums.wxwidgets.org/viewtopic.php?p=214933#p214933 – ollydbg23 Oct 07 '22 at 13:51