
Consider the following example:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string x = "hello";
    // The copy constructor is called here.
    string y(x);
    // c_str() returns const char*, but casting the const away like this is quite popular.
    char* temp = (char*)y.c_str();

    temp[0] = 'p';

    cout << "x = " << x << endl;
    cout << "y = " << y << endl;

    cin >> x;
    return 0;
}

Run it with the Visual Studio compiler and with g++. When I did so, I got two different results.
In g++:

x = pello  
y = pello

In visual studio 2010:

x = hello  
y = pello

The reason for the difference is most likely that g++'s std::string implementation uses COW (copy-on-write) techniques and Visual Studio's does not.
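
To make the sharing visible, here is a toy copy-on-write string. It is only a sketch of the general technique, not libstdc++'s actual implementation; the class name `CowString` and all of its members are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>

// Toy copy-on-write string (hypothetical; not libstdc++'s real layout).
class CowString {
public:
    explicit CowString(const char* s) : buf_(std::make_shared<std::string>(s)) {}

    // Copying only shares the buffer; no characters are copied.
    CowString(const CowString&) = default;
    CowString& operator=(const CowString&) = default;

    // Read-only access never detaches, so copies hand out the same pointer.
    const char* c_str() const { return buf_->c_str(); }

    // Mutable access detaches first if the buffer is shared.
    char& operator[](std::size_t i) {
        assert(i < buf_->size());
        if (buf_.use_count() > 1) {
            buf_ = std::make_shared<std::string>(*buf_);  // the "copy on write"
        }
        return (*buf_)[i];
    }

    // True while both objects still point at the same character buffer.
    bool shares_buffer_with(const CowString& other) const {
        return buf_ == other.buf_;
    }

private:
    std::shared_ptr<std::string> buf_;
};
```

In this sketch, after `CowString y(x)` the calls `x.c_str()` and `y.c_str()` return the same address. Writing through that pointer after a cast bypasses `operator[]` and its detach step, which would explain why both strings change in the g++ output above.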

Now the C++ standard (page 616, table 64) says the following about the string copy constructor

basic_string(const basic_string& str):

Effects:
data() "points at the first element of an allocated copy of the array whose first element is pointed at by str.data()"

This means COW is not allowed (at least as I understand it).
How can that be?
Does g++ meet the C++11 requirements for std::string?

Before C++11 this did not pose a big problem since c_str didn't return a pointer to the actual data the string object holds, so changing it didn't matter. But after the change, this combination of COW and returning the actual pointer can and does break old applications (applications that deserve it for bad coding, but nevertheless).

Do you agree with me? If so, can something be done? Does anyone have an idea how to go about it in a very big, old code base (a Klocwork rule to catch this would be nice)?

Note that even without casting the constness away, one might invalidate a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write).
Here is another example, this time without casting the constness away:

#include <iostream>
#include <string>
using namespace std;

int main()
{
    string x = "hello";
    // The copy constructor is called here.
    string y(x);

    //y[0] = 'p';

    // c_str() returns const char*; this time the const is kept.
    const char* temp = y.c_str();

    y[0] = 'p';

    // Now we expect "pello" because the standard says the pointer points to the actual data,
    // but with a COW implementation we get "hello".
    cout << "temp = " << temp << endl;

    return 0;
}
Lightness Races in Orbit
buc030
  • You're saying that using a pointer to constant data (and a pointer that is pretty temporary as well) as a pointer to non-constant data is "quite popular"? I find that hard to believe. In fact, attempting to modify constant data leads to *undefined behavior* (see e.g. [this reference](http://en.cppreference.com/w/cpp/string/basic_string/c_str)). – Some programmer dude Jan 29 '14 at 12:56
  • possible duplicate of [Legality of COW std::string implementation in C++11](http://stackoverflow.com/questions/12199710/legality-of-cow-stdstring-implementation-in-c11) – ecatmur Jan 29 '14 at 13:08
  • @ecatmur this is not a duplicate, I'm talking about a specific buggy implementation (g++). Well, buggy in my opinion; what is yours? – buc030 Jan 29 '14 at 13:26
  • @JoachimPileborg yes, this is quite popular in the real world of old software (of course not as straightforward as this; you do understand the code only shows the point that g++ uses COW, right?). Note that even without casting the constness away, one might invalidate a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write). – buc030 Jan 29 '14 at 13:31
  • @sellibitze read the whole thing: you don't have to cast the const away, the second example is without casting the const away. The const cast just shows the point that g++ uses COW. Really, I don't understand why you would say something like this without even understanding the issue. – buc030 Jan 29 '14 at 14:05
  • @buc030 the linked question fully answers this question; any COW implementation is invalid. There's no need to have separate questions for each COW implementation, especially questions that confuse the issue with `const` UB violations. – ecatmur Jan 29 '14 at 14:23
  • @ecatmur Look, it's not about a const UB violation :) I said before that there is an example without a const violation as well. This is about whether g++ meets the requirements or not. I claimed it doesn't, but I wasn't sure (maybe I missed something). After all, it's not every day that you come across a bug in STL. But anyway, you guys seem to agree with me (that it's a bug) and "Lightness Races in Orbit" gave a nice answer that it is probably going to be fixed in 4.9 (or when vstring gets in). Thanks to "Lightness Races in Orbit" for being the only one concentrating on the main issue. – buc030 Jan 29 '14 at 17:21
  • This is not "a bug in STL". It's a known compliance issue with the GNU implementation of the C++ Standard Library, called _libstdc++_. – Lightness Races in Orbit Jan 29 '14 at 17:39
  • @Lightness Races in Orbit really? :) – buc030 Jan 29 '14 at 17:57
  • @buc030: Yes, really. – Lightness Races in Orbit Jan 29 '14 at 18:05
  • _"Before C++11 this did not pose a big problem since `c_str` didn't return a pointer to the actual data the string object holds, so changing it didn't matter."_ This is incorrect, `c_str` always returned the actual data in all popular implementations. – Jonathan Wakely Jan 29 '14 at 18:43
  • @JonathanWakely Yes I thought like you as well, but I researched a bit before posting the question and I took this statement from: http://stackoverflow.com/questions/19757209/why-do-data-and-c-str-return-char-const-while-operator-returns-char – buc030 Jan 29 '14 at 18:52
  • @buc030, C++03 *allowed* `c_str()` to be a different pointer, but also allowed it to point to the actual data. You didn't research very well, read the comments on the top answer you linked to. In G++ it always pointed to the actual data, so there is no change in that respect. – Jonathan Wakely Jan 29 '14 at 18:54
  • So your statement is still incorrect. You have concluded that because a particular implementation was allowed (even though it was never used in practice) that you can assume it is always true. That's clearly nonsense. – Jonathan Wakely Jan 29 '14 at 18:58
  • @buc030: More than simply "thinking", Jonathan actually _works_ on this stuff. – Lightness Races in Orbit Jan 29 '14 at 19:00

4 Answers


You're right that COW is disallowed. But GCC hasn't updated its implementation yet, allegedly due to ABI constraints. A new implementation, designed eventually to supplant the std::string implementation, can be found as ext/vstring.h.

A fix for another bug in libstdc++'s std::string (not this one) is not going to make it into GCC 4.9; Jonathan indicates on the bug that it has only been fixed for vstring so far. My guess would be, then, that the COW issue would be resolved around the same time.

Despite all this, casting away constness then mutating is pretty much always a bad idea: though you're correct that this should in practice be safe with a fully C++11-compliant string implementation, you're making assumptions and this very problem proves that you cannot always rely on those assumptions to hold. So, while your code example may be "popular", it's popular in poor code, and shouldn't be written even now. And, of course, writing that in C++03 is flat-out incompetence!
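
As a minimal sketch of the alternative, mutate through the string's own non-const interface instead of through a cast `c_str()` pointer. The function name `replace_first_char` is made up for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: changes the first character through the string's
// non-const interface, so a COW implementation gets the chance to detach
// from any buffer it shares with other strings.
void replace_first_char(std::string& s, char c) {
    assert(!s.empty());
    s[0] = c;  // non-const operator[]; well defined in C++03 and C++11
}
```

Called as `replace_first_char(y, 'p')` on a copy `y` of `x`, this leaves `x` as "hello" whether or not the implementation uses COW. If a C API really needs a writable `char*`, `&s[0]` on a non-empty string is the usual C++11 way to get one, since contiguous storage is guaranteed there.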

Lightness Races in Orbit
  • You are right that casting away constness is a bad idea, but reality is that we develop in unclean environments, with very old code (that was written by who knows who) that does that bad stuff quite a lot. Also even without casting the constness away, one might cause invalidation of a pointer by calling c_str, saving the pointer and then calling non-const method (which will cause write). – buc030 Jan 29 '14 at 13:23
  • @buc030: Yes, of course, there are many other ways to cause UB. Old code is hardly an excuse, though: this was _never_ legal in C++. – Lightness Races in Orbit Jan 29 '14 at 13:43
  • Actually I just updated the Target Milestone for that bug earlier today, it won't be fixed for 4.9, and we'll still have COW strings in 4.9 – Jonathan Wakely Jan 29 '14 at 18:36
  • @JonathanWakely: I couldn't find much in the way of an authoritative reference for the non-compliance of strings in libstdc++, either on Bugzilla or in the C++11 status matrix; is there somewhere else I should be looking to enhance this answer? Or is my guess regarding that bug 53221 about on the money? – Lightness Races in Orbit Jan 29 '14 at 18:58
  • Coincidentally I'm already in the process of updating the C++11 status table to mention the non-conformance of our COW implementation and plan to regenerate the onlinedocs HTML pages Real Soon Now. – Jonathan Wakely Jan 29 '14 at 19:00
  • @JonathanWakely: Awesome. :-) – Lightness Races in Orbit Jan 29 '14 at 19:01
  • @LightnessRacesinOrbit, http://gcc.gnu.org/onlinedocs/libstdc++/manual/status.html#status.iso.2011 says "Non-conforming Copy-On-Write implementation" for 21.4 – Jonathan Wakely Apr 28 '14 at 11:27

libstdc++'s implementation is non-conformant to C++11, but that doesn't mean your code is guaranteed to produce the results you expect.

Doing anything to modify the values stored in the character array returned by c_str() results in undefined behavior. The standard explicitly says this:

21.4.7.1 basic_string accessors

const charT* c_str() const noexcept;
const charT* data() const noexcept;
1 Returns: A pointer p such that p + i == &operator[](i) for each i in [0,size()].
2 Complexity: constant time.
3 Requires: The program shall not alter any of the values stored in the character array.

Although I quote C++11 above, this was also true of C++03.


Does anyone have an idea how to go about it in a very big, old code base (a Klocwork rule to catch this would be nice)?

Hopefully you have a decent test suite. Making significant changes to large, legacy code-bases is not really practical otherwise. The easier and faster it is to run the test suite the easier and faster it will be to fix the code.

On a very large codebase auditing all uses of c_str() may be very expensive. However taking a sample and checking for what sorts of uses are made of it and what specific corrections could be applied can help you gauge the scale of the problem. In my experience you can expect a wide variety of weird things, but some will be more common.

Valgrind, debug implementations of std::string, and other tools can help identify some instances which are likely to cause real bugs. Fixing those first is the highest priority. The fixes will likely involve updating APIs to be const-correct or to have well-defined lifetime requirements, and replacing uses of c_str() with something that produces C strings with appropriate lifetimes. Your survey of the code should have informed you as to the general variety of lifetime requirements and C-string-creating utilities that will be necessary.
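
One such utility might look like the sketch below: it hands out an owned, writable copy, so callers never write through `c_str()` and the buffer's lifetime is explicit. The name `to_c_buffer` is hypothetical, and `std::vector<char>` is just one reasonable choice of owner.

```cpp
#include <cassert>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: returns an owned, writable, NUL-terminated copy of s.
// The caller may modify the buffer freely; s itself is never touched.
std::vector<char> to_c_buffer(const std::string& s) {
    const char* p = s.c_str();
    std::vector<char> buf(p, p + s.size() + 1);  // +1 copies the terminator
    assert(buf.back() == '\0');
    return buf;
}
```

A legacy function taking `char*` can then be called with `buf.data()` (or `&buf[0]` before C++11), and the original string stays valid and unchanged for as long as it lives.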

Other uses of c_str() can be modified incrementally over time as a lower priority, side activity.

Tools like some of those built on top of clang for refactoring or semantic search are another option for identifying problems and making large-scale changes, however it's often a big task just to get legacy code into a legal enough shape for clang tools to process it. (Here's a talk about some work Google did on this. There are also more recent talks they've done on commodity versions of this technology which Google has made available.)


I often have a hard time convincing people that 'undefined behavior' is actually a problem even in instances when no ill effects are actually observed. As you write new code remember from this experience that the lives of future maintainers will be made much easier if you conform to the C++ spec. Even if some particular instance of 'bad' code doesn't cause you problems now, that is likely to change over time as compilers and library implementations change. And even when the spec changes, the committee is careful to consider the effects on conformant legacy code. If code isn't conformant then it really doesn't get any consideration and you end up with problems like this.

bames53

Does g++ meet the C++11 requirements for std::string?

No.

Before C++11 this did not pose a big problem since c_str didn't return a pointer to the actual data the string object holds, so changing it didn't matter.

This is incorrect, c_str was always allowed to return the actual data and that's exactly what it did in all popular C++03 implementations.

But after the change, this combination of COW and returning the actual pointer can and does break old applications (applications that deserve it for bad coding, but nevertheless).

After what change? G++ did not change its std::string so if your old program is broken using G++ then it was always broken.

Note that even without casting the constness away, one might invalidate a pointer by calling c_str, saving the pointer, and then calling a non-const method (which will cause a write).

Your second example doesn't demonstrate any invalidation, because in a COW implementation temp is still a valid pointer while x exists. But it's possible to modify the example to invalidate temp, and that's not allowed in C++11: [string.require]/6 says that in C++11 y[0] is not allowed to invalidate the pointer returned by c_str().
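
A small check along these lines can tell the two behaviours apart without any cast. It is only a sketch, and the function name `write_visible_through_saved_pointer` is made up; on a C++11-conforming std::string it should return true, while a COW implementation like the one discussed here would be expected to return false.

```cpp
#include <cassert>
#include <string>

// Returns true if a write through y[0] is visible through a pointer that
// was obtained earlier from y.c_str(), as C++11 requires.
bool write_visible_through_saved_pointer() {
    std::string x = "hello";
    std::string y(x);
    const char* temp = y.c_str();  // saved before the write
    y[0] = 'p';                    // must not invalidate temp in C++11
    assert(x == "hello");          // x is untouched on any implementation
    // With COW, y detaches here, so temp keeps pointing at x's buffer,
    // which is still alive; reading it is fine but shows 'h'.
    return temp == y.c_str() && temp[0] == 'p';
}
```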

Jonathan Wakely
  • By invalidate I meant it points to a value which is not expected. I.e if you'll print it you will not get the same output as with a compliant implementation. See the comment above the cout line. But the answer is great, thanks! – buc030 Jan 29 '14 at 19:13
  • OK, I see. The standard library refers to invalidating pointers and references with a specific meaning that is different. – Jonathan Wakely Jan 29 '14 at 20:25

The other answers were correct at the time, but according to the GCC 5 change log, libstdc++ as shipped with GCC 5 is now fully C++11 conformant: its default std::string no longer uses COW.
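
To see which string implementation a given build uses, a sketch like the following can help. It relies on the `_GLIBCXX_USE_CXX11_ABI` macro that libstdc++ 5 and later define; the function name `string_abi` and the returned descriptions are made up for illustration.

```cpp
#include <cassert>
#include <string>  // any libstdc++ header defines the macros tested below

// Reports which std::string ABI this translation unit was compiled against.
// Other standard libraries do not define _GLIBCXX_USE_CXX11_ABI at all.
const char* string_abi() {
#if defined(_GLIBCXX_USE_CXX11_ABI) && _GLIBCXX_USE_CXX11_ABI
    return "libstdc++, new (non-COW) std::string";
#elif defined(__GLIBCXX__)
    return "libstdc++, old (COW) std::string";
#else
    return "not libstdc++";
#endif
}
```

Compiling with `-D_GLIBCXX_USE_CXX11_ABI=0` selects the old COW string again, which some projects still do for binary compatibility with older libraries.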

  • A full description can be found here: https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html – Meixner Jul 12 '23 at 07:16