14

std::string::c_str() returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object.

In C++98 it was required that "a program shall not alter any of the characters in this sequence". This was encouraged by returning a const char*.

In C++11, the "pointer returned points to the internal array currently used by the string object to store the characters that conform its value", and I believe the requirement not to modify its contents has been dropped. Is this true?

Is this code OK in C++11?

#include<iostream>
#include<string>
#include<vector>
using namespace std;

std::vector<char> buf;

void some_func(char* s)
{
    s[0] = 'X'; //function modifies s[0]
    cout<<s<<endl;
}

int main()
{
    string myStr = "hello";
    buf.assign(myStr.begin(),myStr.end());
    buf.push_back('\0');
    char* d = buf.data();   //C++11
    //char* d = (&buf[0]);  //Above line for C++98
    some_func(d);   //OK in C++98
    some_func(const_cast<char*>(myStr.c_str())); //OK in C++11 ?
    //some_func(myStr.c_str());  //Does not compile in C++98 or C++11
    cout << myStr << endl;  //myStr has been modified
    return 0;
}
Rapptz
user2662157
  • 2
    `c_str()` still is `const char*` so fortunately immutable, corresponding to a cacheable result. – Joop Eggen Aug 07 '13 at 21:00
  • 9
    Why do you need this anyway, what's wrong with `&myStr.front()`? – Jonathan Wakely Aug 07 '13 at 21:05
  • 2
    `&myStr[0]` works too – Praetorian Aug 07 '13 at 21:15
  • It's fairly easy to break class coherence doing this, which is why it's not permitted. – Joel Aug 07 '13 at 22:28
  • @Praetorian However, is it defined that the internal contents of the string must be in contiguous memory? While modifying a single character in the string would be allowed by doing `myStr[0] = 'X'`, what would happen if one tried to `strncpy(&myStr[0], "abc", 3)`? Just thinking that the reason to deal with a `char *` instead of a `char &` is to try to deal with it as a C-string. – Andre Kostur Aug 08 '13 at 01:20
  • 1
    @AndreKostur Yep, C++11 mandates that the string be stored contiguously in memory. So modifying a range of characters via a pointer to the first is OK, as long as you [don't modify the terminating NULL character](http://stackoverflow.com/questions/12740403/legal-to-overwrite-stdstrings-null-terminator). – Praetorian Aug 08 '13 at 01:44
  • @Praetorian The terminating `'\0'` is not part of the sequence represented by the `string`. It is part of the C string which you cannot modify anyway. Nothing says that `s.end()` points to a `'\0'`, and it's fairly likely that calling `c_str()` actually assigns that. – Potatoswatter Aug 08 '13 at 01:52
  • 2
    … although, reviewing that Q&A, and the Standard, there is actually no way to obtain a pointer to a modifiable range; in effect the characters are `const` even for a non-const pointer. The contiguity only guarantees that you can *read* the string as an array. But you cannot assume the terminator is there except after `c_str` is called and before any non-const member function is called. (Edit: Ah, this is fixed in C++14 so you can modify anything except the terminator, which is generated and returned by `operator[]` for any index not less than `size()`, i.e. it returns a fake reference.) – Potatoswatter Aug 08 '13 at 02:05
  • @Potatoswatter I didn't say `end()` points to a `\0`. Also `c_str()` and `data()` are defined in terms of `operator[]` and must be O(1), this means that both of those calls must add the value initialized `CharT()` if one isn't already present, *and* the string must already have enough room allocated for the terminator during calls to any of the 3, even if it hasn't been initialized. – Praetorian Aug 08 '13 at 02:18
  • @Praetorian Okay, your warning not to modify the `'\0'` is contingent on its existence. Nevertheless, it's not part of the sequence, and deducing that its storage must exist due to other requirements is pretty bad practice. – Potatoswatter Aug 08 '13 at 02:34
  • "But you cannot assume the terminator is there except after c_str is called and before any non-const member function is called." I am learning a great deal here. So let's see if I understand. Imagine I have a legacy C function that takes a char* as a parameter, and does not modify the string, but does look for the '\0' (perhaps to learn its length). If I pass &myStr[0] there may or may not BE a terminal '\0', whereas if I pass the undesirable const_cast<char*>(myStr.c_str()) there will be, and everything will work as desired. Am I making a mistake here? – user2662157 Aug 08 '13 at 05:18

4 Answers

26

3 Requires: The program shall not alter any of the values stored in the character array.

That requirement is still present as of draft N3337, the working draft most similar to the published C++11 standard.

Borgleader
5

Yes, in C++11 the restriction for c_str() is still in effect. (Note that the return type is const, so no particular restriction is actually required for this function. The const_cast in your program is a big red flag.)

But as for operator[], the restriction appears to be in effect only due to an editorial error. Thanks to a punctuation change slated for C++14, you may modify the characters it refers to, other than the terminator. So the interpretation in C++11 is sort of up to you. Of course, doing this is so common that no library implementation would dare break it.

C++11 phrasing:

Returns: *(begin() + pos) if pos < size(), otherwise a reference to an object of type T with value charT(); the referenced value shall not be modified.

C++14 phrasing:

Returns: *(begin() + pos) if pos < size(). Otherwise, returns a reference to an object of type charT with value charT(), where modifying the object leads to undefined behavior.
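As a rough illustration of the two halves of that wording (a sketch only; the helper names `read_terminator` and `overwrite_first` are made up for this example and appear nowhere in the standard), reading the element at index size() is fine, while writes are kept strictly below size(), which is the C++14 reading:

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: read (never write) the element at index size().
// operator[] yields a reference to an object with value charT(), i.e. '\0'.
char read_terminator(const std::string& s)
{
    return s[s.size()];
}

// Hypothetical helper: write through operator[] only for pos < size().
// Under the C++14 wording this is allowed; the terminator is never touched.
void overwrite_first(std::string& s, char c)
{
    assert(!s.empty()); // index 0 must be a real character, not the terminator
    s[0] = c;
}
```

With `std::string s = "hello";`, calling `overwrite_first(s, 'X')` leaves the string holding "Xello", and `read_terminator(s)` still returns '\0'.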

You can pass c_str() as a read-only reference to a function expecting a C string, exactly as its signature suggests. A function expecting a read-write reference generally expects a given buffer size, and to be able to resize the string by writing a NUL within that buffer, which std::string implementations don't in fact support. If you want to do that, you need to resize the string to include your own NUL terminator, then pass &s[0], which is a read-write reference, then resize it again to remove your NUL terminator and hand the responsibility of termination back to the library.
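A minimal sketch of that resize, pass &s[0], resize-again sequence might look like the following. Both `legacy_fill` and `call_legacy` are hypothetical names invented for this example; `legacy_fill` merely stands in for some C function that writes a NUL-terminated string into a caller-supplied buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical legacy C-style function: writes a NUL-terminated string into
// buf, never touching more than cap bytes (terminator included).
void legacy_fill(char* buf, std::size_t cap)
{
    assert(buf != nullptr && cap > 0);
    std::strncpy(buf, "hi", cap - 1);
    buf[cap - 1] = '\0';
}

// Hypothetical helper: let the legacy function write into a std::string.
std::string call_legacy(std::size_t cap)
{
    if (cap == 0) return std::string();
    std::string s;
    s.resize(cap);                     // cap characters, all inside [0, size())
    legacy_fill(&s[0], s.size());      // the function's NUL lands inside the string
    s.resize(std::strlen(s.c_str()));  // trim at that NUL; the library terminates again
    return s;
}
```

Every byte the C function writes, including its own NUL, falls inside [0, size()), so the library's terminator at index size() is never modified. The final resize cuts the string at the first NUL the function wrote.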

Potatoswatter
  • +1 I always assumed that the *the referenced value shall not be modified* part only applied to the *otherwise* half, mainly because there wouldn't be a need for const and non-const overloads for `operator[]` if it covered the whole thing, but I see the ambiguity now. – Praetorian Aug 08 '13 at 02:28
  • @Praetorian It's worse because C++03 was specified with `operator[]()` routing through `data()`, suggesting a built-in unsafe `const_cast`. If C++11 had added an accidental restriction by editorial error, it would be clearer, but instead the error preserved a defective specification. – Potatoswatter Aug 08 '13 at 02:40
3

I'd say that if c_str() returns a const char* then it's not OK, even if it can be argued to be a gray area by a language lawyer.

The way I see it is simple. The signature of the method states that the pointer it returns should not be used to modify anything.

In addition, as other commenters have pointed out, there are other ways to do the same thing that do not violate any contracts. So it's definitely not ok to do so.

That said, Borgleader has found that the language still says it isn't.

Carl
-2

I have verified that this is in the published C++11 standard

Thank you

what's wrong with &myStr.front()?

string myStr = "hello";
char* p1 = const_cast<char*>(myStr.c_str());
char* p2 = &myStr.front();
p1[0] = 'Y';
p2[1] = 'Z';

It seems that pointers p1 and p2 are exactly the same. Since "The program shall not alter any of the values stored in the character array", it would seem that the last two lines above are both illegal, and possibly dangerous.

At this point, the way I would answer my own question is that it is safest to copy the original std::string into a vector and then pass a pointer to the new array to any function that might possibly change the characters.

I was hoping that this step might no longer be necessary in C++11, for the reasons I gave in my original post.
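For reference, the copy-into-a-vector step can be wrapped in a small helper. This is only a sketch: `to_mutable_buffer` and `some_legacy_func` are hypothetical names, the latter standing in for any function that may change the characters it is given.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a function that may modify its argument.
void some_legacy_func(char* s)
{
    assert(s != nullptr);
    if (s[0] != '\0') s[0] = 'X';
}

// Hypothetical helper: copy the string, terminator included, into a buffer
// the caller owns and may freely modify.
std::vector<char> to_mutable_buffer(const std::string& str)
{
    const char* p = str.c_str();                      // read-only use of c_str()
    return std::vector<char>(p, p + str.size() + 1);  // +1 copies the '\0'
}
```

Calling `some_legacy_func(buf.data())` on the returned buffer changes only the copy; the original std::string is untouched and no const_cast is involved.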

user2662157
  • -1: You did a `const_cast`. Why isn't that enough of a clue that it's a *bad idea*? Just because the "pointers p1 and p2 are exactly the same" doesn't mean that you should assume that they *always* will be. Just use `front` or `&[0]`, and stop lying to your compiler. Your coding style is horrible; just do it the right way please. – Nicol Bolas Aug 08 '13 at 01:45
  • 2
    There's no need for the `const_cast`, those should be avoided as much as possible; use `front()` or `operator[]` to get a reference to the first element. Also, there's no need for copying the string into a vector to modify it as long as you make sure the string is large enough to be written to, and you don't [modify the terminating NULL character](http://stackoverflow.com/questions/12740403/legal-to-overwrite-stdstrings-null-terminator) – Praetorian Aug 08 '13 at 01:53
  • I did read the link regarding the NULL character, but the standard is that "The program shall not alter any of the values stored in the character array." There is no guarantee that doing so will be OK in all circumstances. – user2662157 Aug 08 '13 at 02:03
  • Initializing p2 does not involve a const cast, but it could be argued that it accomplishes precisely the same thing in a "horrible" nontransparent way. – user2662157 Aug 08 '13 at 02:07
  • 3
    @user2662157: "*the standard is that "The program shall not alter any of the values stored in the character array."*" It says that for the array *returned by `c_str`*, not for the `std::string` in general. Context is important. – Nicol Bolas Aug 08 '13 at 02:18
  • @user2662157: "*it could be argued that it accomplishes precisely the same thing in a "horrible" nontransparent way.*" That would be a terrible and false argument, since the standard allows one of them while expressly forbidding the other. – Nicol Bolas Aug 08 '13 at 02:18
  • > "the array returned by c_str" c_str returns a pointer to (the first character of) an array, which in C++11 is "the internal array currently used by the string object to store the characters that conform its value." http://www.cplusplus.com/reference/string/string/c_str/ You are quite right to avoid const_cast, but the question at issue is whether modifying the contents of this array using this pointer (or an identical pointer) is safe given that "The program shall not alter any of the values stored in the character array." – user2662157 Aug 08 '13 at 03:06
  • 1
    @user2662157: "*or an identical pointer*" No, that's not how the standard works. The standard doesn't care if a pointer *just so happens* to be identical in value to another. The standard says what it says. You are forbidden from modifying the string via the pointer returned by `c_str`. Another function may return a pointer that you *are* allowed to modify. The fact that these two pointers may have (or even are required to have) the same pointer value is *completely irrelevant* to what the standard says. You can modify through one of them, and you can't modify through the other. – Nicol Bolas Aug 08 '13 at 10:42
  • It may be that the "standard doesn't care" if two pointers "just happen" to be identical in value, but it does matter if they "are required to have" the same value. – user2662157 Aug 08 '13 at 16:15
  • Using const_cast is not morally wrong. It is just plain dangerous unless you know exactly what any compiler may do as a result of your use of the resulting pointer. Which is seldom the case. Using code that is guaranteed to return the same pointer will expose you to exactly the same risk for the same reason. – user2662157 Aug 08 '13 at 16:36
  • 1
    I apologize. My last two comments are wrong. In this case, c_str may modify the internal array (in particular, the terminal '\0') before returning the pointer. So it is quite correct to say that it is completely irrelevant that another pointer would have the same value. – user2662157 Aug 08 '13 at 17:03