0

In C++ Primer 5th Edition, it says:

The Array returned by c_str is not guaranteed to be valid indefinitely.

So I did a test:

//  c_str exploration
std::string strTest = "This is a test";
const char* s1 = strTest.c_str();
strTest = "This is b test";
std::cout << s1 << std::endl;

Since s1 is a pointer, it shows the new value. However, when I assign a string of a different length, it usually prints garbage:

//  c_str exploration
std::string strTest = "This is a test";
const char* s1 = strTest.c_str();
strTest = "This is b testsssssssssssssssssssssssssss";
std::cout << s1 << std::endl;

I figured that this is because the returned C string has already fixed the position of the terminating null character, so a change in length invalidates everything. To my surprise, it is sometimes still valid even after I change the string to a new length:

//  c_str exploration
std::string strTest = "This is a test";
const char* s1 = strTest.c_str();
strTest = "This is b tests";     // Note the extra s at the end
std::cout << s1 << std::endl;

Second question:

I'm also not sure why std::cout << s1 prints the content instead of the address of the C string, whereas the following code prints the address of the integer, as I expected:

int dim = 42;
int* pdim = &dim;
std::cout << pdim << std::endl;

This prints out the character 'T', as expected:

std::cout << *s1 << std::endl;

My assumption is that std::cout does an auto convert, but please lecture me more on this.

Nicholas Humphrey
  • 4
    I disagree with the downvotes. This is a well thought out question, the OP has demonstrated effort, willingness to learn and desire to understand a topic deeper. – TypeIA Jun 08 '18 at 15:57
  • 1
    @TypeIA Part 1 is not a question. Part 2 is answerable, but a completely different topic, making the whole thing "too broad" in my opinion. – melpomene Jun 08 '18 at 16:00
  • 2
    The pointer returned by `strTest.c_str()` is invalidated by any function that modifies `strTest`. Which means using `s1` after changing `strTest` gives undefined behaviour. When behaviour is undefined, you may see results you consider "valid" or you may not. `std::cout< – Peter Jun 08 '18 at 16:02
  • 1
    `s1` after string modification is an example of a [dangling pointer](https://stackoverflow.com/questions/17997228/what-is-a-dangling-pointer) – kmdreko Jun 08 '18 at 16:05
  • 2
    @melpomene I do agree the questions should be split. I had no trouble understanding what the first section was getting at, even though no question was explicitly stated. – TypeIA Jun 08 '18 at 16:06
  • @Peter got it, thanks, so it's basically undefined behavior and that's the reason that sometimes it shows good result sometimes not. – Nicholas Humphrey Jun 08 '18 at 16:17

4 Answers

5

First Question

The pointer returned by std::string::c_str() remains valid as long as the string is not modified. From cppreference.com:

The pointer obtained from c_str() may be invalidated by:

  • Passing a non-const reference to the string to any standard library function, or
  • Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().

In your posted code,

std::string strTest = "This is a test";
const char* s1 = strTest.c_str();
strTest = "This is b tests";  // This line makes the pointer invalid.

Any use of the pointer to access the string after that point is undefined behavior.

std::cout << s1 << std::endl; // Undefined behavior.

After that, it's pointless to try to make sense of what the code does.
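A minimal sketch of two ways to stay within defined behavior: copy the characters out before the string changes, or fetch a fresh pointer after it changes. The helper names snapshot_then_assign and refetched_pointer_matches are made up for this illustration and are not part of any library.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: copy the characters out while the pointer is still valid.
std::string snapshot_then_assign(std::string& s, const std::string& replacement) {
    const char* p = s.c_str();  // valid only until s is modified
    std::string copy(p);        // owns its own copy of the characters
    s = replacement;            // p may dangle from here on; never read through it
    return copy;
}

// Hypothetical helper: modify first, then ask for a fresh pointer.
bool refetched_pointer_matches(std::string& s, const std::string& replacement) {
    s = replacement;
    const char* p = s.c_str();    // fetched after the modification, so it is valid
    assert(p[s.size()] == '\0');  // c_str() is always null-terminated
    return replacement == p;      // compares the string with the C string's contents
}
```

Calling snapshot_then_assign(strTest, "This is b tests") returns the old text as an independent std::string, so nothing ever reads through a pointer obtained before the assignment.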

Second Question

The standard library provides an operator<< overload that takes a std::ostream and a char const*, so C-style strings can be printed in a sensible way. When you use:

std::cout << "Hello, World.";

you would want to see Hello, World. as output, not the value of the pointer that points to that string.

For reasons beyond the scope of this answer, that function overload is implemented as a non-member function.

template< class CharT, class Traits >
basic_ostream<CharT,Traits>& operator<<( basic_ostream<CharT,Traits>& os, 
                                         const CharT* s );

After all the template-related tokens are substituted, that declaration translates to:

std::ostream& operator<<(std::ostream& os, const char* s );

You can see the list of non-member overload functions at cppreference.com.
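To see the two behaviors side by side, the sketch below streams the same const char* twice: once as is, which selects the C-string overload, and once cast to const void*, which selects the pointer overload and prints an address. The function names are hypothetical, and the exact address format is implementation-defined.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: streams the pointer as a C string.
std::string stream_as_text(const char* s) {
    assert(s != nullptr);  // streaming a null const char* is undefined behavior
    std::ostringstream out;
    out << s;              // const char* overload: prints characters up to '\0'
    return out.str();
}

// Hypothetical helper: streams the same pointer as a plain address.
std::string stream_as_address(const char* s) {
    std::ostringstream out;
    out << static_cast<const void*>(s);  // const void* overload: prints the address
    return out.str();
}
```

The same cast works with std::cout, so std::cout << static_cast<const void*>(s1) is one way to print the address held in s1 rather than the text it points to.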

R Sahu
3

The pointer returned from c_str() is guaranteed to be valid until the string is modified. When it is modified (by calling a non-const member function), the string may have to allocate a new memory buffer internally, which invalidates the pointer. When and how this happens exactly is not specified.

For the second question: There are different overloads of operator<<. The one for const char* prints the characters it points to, up to the terminating null character, rather than the address.

alain
  • 1
    "When and how this happens exactly is not specified." except that the iterator and reference invalidation rules are *very strong hints* – Caleth Jun 08 '18 at 16:09
  • What I meant is: It is guaranteed to *not* happen when the string is not modified, but it is very well possible that the pointer remains valid (in the sense of still pointing to allocated memory) even when it was modified. (But this is of course useless without a guarantee, and UB). – alain Jun 08 '18 at 16:15
  • Thanks so I assume it's basically "undefined". – Nicholas Humphrey Jun 08 '18 at 16:15
  • Yes, exactly, using the pointer after the string was modified is undefined behavior. – alain Jun 08 '18 at 16:16
2

First question:

c_str documentation states the following, which is a bit more clear than what the book says, as it states when it may be invalidated:

The pointer returned may be invalidated by further calls to other member functions that modify the object.

In a quick test, assigning a new value to the string invalidated the address s1 points to (i.e. strTest.c_str() returned a different value afterwards).

It is not really clear from the documentation which member functions invalidate the pointer, but it is probably safe to say that you should not operate on the original string variable if you are going to use the c_str pointer.
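A quick test along those lines might look like the sketch below. It records the buffer address as an integer before and after the assignment, so no stale pointer is ever dereferenced. The helper name buffer_moved_after_assign is hypothetical, and whether the buffer moves depends on the implementation (capacity, small-string optimization); either result leaves the old pointer unusable.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical helper: reports whether assigning `replacement` moved the buffer.
bool buffer_moved_after_assign(std::string& s, const std::string& replacement) {
    // Convert to integers while each pointer is valid; only the integers are compared.
    const auto before = reinterpret_cast<std::uintptr_t>(s.c_str());
    s = replacement;
    const auto after = reinterpret_cast<std::uintptr_t>(s.c_str());
    assert(s == replacement);  // the assignment itself always succeeds
    return before != after;
}
```

Even when this returns false, the standard still treats a pointer obtained before the assignment as invalidated, so the result is only a curiosity about one implementation.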

Second question:

For a const char*, cout prints characters until it reaches the terminating null character. There is no such convention for an int*, so for other pointer types it prints the address instead, as you have seen.
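The role of the null character can be checked with a buffer that has one in the middle. This is a minimal sketch with a hypothetical function name, using std::ostringstream in place of std::cout so the result can be inspected.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: streams a char buffer that contains an embedded null.
std::string stream_char_buffer() {
    const char buf[] = {'a', 'b', '\0', 'c', 'd', '\0'};
    std::ostringstream out;
    out << buf;          // buf decays to const char*; output stops at the first '\0'
    assert(out.good());  // the stream is still in a good state
    return out.str();    // "ab"
}
```

Everything after the first '\0' is ignored, which is why the stream needs no separate length for a C string.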

zesen
  • Be careful when citing cplusplus.com for questions about low-level or fine-grained details. cplusplus.com favours a low barrier for entry over precise details, and that often results in those precise details (and sometimes really important ones) being left out to make the entry understandable to readers who lack the background information required to fully understand the implications of those details. – user4581301 Jun 08 '18 at 17:14
0

Second question

std::ostream::operator<< is overloaded to take integers, const char*, and several other built-in types. There is a slightly different function for each of them. A type that is not built in needs either its own operator<< overload (as std::string has) or a defined conversion to a type that has one.
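As a hedged illustration of how a user-defined type becomes printable, the sketch below gives a made-up Point struct its own non-member operator<<. Point and to_text are hypothetical names chosen for this example.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical user-defined type.
struct Point {
    int x;
    int y;
};

// Non-member overload: builds on the existing overloads for char, int and const char*.
std::ostream& operator<<(std::ostream& os, const Point& p) {
    return os << '(' << p.x << ", " << p.y << ')';
}

// Hypothetical helper: formats a Point through the overload above.
std::string to_text(const Point& p) {
    std::ostringstream out;
    out << p;
    assert(out.good());  // formatting two ints cannot fail here
    return out.str();
}
```

With this overload in scope, std::cout << Point{1, 2} works directly and no conversion to a built-in type is involved.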

J. Doe
  • 3
    operator << is overloaded for std::strings - no conversion is necessary –  Jun 08 '18 at 15:56
  • Thanks for the reply, guess I need to look at individual implementation to get a detailed picture. – Nicholas Humphrey Jun 08 '18 at 15:57
  • @NeilButterworth Ah, sorry about that. I'll edit. Can you think of a good example of something that is? The only ones I can think of ATM are from third party libraries – J. Doe Jun 08 '18 at 15:58