3

I have the code snippet below. I expected the output to be mystring, but strangely it prints junk characters.

#include <iostream>
#include <string>

using namespace std;

int main(int argc, char *argv[]) {
    string s1("mystring");

    const char* s2 = s1.c_str();

    s1 = "Hi there, this is a changed string s1";

    cout << s2 << endl;

    return 0;
}

(1) My initial thinking was that c_str allocates enough memory to hold a copy of s1 and returns the address of that memory, which gets assigned to s2, and that from then on s1 and s2 are independent.

(2) But when I assigned s1 = "Hi there ..... ", my thinking in (1) proved to be wrong. Somehow, s1 is still influencing s2.

(3) When I comment out the s1 = "Hi there .... " line, everything works fine, i.e., mystring is printed consistently.

(4) I am not really convinced by my claim in (1) that c_str allocates memory to hold a copy of s1, because in that case we would have to free that memory through the s2 pointer, which we don't do. So I am not sure about that.

Please help me understand this strange behavior.

Onkar N Mahajan
  • 410
  • 3
  • 13
  • 2
    `s2` does not own the string. UB ensues. – E_net4 Aug 17 '18 at 10:47
  • 1
    *but strangely it outputs junk characters.* -- I see nothing strange about this. This is no different than holding onto a pointer that points to dynamically allocated memory, the memory gets freed, and then who knows what you will be pointing to if you attempt to use the freed pointer. – PaulMcKenzie Aug 17 '18 at 11:03
  • 1
    std::string will normally have to re-allocate the memory to store the string content. Not always, it does tend to have a micro-optimization for very short strings. But you have good evidence of that re-allocation being done, s2 is now a dangling pointer. You also have good evidence that you use a heap allocator that intentionally scrubs deleted heap blocks, that is common to help diagnose dangling pointer bugs. – Hans Passant Aug 17 '18 at 11:06
  • 1
    @PaulMcKenzie In my daily work, I deal with _strange_ behavior of my software all the time, until I find out that it does exactly what I wrote (but not what I intended to achieve). Hence, I wouldn't judge the term "strange" too harshly... ;-) – Scheff's Cat Aug 17 '18 at 11:07
  • `char* s2 = strndup(s1.c_str(), s1.size())`. Don't forget to `free(s2)` afterwards. – Henri Menke Aug 17 '18 at 11:12
  • 1
    Quick mention that OP, as a new user, took the time to provide us with a [mcve], explained their observed and expected behavior, explained their thinking and asked us a specific question. I'd be happy to see more of that kind of question. – YSC Aug 17 '18 at 11:35

2 Answers

11

Quoting cppreference:

The pointer obtained from c_str() may be invalidated by:

  • Calling non-const member functions on the string, excluding operator[], at(), front(), back(), begin(), rbegin(), end() and rend().

By modifying the string, you no longer have a valid pointer.

Also, your premise is wrong. c_str neither allocates nor frees memory. c_str and data (identical since C++11) return a pointer to the string's underlying storage; the pointer does not own that storage. When the storage changes for whatever reason, your pointer is no longer valid (i.e., it might now point to garbage).

user10238894
  • 111
  • 3
  • But c_str is called before the new assignment to s1, so how does the value already assigned to s2 get changed? – Onkar N Mahajan Aug 17 '18 at 10:50
  • @OnkarNMahajan `s2` is just a pointer (to the contents of `s1`). If `s1` is modified (e.g. by assignment), `s2` becomes invalid. This may mean "everything and nothing". E.g. `s2` may still point to the (now modified) contents of `s1`. Or `s2` may point to memory which has been released. (It has then become a dangling pointer, also called a "wild" pointer.) Assigning to `s1` means you _may not_ use `s2`. It doesn't mean you _cannot_ use it, but doing so is [Undefined Behavior](https://stackoverflow.com/a/4105123/1505939). – Scheff's Cat Aug 17 '18 at 10:58
6

s1.c_str() does not return memory that is independent of s1. Changes to s1 invalidate the pointer returned by s1.c_str().

In other words, be careful when using c_str. In general, use it only to pass a C string to a function that doesn't accept a C++ string.

john
  • 85,011
  • 4
  • 57
  • 81
  • But c_str is called before the new assignment to s1, so how does the value already assigned to s2 get changed? – Onkar N Mahajan Aug 17 '18 at 10:48
  • @OnkarNMahajan That makes no difference. What is *probably* happening is that c_str returns a pointer to the internal memory that s1 uses. When s1 is assigned to, that memory is freed, and so the pointer becomes invalid. – john Aug 17 '18 at 10:50