
I created some simple C++ std::string values, but printing them gives unexpected results.

I tested this code with the g++ compiler (Linux) and Visual Studio (Windows), and both compilers show the same problem.

Normal result code

/* This code produces the normal (expected) results. */

#include <bits/stdc++.h>

int main() {
    std::string a1 = "a1";
    std::string a2 = "a2";

    std::string b1("b1");
    std::string b2("b2");

    const char *c1 = std::string("c1").c_str();
    const char *c2 = std::string("c2").c_str();

    std::cout << "Expected [a1], real [" << a1 << "]\n";
    std::cout << "Expected [a2], real [" << a2 << "]\n";
    std::cout << "Expected [b1], real [" << b1 << "]\n";
    std::cout << "Expected [b2], real [" << b2 << "]\n";
    std::cout << "Expected [c1], real [" << c1 << "]\n";
    std::cout << "Expected [c2], real [" << c2 << "]\n";
}

Console result:

Expected [a1], real [a1]
Expected [a2], real [a2]
Expected [b1], real [b1]
Expected [b2], real [b2]
Expected [c1], real [c1]
Expected [c2], real [c2]

Abnormal result code

/* This code produces abnormal results. */

#include <bits/stdc++.h>

int main() {

    const char *c1 = std::string("c1").c_str();
    const char *c2 = std::string("c2").c_str();

    std::string a1 = "a1";
    std::string a2 = "a2";

    std::string b1("b1");
    std::string b2("b2");

    // const char *c1 = std::string("c1").c_str();
    // const char *c2 = std::string("c2").c_str();

    std::cout << "Expected [a1], real [" << a1 << "]\n";
    std::cout << "Expected [a2], real [" << a2 << "]\n";
    std::cout << "Expected [b1], real [" << b1 << "]\n";
    std::cout << "Expected [b2], real [" << b2 << "]\n";
    std::cout << "Expected [c1], real [" << c1 << "]\n"; // c1 = b2?
    std::cout << "Expected [c2], real [" << c2 << "]\n"; // c2 = b2?
}

Console result:

Expected [a1], real [a1]
Expected [a2], real [a2]
Expected [b1], real [b1]
Expected [b2], real [b2]
Expected [c1], real [b2]
Expected [c2], real [b2]

I usually just write string str = "", but I became curious about this while testing.

I suspect the problem is related to the constructor.

How can I understand these abnormal results with std::string?

ProgramAlarm
Zem
  • `const char *c1 = std::string("c1").c_str();` -- This creates a temporary, and you realize what happens to temporaries, right? – PaulMcKenzie May 13 '19 at 00:32
  • Also `#include <bits/stdc++.h>` -- don't do this. Include the proper headers. Also, you said you tried this in Visual Studio, but this non-standard header file does not exist for Visual Studio. – PaulMcKenzie May 13 '19 at 00:38
  • You never assign a string, so there is no copy happening in this code. – stark May 13 '19 at 00:39
  • I'm continuously amazed why people provide answers in a comment box when there's a perfectly good answer box if you just scroll down a bit :-) – paxdiablo May 13 '19 at 00:41
  • @PaulMcKenzie Thank you for your advice. I don't normally use the header `bits/stdc++.h`; I typed it here to keep the includes short and show only the code (instead of iostream, string, etc.). – Zem May 13 '19 at 00:42
  • @paxdiablo They (including some moderators) also strenuously defend this behaviour, which is even odder! – Lightness Races in Orbit May 13 '19 at 02:59

1 Answer


The problem lies here

const char *c1 = std::string("c1").c_str();
const char *c2 = std::string("c2").c_str();

What you're doing here is, in each line, creating a temporary std::string object, getting a pointer to its contents, and assigning that pointer to a variable. At the end of each line (more precisely, at the end of the full-expression), the temporary std::string is destroyed, and the pointer that you got via .c_str() thus becomes a dangling pointer.
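To see when the temporary actually goes away, here is a minimal sketch using a hypothetical `Tracer` type (not part of the original code) that records its own construction and destruction in a log. It mirrors the shape of the problematic line without ever dereferencing the dangling pointer, so the sketch itself stays well-defined:

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical tracer type: appends a log entry when it is constructed
// and another one when it is destroyed.
struct Tracer {
    std::vector<std::string>& log;
    std::string name;

    Tracer(std::vector<std::string>& l, std::string n)
        : log(l), name(std::move(n)) {
        log.push_back("ctor " + name);
    }
    ~Tracer() { log.push_back("dtor " + name); }

    // Plays the role of c_str(): hands out a pointer into the object.
    const std::string* name_ptr() const { return &name; }
};

std::vector<std::string> trace_lifetimes() {
    std::vector<std::string> log;
    {
        // Same shape as: const char *c1 = std::string("c1").c_str();
        const std::string* dangling = Tracer(log, "temp").name_ptr();
        (void)dangling;  // never dereferenced: it already dangles here

        log.push_back("next statement");

        Tracer named(log, "named");  // a named object lives until end of scope
        log.push_back("end of scope");
    }
    return log;
}
```

Calling `trace_lifetimes()` yields the entries `ctor temp`, `dtor temp`, `next statement`, `ctor named`, `end of scope`, `dtor named`, in that order: the temporary is gone before the next statement runs, while the named object lives until the end of its scope.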

In general, both versions of your code have undefined behavior. That just means: you wrote code that violates the basic assumptions the compiler is allowed to work on (such as that, if you decide to dereference a pointer, that pointer actually points to a valid object), and thus the compiler cannot be expected to somehow magically produce a program that behaves in a meaningful way.

The exact explanation for why the first version "works" (i.e., appears to work) while the second version doesn't has to do with what machine code the compiler happened to translate your C++ code into. Note that, in the second version, your two temporary strings are the first strings to be constructed. Once these temporary strings are destroyed, whatever memory may have been allocated for them can be reused for the strings created afterwards.

In your first example, on the other hand, your temporary strings are the last strings to be constructed. After they are destroyed, no other local objects are constructed for which memory would be needed. Thus, it's not unlikely that the contents of the memory your two pointers point to simply don't get overwritten. So while your pointers are no longer valid and accessing the objects they pointed to is not allowed (because the objects don't exist anymore), doing it anyway will likely still produce the expected result.
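Whatever the compiler happens to do, the way out of the undefined behavior is to make sure the std::string outlives every use of the pointer obtained from it. A minimal sketch of two well-defined variants follows; the helper names `copy_of`, `safe_inline`, and `safe_named` are made up for illustration:

```cpp
#include <string>

// Hypothetical helper: copies the characters behind p into an owning std::string.
std::string copy_of(const char* p) { return std::string(p); }

// Well-defined: the temporary std::string lives until the end of the
// full-expression, so copy_of reads the buffer while it still exists.
std::string safe_inline() {
    return copy_of(std::string("c1").c_str());
}

// Well-defined: the named string outlives the pointer taken from it,
// and it is not modified while the pointer is in use.
std::string safe_named() {
    std::string owner = "c2";
    const char* p = owner.c_str();
    return copy_of(p);
}
```

In the original program, the second variant corresponds to declaring `std::string c1 = "c1";` as a named variable and calling `c1.c_str()` only where a `const char*` is actually needed.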

Since you didn't state the exact compiler version and compilation options used, it's hard to say exactly what your compiler was doing. But let's take a look at what the latest GCC will do with optimization level -O2 (I was unable to reproduce the issue with just default settings).

The standard library used by current GCC versions by default performs short string optimization. Each of the strings in question is just two characters long. Thus, the internal buffer of the std::string objects, and therefore the result of .c_str(), will actually be located inside the std::string object on the stack. Looking at the assembly for the first and second version of your code above, we see that the compiler does indeed put the temporary strings into two separate places on the stack in the first version, while it places them in the same spot in which it later constructs the string b2 in the second version…
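If you want to check whether your own standard library stores a given string inside the object, the following sketch may help. The function name `stored_inline` is hypothetical, and the result for short strings is an implementation detail that the standard does not guarantee:

```cpp
#include <functional>
#include <string>

// Reports whether s.c_str() points into the bytes of the std::string
// object itself (as with short string optimization) rather than into
// a separately allocated buffer.
bool stored_inline(const std::string& s) {
    const char* begin = reinterpret_cast<const char*>(&s);
    const char* end = begin + sizeof(std::string);
    const char* data = s.c_str();

    // std::less gives a total order even for pointers into unrelated objects,
    // where the built-in < would be unspecified.
    std::less<const char*> before;
    return !before(data, begin) && before(data, end);
}
```

A string much longer than `sizeof(std::string)`, such as `std::string(1000, 'x')`, cannot fit inside the object, so the function returns false for it. What it returns for a two-character string like "c1" depends on the library in use.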

Michael Kenzel
  • It happens to work right in the first case because there are no more strings created or destroyed after you assign `c1` and `c2`, so the data is fortuitously still intact. (Although technically, the `c2` assignment line might clobber the `c1` temporary, so your last two lines in the first situation might be `c2\nc2`). – Dave M. May 13 '19 at 00:39