0

I tried to convert a hex-encoded string into a const char* buffer containing the decoded string. For that I have a function (string hex_decode(string hex_input)) that takes the string and decodes it.
After testing the function by itself a couple of times, I tried the following call:

const char* hex_decoded_c_str = hex_decode(input).c_str();

and saw that it returned complete gibberish! In an effort to debug the problem, I broke it up into two statements, like so:

string hex_decoded_string = hex_decode(input);
const char* hex_decoded_c_str = hex_decoded_string.c_str();

This time it worked perfectly!
I am so confused, does anyone know what could cause something like this?

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
TheCaptain
  • 25
  • 4
  • @Biffen not anymore. – Jean-François Fabre Mar 03 '17 at 13:38
  • I'm sure, you create a local string object in hex_decode. Since string class has a deep copy assignment operator, when you return the string, it is copied into hex_decoded_string object without memory problem. In the former case, you are pointing hex_decoded_c_str to stack memory which is not correct case – avatli Mar 03 '17 at 13:39

4 Answers4

1

string hex_decode(string hex_input) returns a temporary string object. c_str() returns a pointer into the private parts of that string object. Once the temporary dies (at the end of the full expression), that pointer points into inaccessible memory. That's called undefined behavior.

To solve this, copy the return value into a local string object, or extend the lifetime of the temporary, by binding it to a const string& reference. The validity of the return value of c_str() is tied to the lifetime of the object it was called on. Abidance cannot be enforced by the compiler, library or runtime. This is something a developer needs to make sure.

For reference, see Lifetime.

IInspectable
  • 46,945
  • 8
  • 85
  • 181
1

Prototype of string hex_decode(string hex_input) says a new string object is always returned by hex_decode.

Paragraph 12.2#3 of the standard says:

Temporary objects are destroyed as the last step in evaluating the full-expression that (lexically) contains the point where they were created.

Additionally c_str() returns a pointer to something inside of the string object.

When reading documentation for c_str() and data() (which now perform the same function) you can deduce that from this line (see here or 21.4.7.1#1):

c_str() + i == &operator[](i) for every i in [0, size()].

Therefore:

Once c_str() ended there's ; and the temporary string returned by hex_decode() must be destroyed. During destruction it frees underlying character array.

However to preserve character array there's no need to copy the temporary string. 12.2#4,5:

There are two contexts in which temporaries are destroyed at a different point than the end of the full-expression. ... The second context is when a reference is bound to a temporary.

So you can just bind a const reference to the temporary object and save on the copy constructor call:

const string & hex_decoded_string = hex_decode(input);
const char* hex_decoded_c_str = hex_decoded_string.c_str();
fukanchik
  • 2,811
  • 24
  • 29
0

with

const char* hex_decoded_c_str = hex_decode(input).c_str();

you're taking a reference to the internal representation of a string temporary object (the return value), which is invalid when hex_decode(input) goes out of scope, immediately after the c_str() call.

With the other approach, you copy the string object, making that reference valid in your scope.

Jean-François Fabre
  • 137,073
  • 23
  • 153
  • 219
  • *"a [...] temporary object [...], which is invalid in the next line."* - This is inaccurate. The temporary is destroyed as the last step in evaluating the full-expression that contains the point of creation. If there is code on the same line following that full-expression, the temporary is dead **on the same line** already. It's best not to employ the concept of *"lines"* that's mostly meaningless to a C++ compiler. – IInspectable Mar 03 '17 at 13:56
  • *"Instruction"* is not a term used in C++. It's commonly attributed to assembly programming. Still being inaccurate is hardly an improvement over being inaccurate. – IInspectable Mar 03 '17 at 14:03
  • okay, is the current wording better for you? – Jean-François Fabre Mar 03 '17 at 14:41
  • `hex_decode(input)` is a function invocation. It never *"goes out of scope"*. And if it did, that certainly wouldn't happen *"immediately after the `c_str()` call"* either. At the very least, there's a pointer assignment in between. No, no improvement so far. There are two answers by now that are way more accurate. Future visitors will have to decide, whether this is good enough. – IInspectable Mar 03 '17 at 15:09
  • You win, you have worded the explanation of this better than me. – Jean-François Fabre Mar 03 '17 at 15:16
-1

Pointer returned by c_str() method becomes invalid after std::string instance is destroyed.

Kamil Koczurek
  • 696
  • 5
  • 15