45

We recently had a lecture in college where our professor told us about different things to be careful about when programming in different languages. The following is an example in C++:

std::string myFunction()
{
    return "it's me!!";
}

int main(int argc, const char * argv[])
{
    const char* tempString = myFunction().c_str();

    char myNewString[100] = "Who is it?? - ";
    strcat(myNewString, tempString);
    printf("The string: %s", myNewString);

    return 0;
}

The idea why this would fail is that return "it's me!!" implicitly calls the std::string constructor with a char[]. This string gets returned from the function and the function c_str() returns a pointer to the data from the std::string.

As the string returned from the function is not referenced anywhere, it should be deallocated immediately. That was the theory.

However, letting this code run works without problems. Would be curious to hear what you think. Thanks!

Barmar
  • 741,623
  • 53
  • 500
  • 612
guitarflow
  • 2,930
  • 25
  • 38
  • 23
    It doesn't "work without problems", it only **pretends to work.** It's undefined behavior, so anything may happen. –  Jan 02 '14 at 12:39
  • 4
    The memory is allocated differently, but this answer still applies: http://stackoverflow.com/a/6445794/13005 – Steve Jessop Jan 02 '14 at 12:47
  • @SteveJessop +1, I was just searching for the same link to post it here. – Angew is no longer proud of SO Jan 02 '14 at 12:50
  • 1
    @MarounMaroun It's a long story :D It refers to carbonic acid in carbonated water, and that has to do with syphons. (The first time I've used this name is Hungary's most popular iPhone blog, Szifon, of which the pronunciation resembles that of "iPhone", so this is a kind of pun.) –  Jan 02 '14 at 12:51
  • 4
    For bonus points, ask your professor why not write `strcat(myNewString, yFunction().c_str());` instead. (Hint: the temporary object lives until the end of the full expression, so alhough this _kind of_ looks the same, is 100% well-defined). – Damon Jan 02 '14 at 14:37

9 Answers9

47

Your analysis is correct. What you have is undefined behaviour. This means pretty much anything can happen. It seems in your case the memory used for the string, although de-allocated, still holds the original contents when you access it. This often happens because the OS does not clear out de-allocated memory. It just marks it as available for future use. This is not something the C++ language has to deal with: it is really an OS implementation detail. As far as C++ is concerned, the catch-all "undefined behaviour" applies.

juanchopanza
  • 223,364
  • 34
  • 402
  • 480
  • 1
    Note that in this case, the contents of the string is likely on the stack, because it's short. So you shouldn't get segfault, garbage at most. – user2345215 Jan 02 '14 at 12:46
  • 12
    `This often happens because the OS does not clear out de-allocated memory.` not true. It's not the OS but usually malloc implementation, which can be 3rd party like jemalloc or even a custom one. – Zaffy Jan 02 '14 at 12:58
5

I guess deallocation does not imply memory clean-up or zeroing. And obviously this could lead to a segfault in other circumstances.

jbgs
  • 2,795
  • 2
  • 21
  • 28
3

As others have mentioned, according to the C++ standard this is undefined behavior.

The reason why this "works" is because the memory has been given back to the heap manager which holds on to it for later reuse. The memory has not been given back to the OS and thus still belongs to the process. That's why accessing freed memory does not cause a segmentation fault. The problem remains however that now two parts of your program (your code and the heap manager or new owner) are accessing memory that they think uniquely belongs to them. This will destroy things sooner or later.

Toby Speight
  • 27,591
  • 48
  • 66
  • 103
StackedCrooked
  • 34,653
  • 44
  • 154
  • 278
3

I think that the reason is that the stack memory has not been rewriten, so it can get the original data. I created a test function and called it before the strcat.

std::string myFunction()
{
    return "it's me!!";
}


void test()
{
    std::string str = "this is my class";
    std::string hi = "hahahahahaha";

    return;
}

int main(int argc, const char * argv[])
{
    const char* tempString = myFunction().c_str();


    test();
    char myNewString[100] = "Who is it?? - ";
    strcat(myNewString, tempString);
    printf("The string: %s\n", myNewString);

    return 0;
}

And get the result:

The string: Who is it?? - hahahahahaha

This proved my idea.

1

The fact that the string is deallocated does not necessarily mean that the memory is no longer accessible. As long as you do nothing that could overwrite it, the memory is still usable.

marcolz
  • 2,880
  • 2
  • 23
  • 28
  • string's destructor free's its pointer and `tempString` still points to it; does dereferencing tempString leads to UB. – Zaffy Jan 02 '14 at 13:01
  • As long as you... and std::string's destructor, the heap manager, OS memory manager, or another thread do nothing... – Scott Jan 08 '14 at 18:54
1

As said above - it's unpredicted behaviour. It doesn't work for me (in Debug configuration). The std::string Destructor is called immediately after the assignment to the tempString - when the expression using the temporary string object finishes. Leaving the tempString to point on a released memory (that in your case still contains the "it's me!!" literals).

VitaliG
  • 52
  • 3
1

You cannot conclude there is no problems by getting your result by coincidence.

There are other means to detect 'problems' :

  • Static analysis.
  • Valgrind would catch the error, showing you both the offending action (trying to copy from freed zone -by strcat) and the deallocation which caused the freeing.

Invalid read of size 1

   at 0x40265BD: strcat (mc_replace_strmem.c:262)
   by 0x80A5BDB: main() (valgrind_sample_for_so.cpp:20)
   [...]
Address 0x5be236d is 13 bytes inside a block of size 55 free'd
   at 0x4024B46: operator delete(void*) (vg_replace_malloc.c:480)
   by 0x563E6BC: std::string::_Rep::_M_destroy(std::allocator<char> const&) (in /usr/lib/libstdc++.so.6.0.13)
   by 0x80A5C18: main() (basic_string.h:236)
   [...]

  • The one true way would be to prove the program correct. But it is really hard for procedural language, and C++ makes it harder.
YvesgereY
  • 3,778
  • 1
  • 20
  • 19
-1

Actually, string literals have static storage duration. They are packed inside the executable itself. They are not on the stack, nor dynamically allocated. In the usual case, it is correct that this would be pointing to invalid memory and be undefined behavior, however for strings, the memory is in static storage, so it will always be valid.

divinas
  • 1,787
  • 12
  • 12
  • The point is that `myFunction()` returns an `std::string`. So `const char* tempString = myFunction().c_str();` points to a temporary `std::string`'s internal data. Once the temporary dies, the pointer is "dangling". – juanchopanza Jan 02 '14 at 13:07
  • Ok, fair enough, I missed that only a temporary is used to get the buffer. I'm still fairly certain that if you use a string literal for the initialization of the string, the `c_str()` pointer will point to the static char array, however that is implementation dependent and one shouldn't rely on such behavior. – divinas Jan 02 '14 at 13:13
  • Not really. An `std::string` holds its own copy of whatever string was used to initialize it. Of course, the compiler only needs to follow the "as-if" rule, so what you describe *could* potentially be possible as an optimization. – juanchopanza Jan 02 '14 at 13:22
  • 1
    There's no way for `std::string` to recognize pointers to string literals and act accordingly AFAIK, not even with compiler extensions. And even if it could, it would add another special case (and hence branches and code) for not trying to free the storage at the end, which might make performance *worse* in total. –  Jan 02 '14 at 13:23
  • Actually, there's probably already a branch that tries not to delete the string, but that branch is used for the Small-String Optimization. And that's not the only similarity: both types of strings can be memcopied. So one bit should be enough, which probably can be robbed from the top of the length field. (And that's quickly tested as a sign bi) – MSalters Jan 02 '14 at 16:01
-1

Unless I'm missing something, I think this is an issue of scope. myFunction() returns a std::string. The string object is not directly assigned to a variable. But it remains in scope until the end of main(). So, tempString will point to perfectly valid and available space in memory until the end of the main() code block, at which time tempString will also fall out of scope.

Tim Rolen
  • 59
  • 3
  • 1
    I don't think this is correct. The std::string lives until the semicolon. Then it is freed. The assigned pointer from `c_str` points to an address of an already freed object. The memory is not yet overwritten, that's why it seems to work (as many others already wrote). – guitarflow Jan 08 '14 at 21:17