26

I have the following code:

std::string getString() {
    std::string str("hello");
    return str;
}

int main() {
    const char* cStr = getString().c_str();
    std::cout << cStr << std::endl; // this prints garbage
}

What I thought would happen is that getString() would return a copy of str (getString() returns by value); thus, the copy of str would stay "alive" in main() until main() returns. This would make cStr point to a valid memory location: the underlying char[] or char* (or whatever) of the copy of str returned by getString() which, remains in main().

However, this is obviously not the case, as the program outputs garbage. So, the question is, when is str destroyed, and why?

songyuanyao
  • 169,198
  • 16
  • 310
  • 405
hacksoi
  • 1,301
  • 13
  • 23

3 Answers3

36

getString() would return a copy of str (getString() returns by value);

It's right.

thus, the copy of str would stay "alive" in main() until main() returns.

No, the returned copy is a temporary std::string, which will be destroyed at the end of the statement in which it was created, i.e. before std::cout << cStr << std::endl;. Then cStr becomes dangled, dereference on it leads to UB, anything is possible.

You can copy the returned temporary to a named variable, or bind it to a const lvalue-reference or rvalue-reference (the lifetime of the temporary will be extended until the reference goes out of scope). Such as:

std::string s1 = getString();    // s1 will be copy initialized from the temporary
const char* cStr1 = s1.c_str();
std::cout << cStr1 << std::endl; // safe

const std::string& s2 = getString(); // lifetime of temporary will be extended when bound to a const lvalue-reference
const char* cStr2 = s2.c_str();
std::cout << cStr2 << std::endl; // safe

std::string&& s3 = getString();  // similar with above
const char* cStr3 = s3.c_str();
std::cout << cStr3 << std::endl; // safe

Or use the pointer before the temporary gets destroyed. e.g.

std::cout << getString().c_str() << std::endl;  // temporary gets destroyed after the full expression

Here is an explanation from [The.C++.Programming.Language.Special.Edition] 10.4.10 Temporary Objects [class.temp]]:

Unless bound to a reference or used to initialize a named object, a temporary object is destroyed at the end of the full expression in which it was created. A full expression is an expression that is not a subexpression of some other expression.

The standard string class has a member function c_str() that returns a C-style, zero-terminated array of characters (§3.5.1, §20.4.1). Also, the operator + is defined to mean string concatenation. These are very useful facilities for strings . However, in combination they can cause obscure problems. For example:

void f(string& s1, string& s2, string& s3)
{

    const char* cs = (s1 + s2).c_str();
    cout << cs ;
    if (strlen(cs=(s2+s3).c_str())<8 && cs[0]==´a´) {
        // cs used here
    }

}

Probably, your first reaction is "but don’t do that," and I agree. However, such code does get written, so it is worth knowing how it is interpreted.

A temporary object of class string is created to hold s1 + s2 . Next, a pointer to a C-style string is extracted from that object. Then – at the end of the expression – the temporary object is deleted. Now, where was the C-style string allocated? Probably as part of the temporary object holding s1 + s2 , and that storage is not guaranteed to exist after that temporary is destroyed. Consequently, cs points to deallocated storage. The output operation cout << cs might work as expected, but that would be sheer luck. A compiler can detect and warn against many variants of this problem.

songyuanyao
  • 169,198
  • 16
  • 310
  • 405
  • But saving the temporary std::string in a variable like: std::string str = getString(); won't destroy the copy? why not? is it because the = uses the copy constructor? – hacksoi Mar 14 '16 at 06:21
  • I am sorry but I am failing to understand this. I agree that str will be destroyed but then I have already stored the value of string.c_str in another variable. It should print hello. – Priyansh Goel Mar 14 '16 at 06:23
  • @FakeJake Yes, the named variable `str` will be copy initialized from the temporary variable, (even though the copy will be omitted usually by RVO), then `str` will be valid until `main()` ends, so it'll be safe to use `str.c_str()` in `main()`. – songyuanyao Mar 14 '16 at 06:25
  • 1
    @PriyanshGoel `std::string::c_str()` will return a pointer, yes we could save the pointer to another variable, but the memory the pointer pointing to, it's allocated by `std::string`, will be destroyed either when the `std::string` destroyed. Then the pointer saved in another variable will be a dangled pointer. – songyuanyao Mar 14 '16 at 06:28
  • @songyuanyao, Can you please make your answer more detailed. I mean add the things you have mentioned in the comments to the answer itself. – Akshay Arora Mar 14 '16 at 06:37
  • @songyuanyao : Yes that seems correct to me. But with that logic it should always display garbage, right? But here it works for me http://ideone.com/CgxK6d – Priyansh Goel Mar 14 '16 at 06:39
  • @AkshayArora Thanks for the suggestion. – songyuanyao Mar 14 '16 at 06:54
  • 1
    @PriyanshGoel I've added some more explanation in my answer. Even it seems to work well, dereferencing on deallocated memory *is* UB, anything is possible. – songyuanyao Mar 14 '16 at 06:57
  • @songyuanyao : Thanks for the awesome explanation . It helped me a lot to get the concept right! – Priyansh Goel Mar 14 '16 at 07:02
  • What about adding another one to your 3 suggestion of a correct code? Namely, a one-liner: `std::cout << getString().c_str() << std::endl;` (correct since it won't be destroyed "the end of the full expression"). – imz -- Ivan Zakharyaschev Apr 28 '21 at 17:58
6

Problem here is that you are returning a temporary variable and over that temporary variable you are doing c_str function.

"c_str() function Returns a pointer to an array that contains a null-terminated sequence of characters (i.e., a C-string) representing the current value of the string object( [http://www.cplusplus.com/reference/string/string/c_str/][1]).

In this case your pointer is pointing to memory location which is now not present.

std::string getString() {
        std::string str("hello");
        return str; // Will create Temporary object as it's return by value}

    int main() {
         const char* cStr = getString().c_str(); // Temporary object is destroyed
        std::cout << cStr << std::endl; // this prints garbage }

Solution is to copy your temporary object to memory location properly(by creating local copy) and then use c_str over that object.

user258367
  • 3,247
  • 2
  • 18
  • 17
4

As mentioned by others you are using a pointer to temporary after it has already been deleted - this is a classic example of heap after free use.

What I can add to others' answers is that you can easily detect such usage with gcc's or clang's address sanitizers.

Example:

#include <string>
#include <iostream>

std::string get()
{
  return "hello";
}

int main()
{
  const char* c = get().c_str();
  std::cout << c << std::endl;
}

sanitizer output:

=================================================================
==2951==ERROR: AddressSanitizer: heap-use-after-free on address 0x60300000eff8 at pc 0x7f78e27869bb bp 0x7fffc483e670 sp 0x7fffc483de20
READ of size 6 at 0x60300000eff8 thread T0
    #0 0x7f78e27869ba in strlen (/usr/lib64/libasan.so.2+0x6d9ba)
    #1 0x39b4892ba0 in std::basic_ostream<char, std::char_traits<char> >& std::operator<< <std::char_traits<char> >(std::basic_ostream<char, std::char_traits<char> >&, char const*) (/usr/lib64/libstdc++.so.6+0x39b4892ba0)
    #2 0x400dd8 in main /tmp/tmep_string/main.cpp:12
    #3 0x39aa41ed5c in __libc_start_main (/lib64/libc.so.6+0x39aa41ed5c)
    #4 0x400c48  (/tmp/tmep_string/a.out+0x400c48)

0x60300000eff8 is located 24 bytes inside of 30-byte region [0x60300000efe0,0x60300000effe)
freed by thread T0 here:
    #0 0x7f78e27ae6ea in operator delete(void*) (/usr/lib64/libasan.so.2+0x956ea)
    #1 0x39b489d4c8 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::~basic_string() (/usr/lib64/libstdc++.so.6+0x39b489d4c8)
    #2 0x39aa41ed5c in __libc_start_main (/lib64/libc.so.6+0x39aa41ed5c)

previously allocated by thread T0 here:
    #0 0x7f78e27ae1aa in operator new(unsigned long) (/usr/lib64/libasan.so.2+0x951aa)
    #1 0x39b489c3c8 in std::string::_Rep::_S_create(unsigned long, unsigned long, std::allocator<char> const&) (/usr/lib64/libstdc++.so.6+0x39b489c3c8)
    #2 0x400c1f  (/tmp/tmep_string/a.out+0x400c1f)

SUMMARY: AddressSanitizer: heap-use-after-free ??:0 strlen
Shadow bytes around the buggy address:
  0x0c067fff9da0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9db0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9dc0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9dd0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9de0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
=>0x0c067fff9df0: fa fa fa fa fa fa fa fa fa fa fa fa fd fd fd[fd]
  0x0c067fff9e00: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e10: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e20: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e30: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c067fff9e40: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Heap right redzone:      fb
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack partial redzone:   f4
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
==2951==ABORTING
Patryk
  • 22,602
  • 44
  • 128
  • 244
  • I guess you mean "heap use after free" ; there is not necessarily any heap involved here (string may use SSO to avoid needing to allocate a buffer) – M.M Apr 29 '21 at 01:31