0

In the following code, a string is encapsulated within a class, Foo.

The call to Foo::getText() returns the string by value. This creates a second instance of the string object, but both string objects now point to the same char buffer on the heap.

When the Foo instance is deleted, the the encapsulated string is automatically destructed, and therefore the char buffer on the heap is deleted.

Even though getText() returns a string by value, the resulting string object still relies on the lifetime of the original string object in order to retain the char buffer to which they mutually point.

Doesn't this mean that the printing of the retval to the terminal is an invalid access to memory that has already been free'd on the heap?

class Foo
{
    Foo(const char* text) :
       str(text)
    {
    }

    std::string getText()
    {
        return str;
    }

    std::string str;
};

int main()
{
    Foo pFoo = new Foo("text");
    std::string retval = foo.getText();
    delete pFoo;
    cout << retval;  // invalid memory access to char buffer?
}

I think a lot of people assume that, because the string was returned by value, they need not be concerned about the lifetime of the original string within Foo. This problem isn't strictly related to strings, but really applies to any class with encapsulated pointers that are free'd upon destruction. But what's the best practice here when it comes to strings?

  1. Never return string by value?
  2. Only return strings by value if the lifetime of the original string is guaranteed?
  3. Always make a copy of the string? return std::string(retval.c_str());
  4. Enforce a contract with the caller of getText()?

EDIT:

I think I was misled by RVO. All three strings in this example return a c_str at the same address. Is RVO to blame?

class Obj
{
public:
    Obj() : s("text")
    {
        std::printf("%p\n", s.c_str());
    }

    std::string getText() { return s; }

    std::string s;
};

int main()
{
    Obj* pObj = new Obj();
    std::string s1(pObj->getText());
    std::string s2 = pObj->getText();
    delete pObj;
    std::printf("%p\n", s1.c_str());
    std::printf("%p\n", s2.c_str());
}

Result:

0x600022888
0x600022888
0x600022888
Jason
  • 192
  • 4
  • 12
  • 1
    @πάντα ῥεῖ This is about `std::string`, not `std::vector`. Another dupehammer miss. – Lightness Races in Orbit Jun 23 '16 at 21:07
  • 5
    *Even though getText() returns a string by value, the resulting string object still relies on the lifetime of the original string object in order to retain the char buffer to which they mutually point.* -- What?? Where did you get this from, or are you guessing? C++ is not Java, if you're using Java as a reference in figuring out how C++ works. – PaulMcKenzie Jun 23 '16 at 21:10
  • 1
    *but both string objects now point to the same char buffer on the heap.* -- If the class is coded like a lot of newbie attempts at home-made string classes with erroneous or missing copy constructor, assignment operator, and destructor, then yes, it can point to the same buffer **erroneously**. However we're talking about "std::string", written by professionals and experts, thus has the correct copy semantics. – PaulMcKenzie Jun 23 '16 at 21:29
  • @PaulMcKenzie I did experimentation above to make that conclusion. Why would anyone ever write their own string class? – Jason Jun 23 '16 at 21:31
  • @Jason You wouldn't believe how many posts we get here from "homework doers" who want to write their own string class. And yes, if you were to look at their attempts at this, you would come to the conclusion that the buffer is shared. – PaulMcKenzie Jun 23 '16 at 21:34
  • @PaulMcKenzie Got it. Can you explain why each c_str() call returns a pointer to the same array? – Jason Jun 23 '16 at 21:36
  • I wouldn't be surprised if the same address comes from the compiler optimizing and simply loading the address of a string from a label for "text". – chris Jun 23 '16 at 21:37
  • @chris I set the -O0 flag but still got the same answer. Perhaps it's unavoidable. – Jason Jun 23 '16 at 21:37
  • @Jason If you now changed the contents of s1 to be different than s2, you would more than likely see a difference in the pointers. – PaulMcKenzie Jun 23 '16 at 21:40
  • Have to do some diving into language lawyer space here, but does the specification go so far as to disallow using an allocator that's smart enough to reference count duplicated strings so long as it hides the fact from the user? – user4581301 Jun 23 '16 at 21:41
  • @PaulMcKenzie Sure, but my concern is with ownership of the original buffer, apart from mutability. – Jason Jun 23 '16 at 21:47
  • 2
    Pretty sure you're correct that they share a buffer, at least for GCC of the 4.8-4.9 variety, but I know if they aren't the same after you introduce a change. The standard Allocator's pretty bright. If I'm remembering correctly, before C++11 there is no assurance that `c_str` returns anything even resembling the internal representation. – user4581301 Jun 23 '16 at 21:51
  • @Jason There is nothing to worry about. [See this](http://godbolt.org/#compilers:!((compiler:g530,options:'',sourcez:MQSwdgxgNgrgJgUwAQB4IGcAucQHsB8AUKJLIqlgE7gDmRh0AhuukgPIBGAVoQN6EAHGByggIALkJJp7bgAoAlEnFJ0cgESYEAD0zqFUmfxknV2ceIHUwmAGYaApAIA6YdQBpVAOggB9KooKANyG0gC%2BhKFmcBZUtEg0CJgAKjqYiki8SJRJMJRgqkFIEVFYMeJxYDSFhGEhhOCYSAC2jOCKfFGcXABUSALdSAC8SGAIAO6yXIohJmWxmNbV6ACMcgPcALT4iSlpgbMy8xWL8egATMP93du7qbozUYhQScgbXIfSx1aN9upOrg8qhWPn8iwOpXMlmsdkcLjcnguoICCmCtSAAA%3D%3D)),filterAsm:(commentOnly:!t,directives:!t,labels:!t),version:3). The string is constant, the buffer is not going anywhere. – PaulMcKenzie Jun 24 '16 at 11:53

2 Answers2

10

This creates a second instance of the string object, but both string objects now point to the same char buffer on the heap.

No, they don't.

std::strings own their contents. When you copy a std::string, you copy its buffer.

I think a lot of people assume that, because the string was returned by value, they need not be concerned about the lifetime of the original string within Foo.

And those people are right. There is no "sharing".

Your return by value is fine and you needn't think more about it.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
8

I'd like to add some points to @LightnessRacesInOrbit's answer:

Never return string by value?

Never return local strings by reference or pointer. Actually, never return anything local by reference or pointer. By value is just fine.

Only return strings by value if the lifetime of the original string is guaranteed?

Again, you're thinking backwards. Only return by reference or pointer if the lifetime of the original is guaranteed.

Always make a copy of the string? return std::string(retval.c_str());

C++ does this automatically for you, if it can't move the string out (RVO/NRVO). No need to copy manually.

Enforce a contract with the caller of getText()?

Not needed as you get a copy anyway

I think a lot of people assume that, because the string was returned by value, they need not be concerned about the lifetime of the original string within Foo. This problem isn't strictly related to strings, but really applies to any class with encapsulated pointers that are free'd upon destruction.

And their assumption is correct. Any class with a (working) copy-constructor can be copied as often as needed and every copied instance is completely independent from the others. The copy-constructor takes care of copying the heap-space.

tkausl
  • 13,686
  • 2
  • 33
  • 50