4

Why does the following program print garbage instead of hello? Interestingly, if I replace hello with hello how are you, then it prints hello how are you.

#include <string>
#include <iostream>

class Buffer
{
public:
    Buffer(std::string s):
        _raw(const_cast<char*>(s.data())),
        _buffer(std::move(s))
    {    
    }

    void Print()
    {
        std::cout << _raw;
    }

private:
    char* _raw;
    std::string _buffer;
};    

int main()
{   
    Buffer b("hello");
    b.Print();
}
tcb
  • 4,408
  • 5
  • 34
  • 51
  • seems fine here http://ideone.com/HoV7af – Ankur Jan 01 '15 at 18:49
  • 6
    Looks like you've hit undefined behavior (dangling pointers respectively). – πάντα ῥεῖ Jan 01 '15 at 18:49
  • 4
    The effect you're observing is related to *small string optimization*, which stores small strings in the `string` object itself, rather than on the heap. (The behaviour is undefined formally, though.) If you move a small string, the characters are *copied* from the source `string` object to the target `string` object, and the source characters are "cleared". For large strings, a pointer is moved from the source to the target string, while the characters themselves are not changed. – dyp Jan 01 '15 at 18:51
  • Why would you cast this `const`ness away? – Lightness Races in Orbit Jan 01 '15 at 18:55
  • 1
    There was a compilation error without the `const_cast` – tcb Jan 01 '15 at 18:56
  • 4
    @tcb A cast is to express something inherently dangerous. – dyp Jan 01 '15 at 19:00
  • 3
    @tcb: Then you should have _stopped_, and considered _why_ what you were attempting to do caused an error. Then _fixed_ the problem (by storing a `const char*` instead). Why would your response to that scenario be "oh okay then I shall simply silence the error with hackery"?? – Lightness Races in Orbit Jan 01 '15 at 19:12
  • 1
    Agreed, I should have simply made `_raw` `const char*`. Learning slowly.. – tcb Jan 01 '15 at 19:22

3 Answers3

11

From your question, you imply a class invariant of Buffer. A class invariant is a relationship between the data members of a class that is assumed to always be true. In your case the implied invariant is:

assert(_raw == _buffer.data());

Joachim Pileborg correctly describes why this invariant is not maintained in your Buffer(std::string s) constructor (upvoted).

It turns out that maintaining this invariant is surprisingly tricky. Therefore my very first recommendation is that you redesign Buffer such that this invariant is no longer needed. The simplest way to do that is to compute _raw on the fly whenever you need it, instead of storing it. For example:

void Print()
{
    std::cout << _buffer.data();
}

That being said, if you really need to store _raw and maintain this invariant:

assert(_raw == _buffer.data());

The following is the path you will need to go down...

Buffer(std::string s)
    : _buffer(std::move(s))
    , _raw(const_cast<char*>(_buffer.data()))
{
}

Reorder your initialization such that you first construct _buffer by moving into it, and then point into _buffer. Do not point into the local s which will be destructed as soon as this constructor completes.

A very subtle point here is that despite the fact that I have reordered the initialization list in the constructor, I have not yet actually reordered the actual construction. To do that, I must reorder the list of data member declarations:

private:
    std::string _buffer;
    char* _raw;

It is this order, and not the order of the initialization list in the constructor that determines which member is constructed first. Some compilers with some warnings enabled will warn you if you attempt to order your constructor initialization list differently than the order the members will actually be constructed.

Now your program will run as expected, for any string input. However we are just getting started. Buffer is still buggy as your invariant is still not maintained. The best way to demonstrate this is to assert your invariant in ~Buffer():

~Buffer()
{
    assert(_raw == _buffer.data());
}

As it stands (and without the user-declared ~Buffer() I just recommended), the compiler helpfully supplies you with four more signatures:

Buffer(const Buffer&) = default;
Buffer& operator=(const Buffer&) = default;
Buffer(Buffer&&) = default;
Buffer& operator=(Buffer&&) = default;

And the compiler breaks your invariant for every one of these signatures. If you add ~Buffer() as I suggested, the compiler will not supply the move members, but it will still supply the copy members, and still get them wrong (though that behavior has been deprecated). And even if the destructor did inhibit the copy members (as it might in a future standard), the code is still dangerous as under maintenance someone might "optimize" your code like so:

#ifndef NDEBUG
    ~Buffer()
    {
        assert(_raw == _buffer.data());
    }
#endif

in which case the compiler would supply the buggy copy and move members in release mode.

To fix the code you must re-establish your class invariant every time _buffer is constructed, or outstanding pointers into it might be invalidated. For example:

Buffer(const Buffer& b)
    : _buffer(b._buffer)
    , _raw(const_cast<char*>(_buffer.data()))
{
}

Buffer& operator=(const Buffer& b)
{
    if (this != &b)
    {
        _buffer = b._buffer;
        _raw = const_cast<char*>(_buffer.data());
    }
    return *this;
}

If you add any members in the future which have the potential for invalidating _buffer.data(), you must remember to reset _raw. For example a set_string(std::string) member function would need this treatment.

Though you did not directly ask, your question alludes to a very important point in class design: Be aware of your class invariants, and what it takes to maintain them. Corollary: Minimize the number of invariants you have to manually maintain. And test that your invariants actually are maintained.

Community
  • 1
  • 1
Howard Hinnant
  • 206,506
  • 52
  • 449
  • 577
7

The constructor takes its argument by value, and when the constructor returns that argument goes out of scope and the object s is destructed.

But you save a pointer to the data of that object, and once the object is destructed that pointer is no longer valid, leaving you with a stray pointer and undefined behavior when you dereference the pointer.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • 2
    Does the same happen when you use `std::vector`? I think for that class, the behaviour is well-defined (for `std::allocator`) - references and pointers to the original data remain valid after moving. So the answer must related specifically to `std::string` IMHO. – dyp Jan 01 '15 at 18:54
  • @dyp To be honest, the more I look at the code, the more unsure I get. Is moving from `s` guaranteed to keep the actual pointer intact (i.e. only copy/move the pointer)? I don't know, but I doubt it's well-defined. – Some programmer dude Jan 01 '15 at 18:56
  • @dyp It's well-defined for swap. I'm not sure if it's well-defined for move. – T.C. Jan 01 '15 at 18:58
  • 2
    @JoachimPileborg There is no such guarantee for `std::string`. Its copy and move ctor are specified as if a *copy* of the characters is made. Reusing the storage is merely an optimization, QoI. – dyp Jan 01 '15 at 19:04
  • @dyp what if `std::string` uses SSO? Is a pointer to `std::string::data` guaranteed to remain valid after moving? – Matthias Aug 02 '21 at 12:47
  • 1
    @Matthias I think SSO is the reason why `std::string` doesn't provide strong guarantees wrt pointer invalidation for move operations. – dyp Aug 04 '21 at 07:51
1
Buffer b("hello");

This is creating temporary string to pass to constructor. When that string goes out of scope at the end of your constructor, you are left with dangling _raw.

That means an undefined behavior as when you call Print _raw is pointing to de-allocated memory.

ravi
  • 10,994
  • 1
  • 18
  • 36