21

I would like to define a class for marshalling data; when marshalling is finished, I would like to move the marshalled data out from within it, which will probably invalidate the marshalling object.

I believe this is possible with the static function extractData below:

class Marshaller
{
  public:
    static DataType extractData(Marshaller&& marshaller)
    {
      return std::move(marshaller.data);
    }
  private:
    DataType data;
}

This is a bit inconvenient to call, though:

Marshaller marshaller;
// ... do some marshalling...
DataType marshalled_data{Marshaller::extractData(std::move(marshaller))};

So can I wrap it with a member function?

DataType Marshaller::toDataType()
{
  return Marshaller::extractData(std::move(*this));
}

This would, of course, be called using:

DataType marshalled_data{marshaller.toDataType()};

...which, to me, looks much nicer. But that std::move(*this) thing looks awfully suspicious. In the context of the call to toDataType(), marshaller can't be used again, but I don't think the compiler can know that: the body of the function could be outside the caller's compilation unit, so there's nothing to indicate that marshaller has had move() applied to it.

Is this undefined behavior? Is it perfectly fine? Or somewhere in between? Is there a nicer way to accomplish the same goal, preferably without using a macro or requiring the caller to explicitly move marshaller?

EDIT: With both G++ and Clang++, I found that not only could I compile the above use case, but I could actually continue to make modifications to the underlying data via the marshaller, then re-extract the modified data using the toDataType function. I also found that the already-extracted data in marshalled_data continued to be changed by marshaller, which indicates that the marshalled_data is shared between the marshaller and the calling context, so I suspect that there is either a memory-leak or undefined behavior (from double-deletion) here.

EDIT 2: If I put a print statement in DataType's destructor, it appears twice when the caller leaves scope. If I include a data member in DataType that has an array in it, with a corresponding new[] and delete[], I get a glibc "double free or corruption" error. So I'm not sure how this could be safe, even though several answers have said that it's technically allowed. A complete answer should explain what is required to use this technique correctly with a non-trivial DataType class.

EDIT 3: This is enough of a rabbit-hole/can-of-worms that I've opened up another question to address my remaining concerns.

Community
  • 1
  • 1
Kyle Strand
  • 15,941
  • 8
  • 72
  • 167
  • What happened when you tried it? – Steve Apr 08 '15 at 01:00
  • @Steve Working on that....... – Kyle Strand Apr 08 '15 at 01:02
  • 4
    Assuming that two `marshalled_data` objects are not supposed to refer to the same data (because you seem surprised that they do): If the `marshalled_data` is shared, then it sounds like the underlying `DataType` has a bug in its copying move constructor. For example, maybe it doesn't have one at all, and the default copy constructor is doing a shallow copy. – Raymond Chen Apr 08 '15 at 01:18
  • Did you intend to declare `extractData` as `static` ? – Nemo Apr 08 '15 at 01:19
  • @Nemo Yes. Otherwise you would need two `Marshaller`s to make the call to `extractData`, since the argument must be an r-value reference. – Kyle Strand Apr 08 '15 at 01:40
  • @KyleStrand: I understand, but your code snippet does not include the word `static` – Nemo Apr 08 '15 at 01:44
  • 4
    Move semantics is **not** destructive. That is, the destructor of an object will be called whether you have moved from the object or not. `std::move` is nothing more than a simple cast that changes the value category of an expression. It does not *do* anything itself. The semantics of move operations of a class type are entirely defined by the class type. – dyp Apr 08 '15 at 01:46
  • @Nemo Ah, thanks for catching that; I have it in my real code but must have forgotten it when typing up the question here. – Kyle Strand Apr 08 '15 at 01:51
  • 2
    By the way, your `extractData` reminds me of `unique_ptr::release`, or `thread::detach` or similar functions that release a resource from its (former) manager. You don't even need to bother with move semantics here; simply point out clearly (via the name of the function and its spec/description) that it will alter / empty the resource manager. – dyp Apr 08 '15 at 01:51
  • @dyp But transferring r-value references around *does* transfer control, right? So `extract`ing the data from the object would, I expect, cause the calling context to take ownership, but in that case the object's destructor would cause the data to be destroyed twice. But according to MarkB's answer, this code is valid, so that must mean that either the calling context *doesn't* actually own the `marshalled_data`, or the object's destructor behaves differently, or........something else? Am I just completely barking up the wrong tree here? – Kyle Strand Apr 08 '15 at 01:55
  • 3
    Rvalue references are still references like lvalue references. Binding to them does not affect the bound object. The difference between lvalue and rvalue references is what kinds of *expressions* (value categories) you can bind to them. But `std::move` and some similar cast can change the value category of an expression. So all we have are *conventions*. An rvalue reference *per convention* is assumed to refer to an object that we can steal the resources from. Destruction of that object is not affected by "stealing resources". – dyp Apr 08 '15 at 02:24
  • 1
    Some flavours of the copy-and-swap idiom do this indirectly: if they write `swap(*this, other)` without a custom swap, then it will use the C++11 default implementation of `std::swap` which is three calls to `std::move` – M.M Apr 11 '15 at 00:09

5 Answers5

11

According to the standard the move-from object will still be valid although its state is not guaranteed, so it seems that moving from *this would be perfectly valid. Whether it's confusing to users of your code is another question entirely.

All that said it sounds like your real intention is to link the destruction of the marshallar with the extraction of the data. Did you consider doing all the marshalling within a single expression and letting a temporary take care of things for you?

class Marshaller
{
  public:
    Marshaller& operator()(input_data data) { marshall(data); return *this; }
    DataType operator()() { return std::move(data_); }
  private:
    DataType data_;
}

DataType my_result = Marshaller()(data1)(data2)(data3)();
Mark B
  • 95,107
  • 10
  • 109
  • 188
  • Which context controls the destruction of the `move`d marshaller object and the `move`d underlying data, then? Are they both guaranteed to be destroyed at the end of the calling context and not before? – Kyle Strand Apr 08 '15 at 01:08
  • 1
    @KyleStrand The `move`d marshaller object and the `move`d underlying data are destroyed according to the normal rules as if they had never been `move`d. After you `move` out of the marshaller object, it cannot be used for anything any more, and callers need to understand that. – Raymond Chen Apr 08 '15 at 01:12
  • @RaymondChen The calling context has, in effect, a reference to the underlying data following the `move`--which, under normal rules, would be destroyed at the end of the scope, *in addition to* the marshaller object, whose destructor would *also* try to destroy the underlying data. Destroying a data member twice is undefined behavior (right?). So does the compiler actually know to avoid this, or not? – Kyle Strand Apr 08 '15 at 02:06
  • 1
    @KyleStrand As I already noted, the compiler destroys the objects at the usual times. It is the copy move constructor's job to make sure that the destruction does not create problems. From your description, it appears that the copy move constructor for `DataType` is flawed. – Raymond Chen Apr 08 '15 at 14:09
  • @RaymondChen I'm not actually seeing any bad behavior; I'm just trying to understand the concept of how move-construction affects destruction. If `DataType` were, say, `char*`, and it were `move`d as the return value from the function that calls `toDataType`, what would prevent the `marshaller` destructor from destroying the `char*`? – Kyle Strand Apr 08 '15 at 16:13
  • @Kyle Strand If you've moved from a `marshaller` and the `marshaller`'s destructor destroys/frees the resources that it owned being movement, that's a bug in `marshaller` then. – Mark B Apr 08 '15 at 16:17
  • Huh, that single-expression marshalling is an interesting idea. I have some if-statements and stuff in the marshalling logic, though, and I'd rather just extract/convert the data at the moment I know I'm done with the marshaller. As for there being a bug in marshaller, I'm not defining an explicit destructor, and I wouldn't expect to need to; but the default constructor, AFAIK, should call the destructor of each data member, and `data_` is a member of the `Marshaller` class, so...again, what prevents `data_`'s destructor from being called twice? – Kyle Strand Apr 08 '15 at 18:06
  • 2
    "According to the standard the move-from object will still be valid although its state is not guaranteed, so it seems that moving from `*this` would be perfectly valid." The standard only defines the state of moved-from standard library objects. *We* define (or don't) the state of our moved-from objects. It's a good idea in general to follow the standard library's rule, since that expectation is ingrained in most of us, but it's neither required nor guaranteed. – Casey Apr 08 '15 at 23:09
  • @KyleStrand T"What would prevent the marshaller destructor from destroying the `char*`?" Nothing. it is up to the marshaller's destructor to decide what it does when it is destroyed. And it is up to the marshaller's move constructor to decide what state to leave the marshaller in after its contents have been moved out. In a well-written program, the two pieces will work coherently so that the move constructor will leave the object in a state such that the destructor won't do anything bad. – Raymond Chen Apr 09 '15 at 01:19
  • @RaymondChen How can the marshaller's move constructor be affecting anything here? Where is it being called? – Kyle Strand Apr 10 '15 at 00:48
  • You're right. It's DataType's copy move constructor that is the important one here. – Raymond Chen Apr 12 '15 at 20:29
9

I would avoid moving from *this, but if you do it, at least you should add rvalue ref-qualifier to the function:

DataType Marshaller::toDataType() &&
{
    return Marshaller::extractData(std::move(*this));
}

This way, the user will have to call it like this:

// explicit move, so the user is aware that the marshaller is no longer usable
Marshaller marshaller;
DataType marshalled_data{std::move(marshaller).toDataType()};

// or it can be called for a temporary marshaller returned from some function
Marshaller getMarshaller() {...}
DataType marshalled_data{getMarshaller().toDataType()};
Anton Savin
  • 40,838
  • 8
  • 54
  • 90
6

There is nothing inherently unsafe about calling move(*this). The move is essentially just a hint to a function being called that it may steal the internals of the object. In the type system, this promise is expressed through && references.

This is not related to destruction in any way. The move does not perform any type of destruction - as mentioned, it just enables us to call functions taking && parameters. The function receiving the moved object (extractData in this case) also does not do any destruction. In fact, it needs to leave the object in a "valid but unspecified state". Essentially, this means that it must be possible to destroy the object in the normal way (by delete or by going out of scope, depending on how it was created).

So - provided your extractData does what it should and leaves the object in a state that allows it to be destructed later - there is nothing undefined or dangerous going on with respect to the compiler. There could of course be a problem with users of the code being confused, since it is not entirely obvious that the object is being moved from (and likely won't contain any data later). This could perhaps be made a little clearer by changing the function name. Or (as another answer suggested) by &&-qualifying the entire method.

CAdaker
  • 14,385
  • 3
  • 30
  • 32
  • *"In fact, it needs to leave the object in a "valid but unspecified state""* How/why is this required? – dyp Apr 08 '15 at 12:44
  • The object will be destructed by whatever process would usually destroy it (`delete` or end-of-scope etc.). That operation must still be possible, which is what the "valid state" part refers to. This is the only thing the language demands - destructability must be preserved. The "unspecified state" part is simply the idiomatic agreement that the function may do whatever it wants to the internals of the object. Most often this means remove the data and store it somewhere else, but it can do anything. The language doesn't put any special restrictions on it. – CAdaker Apr 08 '15 at 13:01
  • Ah, alright (I've already upvoted anyway). I thought you were referring to the Standard Library's guarantee in [lib.types.movedfrom]. The "valid" part as far as I know refers to the class invariants. Not all of them have to be maintained to keep the object destructible (it is sufficient but not necessary). – dyp Apr 08 '15 at 13:05
  • Ah, right. I was thinking about it strictly from a language specification viewpoint and maybe I misused the phrase. But indeed, the user-specified semantics of the actual move operation can have further constraints on the state of the moved-from object. And preserving invariants (like the standard library does) is certainly good practice. – CAdaker Apr 08 '15 at 14:30
  • I'm marking this as the accepted answer for now, because it's fairly clear and doesn't assume that this is an XY problem. I'd prefer more clarity regarding what it would mean for `extractData` to "[do] what it should", but that may be better addressed in my [new question](http://stackoverflow.com/q/29614410/1858225). – Kyle Strand Apr 13 '15 at 20:27
3

I think you shouldn't move from *this, but from its data field. Since this clearly will leave the Marshaller object in an valid but unusable state, the member function that does this should itself should have an rvalue reference qualifier on its implicit *this argument.

class Marshaller
{
public:
  ...
  DataType Marshaller::unwrap() &&   { return std::move(data); }

  ...
private:
  DataType data;
};

Call it, if m is Marshaller variable, as std::move(m).unwrap(). There is not need for any static member to accomplish this.

Marc van Leeuwen
  • 3,605
  • 23
  • 38
2

You write that you'd like to simultaneously destroy the Marshaller and remove the data from it. I really wouldn't worry about trying to do these things simultaneously, just move the data out first and then destroy the Marshaller object. There are a number of ways to get rid of the Marshaller without thinking much about it, perhaps a smart pointer makes sense for you?

One option for refactoring this would be to give DataType a constructor that takes a Marshaller and moves the data out (the 'friend' keyword will allow you to do this since the DataType will then be able to reach that private 'data' variable).

    //add this line to the Marshaller
    friend class DataType;

struct DataType
{
    DataType(Marshaller& marshaller) {
             buffer = marshaller.data.buffer;
        }

    private: 
        Type_of_buffer buffer;//buffer still needs to know how to have data moved into it
}

You could also give it an assignment operator that does that same (I think this will just work:

DataType& operator=(Marshaller&& marshaller) {
     this.buffer = std::move(marshaller.data.buffer);
     return *this;
}

)

I would avoid using move on *this, just because it's going to throw people off even if it is right. It also seems like stack based buffer containers might get you into trouble.

You seem to be concerned with the Marshaller being called again outside the compilation unit. If you have intense parallel code and are playing fast and loose with the marshaller, or you are copying pointers to your marshaller willy-nilly then I think your worries are justified. Otherwise, look at how the Marshaller is being moved around and make sure you've structured your code for good object lifetime (use object references whenever you can). You can also just add a member flag to marshaller that says whether 'data' has been moved and throw an error if someone tries to access it after it's left (if you're parallel be sure to lock). I would only do this as a last resort or quick fix since it doesn't seem right and your co-devs will wonder what's up.

I have some nits to pick if you have a moment:

  • your extractData method is missing the static keyword
  • you mix brackets and parenthesis on your DataType declartion line
Kyle Strand
  • 15,941
  • 8
  • 72
  • 167
Jacob Statnekov
  • 245
  • 3
  • 12
  • For *unmarshalling*, I'm actually doing something similar to what you suggest (most of my data types that are being unmarshalled have a constructor that take an "unmarshaller" object). But my particular "DataType" is actually Qt's `QByteArray`, which isn't under my control for modification. I suppose I could still define the assignment operator, but at that point I'm just doing the same thing that I'm doing with `toDataType` but less explicitly (since it's somewhat surprising for `operator=` to require a `move` on its right operand). – Kyle Strand Apr 08 '15 at 01:59
  • As for concern with the Marshaller being called in an unexpected way, I'm actually just wondering about how the standard handles a case like this; I'm not planning to write anything particularly strange that would actually cause the marshaller to behave unexpectedly. (My marshallers have pretty short lifetimes.) – Kyle Strand Apr 08 '15 at 02:03
  • Thanks for noticing the mixed braces/parens. The `static` typo was already caught by a couple other people. (This is what I get for trying to make my code "generic" for the sake of posting here...) – Kyle Strand Apr 08 '15 at 02:03
  • Another way to design this is to have the Marshaller take a reference to an instance of DataType and simply "fill it in". To my eye that is simpler than mucking about with rvalue references and move semantics – Nemo Apr 08 '15 at 03:15