
I've been reading questions like "Why does a function that accepts a Box&lt;MyType&gt; complain of a value being moved when a function that accepts self works?", "Preferable pattern for getting around the 'moving out of borrowed self' checker", and "How to capture self consuming variable in a struct?", and now I'm curious about the performance characteristics of consuming self but possibly returning it to the caller.

To make a simpler example, imagine I want to make a collection type that's guaranteed to be non-empty. To achieve this, the "remove" operation needs to consume the collection and optionally return itself.

struct NonEmptyCollection { ... }

impl NonEmptyCollection {
    fn pop(mut self) -> Option<Self> {
        if self.len() == 1 {
            None
        } else {
            // really remove the element here
            Some(self)
        }
    }
}

(I suppose it should return the value it removed from the list too, but it's just an example.) Now let's say I call this function:

let mut c = NonEmptyCollection::new(...);
if let Some(new_c) = c.pop() {
    c = new_c
} else {
    // never use c again
}

What actually happens to the memory of the object? What if I have some code like:

let mut opt: Option<NonEmptyCollection> = Some(NonEmptyCollection::new(...));
opt = opt.take().and_then(NonEmptyCollection::pop);
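To make the examples above concrete, here is a hypothetical `Vec`-backed implementation; the `items` field and the `new` signature are assumptions, added only so the sketch compiles and runs:

```rust
struct NonEmptyCollection {
    items: Vec<i32>, // invariant: always holds at least one element
}

impl NonEmptyCollection {
    fn new(first: i32) -> Self {
        NonEmptyCollection { items: vec![first] }
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    // Consuming interface: returns None when removing the element
    // would leave the collection empty.
    fn pop(mut self) -> Option<Self> {
        if self.len() == 1 {
            None // dropping self here upholds the non-empty invariant
        } else {
            self.items.pop(); // really remove the element here
            Some(self)
        }
    }
}

fn main() {
    // Two elements: popping one keeps the collection alive.
    let c = NonEmptyCollection { items: vec![1, 2] };
    let c = c.pop().expect("still non-empty after removing one of two");
    assert_eq!(c.len(), 1);

    // One element: popping consumes the collection entirely.
    let mut opt = Some(NonEmptyCollection::new(1));
    opt = opt.take().and_then(NonEmptyCollection::pop);
    assert!(opt.is_none());
}
```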

The function's signature can't guarantee that the returned object is actually the same one, so what optimizations are possible? Does something like the C++ return value optimization apply, allowing the returned object to be "constructed" in the same memory it was in before? If I have the choice between an interface like the above, and an interface where the caller has to deal with the lifetime:

enum PopResult {
    StillValid,
    Dead,
}

impl NonEmptyCollection {
    fn pop(&mut self) -> PopResult {
        // really remove the element
        if self.len() == 0 { PopResult::Dead } else { PopResult::StillValid }
    }
}
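For comparison, here is a sketch of this second interface in use. The `Vec`-backed storage and the `new`/`push` helpers are hypothetical, added only to make the example self-contained:

```rust
struct NonEmptyCollection {
    items: Vec<i32>, // invariant: never empty while the handle is "alive"
}

#[derive(Debug, PartialEq)]
enum PopResult {
    StillValid,
    Dead,
}

impl NonEmptyCollection {
    fn new(first: i32) -> Self {
        NonEmptyCollection { items: vec![first] }
    }

    fn push(&mut self, v: i32) {
        self.items.push(v);
    }

    // Mutating interface: the collection stays in place, and the
    // caller is responsible for never touching it again after Dead.
    fn pop(&mut self) -> PopResult {
        self.items.pop(); // really remove the element
        if self.items.is_empty() {
            PopResult::Dead
        } else {
            PopResult::StillValid
        }
    }
}

fn main() {
    let mut c = NonEmptyCollection::new(1);
    c.push(2);
    assert_eq!(c.pop(), PopResult::StillValid); // one element left
    assert_eq!(c.pop(), PopResult::Dead);       // must not use c after this
}
```

Note that nothing in the type system enforces the "never use it again" rule here; that discipline is exactly what the consuming interface encodes in the signature.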

is there ever a reason to choose this dirtier interface for performance reasons? In the answer to the second example I linked, trentcl recommends storing Options in a data structure to allow the caller to do a change in-place instead of doing remove followed by insert every time. Would this dirty interface be a faster alternative?

Dan Hulme

1 Answer


YMMV

Depending on the optimizer's whim, you may end up with:

  • close to a no-op,
  • a few register moves,
  • a number of bit-copies.

This will depend on whether:

  • the call is inlined, or not,
  • the caller re-assigns to the original variable or creates a fresh variable (and how well LLVM handles reusing dead space),
  • the result of `size_of::<Self>()`.
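On the last point, `std::mem::size_of` reports how many bytes a non-elided move has to copy. A sketch with two hypothetical layouts:

```rust
use std::mem::size_of;

// Hypothetical layouts: a collection that stores its elements on the
// heap versus one that embeds a fixed buffer inline.
struct SmallCollection {
    data: Vec<i32>, // heap-allocated: a move copies only the Vec header
}

struct InlineCollection {
    data: [i32; 1024], // inline: a move may copy all 4096 bytes
    len: usize,
}

fn main() {
    // A Vec is three words (pointer, capacity, length) on typical targets.
    assert_eq!(size_of::<SmallCollection>(), 3 * size_of::<usize>());
    // The inline buffer dominates the size of the second struct.
    assert!(size_of::<InlineCollection>() >= 4096);
    println!(
        "small: {} bytes, inline: {} bytes",
        size_of::<SmallCollection>(),
        size_of::<InlineCollection>()
    );
}
```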

The only guarantee you get is that no deep copy will occur, since there is no `.clone()` call.

For anything else, you need to check the LLVM IR or assembly.

Matthieu M.
  • Obviously I'm looking for a steer, not a guarantee. In a particular case I'd profile; I'm just trying to build an intuition of when it's worth worrying about. – Dan Hulme Jan 12 '18 at 13:32
  • @DanHulme: In that case, I would say that it depends on `size_of`. For a small size (a couple words), the cost of the copy is so small even if not optimized that it's not worrying. For a larger size, such as when an array is embedded, I'd be conservative if performance is expected and go with references. – Matthieu M. Jan 12 '18 at 13:41
  • @DanHulme conversely, I advocate that *moving* is always acceptable. Rust's semantics hinge on the concept of moving — it's even on the front page of rust-lang.org! Thus any "extra" moves are a big deal and should be fixed at the root. – Shepmaster Jan 12 '18 at 13:55
  • @Shepmaster: If your API forces moves, and functions are not inlined, you will have moves in the final binary and that's not Rust's fault. – Matthieu M. Jan 12 '18 at 14:31
  • @MatthieuM. the compiler is capable of converting functions taking moves into assembly that takes mutable references, inlining is orthogonal. – Shepmaster Jan 12 '18 at 14:35
  • @Shepmaster: Yes but... imagine a simple `fn move(t: T) -> T { t }`, sure the compiler can (a) take the input per pointer and (b) take a pointer where to locate the output... but I doubt it can force the caller to use a single pointer for both, and thus there'll be a move of `T` from `*a` to `*b`. – Matthieu M. Jan 12 '18 at 14:43
  • This is what I was trying to get at with the question. I don't really understand yet when a "move" in Rust semantics is an actual (shallow) copy, and when it isn't. The C++ programmer in me wants to know, but C++ intuitions don't help here. – Dan Hulme Jan 12 '18 at 14:57
  • @DanHulme: It is *always* shallow, as it is a `memcpy` of the `struct` or `enum` bits... however, a shallow copy of `[i32; 2048]` still has to copy 8192 bytes (8KB)! – Matthieu M. Jan 12 '18 at 15:08
  • @MatthieuM. Sorry, I was unclear; I meant to know whether it's a shallow copy, and when the copy can be elided completely. In the case I'm working on right now, the object will be 4k, so the cost of copying is not trivial. – Dan Hulme Jan 12 '18 at 15:13
  • @DanHulme: If performance matters, I prefer starting from the assumption that the compiler will not be able to elide the move, since there's no guarantee. – Matthieu M. Jan 12 '18 at 15:17