I have data contained inside a Box, and would like to pattern match on it without accidentally copying the Box's contents from the heap to the stack; how do I do that?

Let's assume the following code:

enum SomeEnum {
    SomeEntry,
    AnotherEntry,
}

fn main() {
    let boxed_value = Box::new(SomeEnum::AnotherEntry);

    match *boxed_value {
        SomeEnum::SomeEntry => {}
        SomeEnum::AnotherEntry => {}
    }
}

Does this copy the enum out of the box onto the stack and pattern match on that copy, or does it do the matching directly on the value pointed to by the box?

What about this variant?

use std::ops::Deref;

enum SomeEnum {
    SomeEntry,
    AnotherEntry,
}

fn main() {
    let boxed_value = Box::new(SomeEnum::AnotherEntry);

    match boxed_value.deref() {
        SomeEnum::SomeEntry => {}
        SomeEnum::AnotherEntry => {}
    }
}

It seems that simply dereferencing a box does not automatically create a copy, otherwise one would not be able to create a reference to the contained value by using let x = &*boxed_value. This leads to a question about this syntax:

enum SomeEnum {
    SomeEntry,
    AnotherEntry,
}

fn main() {
    let boxed_value = Box::new(SomeEnum::AnotherEntry);

    match &*boxed_value {
        SomeEnum::SomeEntry => {}
        SomeEnum::AnotherEntry => {}
    }
}
Shepmaster
soulsource
  • Deref is explained [here](https://doc.rust-lang.org/std/ops/trait.Deref.html#tymethod.deref). As you can see, it returns a reference. – Stargateur Nov 03 '18 at 14:36
  • @Stargateur Unfortunately that confuses more than it helps. While `foo.deref()` returns a reference, `*foo` does not (necessarily). In fact, `*foo` desugars to `*(foo.deref())`, where the `*` in this case derefs the reference returned by `deref()`. See [this Q&A](https://stackoverflow.com/questions/31624743/why-is-the-return-type-of-derefderef-itself-a-reference). – Lukas Kalbertodt Nov 03 '18 at 14:39
  • @LukasKalbertodt Sometimes I hate Rust. `&*` should not exist. – Stargateur Nov 03 '18 at 14:46
  • @Stargateur It's an unfortunate source of confusion, true. And I dislike the syntax, too. But there is not really another way in the current language. People expect `*foo` to not be a reference. So if you want a reference, an additional `&` is just required. And we can't just change the `Deref` trait, because what would you return instead? By value doesn't work. If Rust could return lvalues somehow, yeah, then it might be possible. But that's a complex language feature that's probably not very useful apart from this situation. Language design is handling trade-offs ;-) – Lukas Kalbertodt Nov 03 '18 at 14:54
  • @LukasKalbertodt What's wrong with keeping `Deref` as it is and having `*` expand not to `*(foo.deref())` but, as expected, to just `foo.deref()`? This choice is very strange to me. I would prefer `**` to get the value over `&*` to get a reference to a value from a reference. – Stargateur Nov 03 '18 at 15:01

1 Answer

First: in Rust, there are no implicit costly copies, unlike in, for example, C++. Whereas in C++ the default action is a "deep copy" (via copy constructor or similar), the default action in Rust is a move. A move is a shallow copy which (a) is usually very small and cheap and (b) can be removed by the optimizer in most cases. To get a deep clone in Rust, you have to call .clone() explicitly. If you don't do that, you usually don't have to worry about this.
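To make the difference between a move and a clone concrete, here is a minimal sketch. The helper names `move_keeps_buffer` and `clone_uses_new_buffer` are made up for this illustration; they only compare the address of the string's heap buffer before and after each operation.

```rust
/// Returns true if moving a String leaves its heap buffer where it was.
/// (Hypothetical helper, for illustration only.)
fn move_keeps_buffer(s: String) -> bool {
    let buffer = s.as_ptr();
    // A move copies only the (pointer, length, capacity) triple.
    let moved = s;
    moved.as_ptr() == buffer
}

/// Returns true if cloning a non-empty String allocates a separate buffer.
/// (Hypothetical helper, for illustration only.)
fn clone_uses_new_buffer(s: &String) -> bool {
    // `.clone()` is the explicit deep copy.
    let copy = s.clone();
    copy.as_ptr() != s.as_ptr()
}

fn main() {
    let text = String::from("hello");
    assert!(clone_uses_new_buffer(&text));
    assert!(move_keeps_buffer(text));
}
```

The move never touches the character data on the heap; only the explicit `.clone()` does.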

Second: matching on an enum only looks at the discriminant of that enum (unless you bind enum fields, see below). That's the "tag" or the "metadata" which specifies which variant of the enum is stored in a value. That tag is tiny: it fits in 8 bits in almost all cases (enums with more than 256 variants are rare). So you don't need to worry about that. And in your case, we have a C-like enum without any fields. So the enum only stores the tag and hence is tiny, too.
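As a rough check of the "tiny tag" claim, the size of the field-less enum from the question can be queried with `std::mem::size_of`. The helper `tag_only_size` is a name invented for this sketch, and the one-byte result is what current compilers produce for the default representation rather than a documented layout guarantee.

```rust
use std::mem::size_of;

#[allow(dead_code)]
enum SomeEnum {
    SomeEntry,
    AnotherEntry,
}

/// Size in bytes of an enum that stores nothing but its discriminant.
/// (Hypothetical helper, for illustration only.)
fn tag_only_size() -> usize {
    size_of::<SomeEnum>()
}

fn main() {
    println!("SomeEnum occupies {} byte(s)", tag_only_size());
}
```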

So what about enum fields that might be costly to copy? Like this:

enum SomeEnum {
    SomeEntry(String),
    AnotherEntry,
}

fn main() {
    let boxed_value = Box::new(SomeEnum::AnotherEntry);

    match *boxed_value {
        // Binding `s` by value moves the String out of the box.
        SomeEnum::SomeEntry(s) => drop::<String>(s), // make sure we own the string
        SomeEnum::AnotherEntry => {},
    }
}

So in this case one variant stores a String. Since deep-copying a string is somewhat costly, Rust won't do it implicitly. In the match arm we try to drop s and assert it's a String. That means we (meaning: the body of the match arm) own the string. So, if the match arm owns it but we didn't get the owned value from cloning it, that means that the outer function doesn't own it anymore. And in fact, if you try to use boxed_value after the match, you will get move errors from the compiler. So again, either you get a compiler error or no bad things automatically happen.
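The following sketch shows that such a by-value binding is a move and not a deep copy. The function `take_string` and the variable `buffer_before` are names chosen for this example. The string that comes out of the match still uses the same heap buffer it had while it lived inside the box.

```rust
#[allow(dead_code)]
enum SomeEnum {
    SomeEntry(String),
    AnotherEntry,
}

/// Consumes the box and moves the String out of it, if there is one.
/// (Hypothetical helper, for illustration only.)
fn take_string(boxed_value: Box<SomeEnum>) -> Option<String> {
    match *boxed_value {
        SomeEnum::SomeEntry(s) => Some(s), // `s` is moved, not cloned
        SomeEnum::AnotherEntry => None,
    }
}

fn main() {
    let boxed_value = Box::new(SomeEnum::SomeEntry(String::from("hello")));

    // Remember where the string's character data lives before the match.
    let buffer_before = match &*boxed_value {
        SomeEnum::SomeEntry(s) => s.as_ptr(),
        SomeEnum::AnotherEntry => std::ptr::null(),
    };

    let owned = take_string(boxed_value).expect("constructed as SomeEntry above");
    // `boxed_value` has been moved; using it here would not compile.

    // Same heap buffer: only pointer, length and capacity were copied.
    assert_eq!(owned.as_ptr(), buffer_before);
}
```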

Furthermore, you can write SomeEnum::SomeEntry(ref s) in the match. In that case, the string is bound by reference to s (so the drop() call won't work anymore). In that case, we never move from boxed_value. This is something I call "deferred moving", but I'm not sure if that's an official term for it. But it just means: when pattern matching, the input value is not moved at all until a binding in the pattern moves from it.
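A small sketch of the borrowing variants, assuming the same enum as above. The function `entry_len` is a hypothetical helper. Both the `ref s` pattern and matching on a reference leave `boxed_value` intact, so it can still be used afterwards.

```rust
enum SomeEnum {
    SomeEntry(String),
    AnotherEntry,
}

/// Matches on a reference, so `s` is bound as `&String` and nothing is moved.
/// (Hypothetical helper, for illustration only.)
fn entry_len(value: &SomeEnum) -> usize {
    match value {
        SomeEnum::SomeEntry(s) => s.len(),
        SomeEnum::AnotherEntry => 0,
    }
}

fn main() {
    let boxed_value = Box::new(SomeEnum::SomeEntry(String::from("hello")));

    // `ref s` borrows the String in place instead of moving it out.
    match *boxed_value {
        SomeEnum::SomeEntry(ref s) => println!("borrowed: {}", s),
        SomeEnum::AnotherEntry => {}
    }

    // The box was never moved from, so it is still usable here.
    println!("length: {}", entry_len(&*boxed_value));
}
```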

Lastly, please take a look at this code and the generated assembly. The assembly is optimal. So once again: while you might be worried about accidental clones when you come from the C++ world, this is not really something you need to worry about in Rust.

Camelid
Lukas Kalbertodt
  • I was of course assuming that in the real usage the enum holds (potentially lots of inline) data, and that using this data as function parameters in the match arms would cause the move of it onto the stack. Anyhow, my question is answered: I only have to be careful to use references inside the match arms to make sure only pointers get placed onto the stack. – soulsource Nov 03 '18 at 14:58
  • @soulsource You can afford to be less careful if you match against `&*boxed_value` instead of `*boxed_value`. Because of match ergonomics, described by Shepmaster's answer, when you match against a reference, bindings to internal values are auto-referenced, and the compiler will stop you if you try to move out of one. [Compare the code generated with and without the `&`](https://rust.godbolt.org/z/9CCIcE). – trent Nov 03 '18 at 15:26
  • @soulsource Also, it's tangential, but enums containing a lot of inline data can be bad for performance and memory usage, unless all the variants are roughly the same size. E.g., a value of type `Option<[u8; 1000]>` always consumes 1001 bytes even when it is `None`. You might consider `Box`ing some of that big data for this reason, which would make the cost of copying, if not unimportant, at least less important. – trent Nov 05 '18 at 14:54
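The size figures from the last comment can be checked with `std::mem::size_of`. This is only a sketch; `inline_size` and `boxed_size` are names invented for it. The boxed variant is pointer-sized because `Option<Box<T>>` uses the null pointer to represent `None`.

```rust
use std::mem::size_of;

/// Size of an Option that stores its 1000-byte payload inline.
/// (Hypothetical helper, for illustration only.)
fn inline_size() -> usize {
    size_of::<Option<[u8; 1000]>>()
}

/// Size of an Option that keeps the payload behind a Box.
/// (Hypothetical helper, for illustration only.)
fn boxed_size() -> usize {
    size_of::<Option<Box<[u8; 1000]>>>()
}

fn main() {
    println!("inline: {} bytes", inline_size());
    println!("boxed:  {} bytes", boxed_size());
}
```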