Why does Rust use two bytes to represent this enum when only one is necessary?

Question

Rust appears to be smart enough to use only one byte for A, but not smart enough to use one byte for B, even though B has only 8*8=64 possible values. Is there any way to coax Rust into figuring this out, or do I have to implement a more compact layout manually?

Playground link.

#![allow(dead_code)]

enum A {
    L,
    UL,
    U,
    UR,
    R,
    DR,
    D,
    DL,
}

enum B {
    C(A, A),
}

fn main() {
    println!("{:?}", std::mem::size_of::<A>()); // prints 1
    println!("{:?}", std::mem::size_of::<B>()); // prints 2
}

That's because a Rust enum is at least as large as its largest variant. In this case, `A` is the size of a `u8`, so _two_ bytes are required to fit _two_ `A`s in `B`; the compiler does not perform compile-time micro-optimizations like this. Anyway, what if the packed version were slower to use than the unpacked version? — Optimistic Peach, Feb 03 '19 at 01:42
@OptimisticPeach: it's certainly possible that it would be worse on some platforms/use-cases, but with today's memory latencies, smaller data structures usually make up for any unpacking time by causing fewer cache misses. I am going to have fairly large vectors of these objects that I'll be accessing semi-randomly, so cache misses are a concern for my use case. I'd be fine with something I have to opt into, as long as it saves me the work of writing the packing logic myself. — Joseph Garvin, Feb 03 '19 at 02:02
Rust can do enum layout optimizations in some more limited cases, see https://github.com/rust-lang/rust/pull/45225 for example — the8472, Feb 03 '19 at 03:48

score 16 · Accepted Answer · answered Feb 03 '19 at 02:00

Both bytes are necessary to preserve the ability to borrow struct members.

A type in Rust is not merely an abstract set of values: it has a data layout, which describes how the values are stored. One of the "rules" governing the language is that putting a type inside a struct or enum doesn't change its data layout: it has the same layout inside another type as it does standalone, which allows you to take references to struct members and use them interchangeably with any other reference.*

There's no way to fit two As into one byte while satisfying this constraint, because the size of A is one whole byte -- you can't address a part of a byte, even with repr(packed). The unused bits just remain unused (unless they can be repurposed to store the enum tag by niche-filling).

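*Well, repr(packed) can actually make this untrue. At the time of writing, taking a reference to a misaligned packed field can cause undefined behavior, even in safe code! (More recent compilers reject such references with an error instead.)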
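To make the borrowing argument concrete, here is a minimal sketch. The helper `fields` is a hypothetical name introduced for illustration, and the sizes asserted are what current compilers produce rather than something the default representation guarantees. It shows that both fields of `B` can be handed out as ordinary `&A` references at distinct addresses, and that the spare bit patterns of `A` can still be reused as a niche by `Option<A>`.

```rust
use std::mem::size_of;

#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq)]
enum A { L, UL, U, UR, R, DR, D, DL }

enum B {
    C(A, A),
}

// Hypothetical helper: borrow both fields as plain `&A` references.
// This only works because each field has the full, standalone layout of `A`.
fn fields(b: &B) -> (&A, &A) {
    let B::C(first, second) = b;
    (first, second)
}

fn main() {
    let b = B::C(A::UL, A::DR);
    let (first, second) = fields(&b);

    assert_eq!(*first, A::UL);
    assert_eq!(*second, A::DR);
    // Two live references need two distinct addresses, hence two bytes.
    assert_ne!(first as *const A, second as *const A);

    // Sizes observed with current compilers (not a language guarantee).
    assert_eq!(size_of::<A>(), 1);
    assert_eq!(size_of::<B>(), 2);
    // Niche filling: `None` is stored in a bit pattern that `A` never uses.
    assert_eq!(size_of::<Option<A>>(), 1);
}
```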

I wonder if it's possible to have some sort of macro that would make a compact representation of B, that would involve generating multiple possible representations of A and implementing conversions for you to get the best of both worlds... — Joseph Garvin, Feb 03 '19 at 17:25
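For reference, a hand-written compact layout along these lines could look like the sketch below. The names `PackedB`, `ALL`, and `from_bits` are hypothetical and not from any library; the sketch assumes `A` is given `#[repr(u8)]` so its discriminants are 0 through 7. Each `A` takes three bits, so the pair fits in one `u8`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
#[repr(u8)]
enum A { L, UL, U, UR, R, DR, D, DL }

impl A {
    // Lookup table indexed by discriminant (hypothetical helper).
    const ALL: [A; 8] = [A::L, A::UL, A::U, A::UR, A::R, A::DR, A::D, A::DL];

    // Decode the low three bits back into an `A`.
    fn from_bits(bits: u8) -> A {
        Self::ALL[(bits & 0b111) as usize]
    }
}

// One byte holding two `A` values: bits 3..=5 are the first, bits 0..=2 the second.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct PackedB(u8);

impl PackedB {
    fn new(first: A, second: A) -> Self {
        PackedB(((first as u8) << 3) | (second as u8))
    }

    // Accessors return `A` by value; a `&A` into the byte cannot exist.
    fn first(self) -> A {
        A::from_bits(self.0 >> 3)
    }

    fn second(self) -> A {
        A::from_bits(self.0)
    }
}

fn main() {
    let p = PackedB::new(A::UR, A::DL);
    assert_eq!(std::mem::size_of::<PackedB>(), 1);
    assert_eq!((p.first(), p.second()), (A::UR, A::DL));
}
```

The trade-off matches the accepted answer: the packed form saves a byte per element, but the fields can only be read and written by value, never borrowed.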
