3

Given the following (playground):

enum A { A = 0, B, C }
enum B { A = 1, B, C }

How does Rust represent the types A, Option<A>, B and Option<B> in memory? How is Rust able to have all four of these as single byte representations? It would make sense for Option<B> but I expected Option<A> to be two bytes.

puritii
  • 1,069
  • 1
  • 7
  • 18
  • And also, what tooling can I use to find this kind of information out? – puritii Oct 30 '20 at 23:56
  • 3
    I think you're assuming that the None case has to be all zeroes. You can disillusion yourself with a bit of unsafe memory transmutation seen [here](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=b5663d61945b6cfa73281dc0be254c9a). The None for Option is encoded as 3 and 4 respectively. – kmdreko Oct 31 '20 at 00:04
  • @kmdreko Yikes, that's exactly what I was looking for. Thanks! Perhaps turn into answer and I'll mark it. – puritii Oct 31 '20 at 00:35

1 Answers1

8

Disclaimer: There is no guarantee that the exact behavior seen here will always hold true. YOU SHOULD NOT RELY ON THE SPECIFIC VALUES. Also, this answer is only based on observations and will only address enums specifically; other memory optimizations exist surrounding &Ts, NonNull<T>s, NonZeroU8s (and family), and nested structures containing those.


Basic enums

If you have a simple no-nested-structure enum, the default behavior is that the enum variants start at zero and increment upwards. So the first non-representable bit-pattern is used as the None value:

enum Simple { A, B };
println!("{}", unsafe { transmute::<Option<Simple>, u8>(None) });
// prints 2

If your simple no-nested-structure enum leaves a gap at the front, the None value will still be represented by the first non-representable bit-pattern after the enum variant representations:

enum GapInFront { A = 1, B };
println!("{}", unsafe { transmute::<Option<GapInFront>, u8>(None) });
// prints 3

If you leave a gap at the front, and have a variant at the end of the bit-space, only then will it use all-zeros as the None value:

enum ExtendsToEnd { A = 1, B = 255 };
println!("{}", unsafe { transmute::<Option<ExtendsToEnd>, u8>(None) });
// prints 0

One thing to note, it will never pick a representation between variants for the None value. Even if there are plenty of non-representable bit-patterns, variants occupying the boundaries will cause it to use 2-bytes:

enum Full { A = 0, B = 255 };
println!("{:?}", unsafe { transmute::<Option<Full>, [u8; 2]>(None) });
// prints [0, 60], which I believe is undefined behavior 
// since I think the second byte is left uninitialized

My guess is that the compiler doesn't keep track of all representable bit-patterns, and instead only keeps ranges for doing these checks.


More interesting stuff.

If your enum has a variant with a nested enum value, it will take that into consideration as well:

enum Nested { A, B };
enum Complex { A(Nested), B };
println!("{}", unsafe { transmute::<Option<Complex>, u8>(None) });
// prints 3

However, it seems to break if two variants have values, even if their bit patterns don't overlap (sad):

enum Nested1 { A, B };
enum Nested2 { A=2, B };
enum MoreComplex { A(Nested1), B(Nested2) };
println!("{:?}", unsafe { transmute::<Option<MoreComplex>, [u8; 2]>(None) });
// prints [2, 211], again the second byte is left uninitialized

Another thing to point out, this isn't a special case with Option; if you define your own option type, it behaves the same:

enum MyOption<T> { None, Some(T) };
println!("{}", unsafe { transmute::<MyOption<Simple>, u8>(MyOption::None) });
// prints 2

See all this on the playground.

See also What is the overhead of Rust's Option type?.

kmdreko
  • 42,554
  • 6
  • 57
  • 106
  • 1
    “However, it seems to break if two variants have values” it has to, because you must be able to get references to the inner values, so they must be represented exactly as the corresponding type, and the compiler isn't free to chose the value like in `Complex::B`. – mcarton Oct 31 '20 at 10:40
  • @mcarton, can you elaborate? I think I've built `Nested1` and `Nested2` such that they *could* both be represented in a single byte `MoreComplex` enum without modification. – kmdreko Oct 31 '20 at 15:54
  • 1
    Oh right, then it's probably just that no-one bothered implementing that optimization. It seems pretty rare that you'd have the right conditions for it. – mcarton Oct 31 '20 at 18:17