4

If I define the following enums, Nil does not increase the size of the enum:

  use std::mem::size_of;

  enum Foo {
    Cons(~char)
  }

  enum Bar {
    Cons(~char),
    Nil
  }

  println!("{}", size_of::<Foo>());
  println!("{}", size_of::<Bar>());

  // -> 4
  // -> 4

On the other hand:

  enum Foo {
    Cons(char)
  }

  enum Bar {
    Cons(char),
    Nil
  }

Yields:

  // -> 4
  // -> 8

What is happening when I define an enum? How is memory being allocated for these structures?

o_o_o
  • probably because the first can be represented as a pointer to a value and a null pointer (so it's the size of a pointer); just guessing – Arjan Mar 23 '14 at 00:55
  • Have you checked the Rust sources on GitHub? That would give you their specific implementation. I don't specifically know, but that would be my first place to check. – Brendan Lesniak Mar 23 '14 at 00:57

1 Answer

10

A naive approach to enums is to allocate enough space for the contents of the largest variant, plus a discriminant to record which variant is in use. This is a standard tagged union.

Rust is a little cleverer than this. (It could be a lot cleverer, but it is not at present.) It knows that, given a ~T, there is at least one value that the memory location can never hold: zero. So in a case like your enum { Cons(~T), Nil }, it can optimise the whole enum down to one word: any non-zero value in memory means Cons(~T), and a zero value means Nil.
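In modern Rust, ~T has become Box<T>, and the same null-pointer ("niche") optimisation is easy to observe. A minimal sketch, in modern syntax rather than the 2014 dialect used above:

```rust
use std::mem::size_of;

// Box<char> is a non-null pointer, so a two-variant enum wrapping it
// can use the all-zeroes bit pattern to represent the dataless variant.
#[allow(dead_code)]
enum Foo {
    Cons(Box<char>),
}

#[allow(dead_code)]
enum Bar {
    Cons(Box<char>),
    Nil,
}

fn main() {
    // Both enums are exactly one pointer wide: Nil costs nothing.
    assert_eq!(size_of::<Foo>(), size_of::<usize>());
    assert_eq!(size_of::<Bar>(), size_of::<usize>());
    println!("{} {}", size_of::<Foo>(), size_of::<Bar>());
}
```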

When you deal with char, that optimisation cannot occur: zero is a valid code point. As it happens, char is defined as a Unicode code point, so it would actually be possible to squeeze the discriminant into that space, there being plenty of spare bits at the top: a code point needs only 21 bits, so in a 32-bit slot there are eleven spare bits. This is a demonstration that Rust's enum discriminant optimisation is not especially clever at present.
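(As the 2019 comments below confirm, this optimisation was later implemented: current rustc treats the bit patterns that are invalid for char as a niche for the discriminant. A quick check in modern Rust:)

```rust
use std::mem::size_of;

// char is only valid for Unicode scalar values (at most U+10FFFF,
// excluding surrogates), so invalid bit patterns are available as
// "niches" for the discriminant in current rustc.
#[allow(dead_code)]
enum Foo {
    Cons(char),
}

#[allow(dead_code)]
enum Bar {
    Cons(char),
    Nil,
}

fn main() {
    assert_eq!(size_of::<Foo>(), 4);
    assert_eq!(size_of::<Bar>(), 4); // was 8 in 2014-era Rust
    println!("{} {}", size_of::<Foo>(), size_of::<Bar>());
}
```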

Chris Morgan
  • This is a great answer. It's worth noting just for fun that in 2019 this problem no longer exists; I guess Rust's enum discriminant optimization got better in a few years. =) – River Tam Jul 29 '19 at 09:26
  • Nice! I didn’t know that Rust could now use invalid `char` values for the discriminant! For confirmation: `enum Bar2 { Cons(char), Nil } println!("{:x}", unsafe { std::mem::transmute::<_, u32>(Bar2::Nil) });` yields `110000`. Understand it this way: Unicode defines code points up to U+10FFFF, expressly declaring values higher than that forever invalid; so Rust considers anything past that point fair game for a discriminant value, and uses the next value. – Chris Morgan Jul 29 '19 at 14:21
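The transmute check in the comment above can be run as a self-contained program. The exact value chosen for the discriminant (0x110000 at the time of the comment) is an implementation detail of rustc's layout, so the sketch below only asserts that it is not a valid char:

```rust
// Runnable version of the check from the comment above. Layout is not
// guaranteed, so we only assert that Nil is encoded as a bit pattern
// no `char` can ever hold.
#[allow(dead_code)]
enum Bar2 {
    Cons(char),
    Nil,
}

fn main() {
    // Sound only because Bar2 is exactly 4 bytes (checked at compile
    // time by transmute's size requirement).
    let raw: u32 = unsafe { std::mem::transmute(Bar2::Nil) };
    assert!(char::from_u32(raw).is_none());
    println!("{:x}", raw);
}
```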