
How is an `Option<i32>` laid out in memory? Since an i32 already takes up an even number of bytes, is Rust forced to use a full byte to store the single bit None/Some?

EDIT: According to this answer, Rust in fact uses an extra 4 (!) bytes. Why?

Test
  • 1
    Per this [question](https://stackoverflow.com/questions/16504643/what-is-the-overhead-of-rusts-option-type), storing an `i32` takes 4 bytes, and storing an `Option<i32>` takes 8 bytes. – Nick ODell Apr 02 '22 at 02:20
  • I'm guessing that Rust currently always places the discriminant before the data in an enum, and alignment requires using `sizeof(T)` bytes to store the discriminant. If Rust could place the discriminant after the data then it could probably make do with just one extra byte. – BallpointBen Apr 02 '22 at 03:35
  • 1
    @BallpointBen I don't think that would make any difference. How would an array of `Option<i32>` work if each option were 5 bytes long? The elements would still have to be aligned, so each would take 8 bytes. – Ekrem Dinçel Apr 02 '22 at 05:22
  • 1
    @BallpointBen Imagine an array. If the first element were aligned to 4 bytes but 5 bytes long, the next element would have to skip 3 bytes to be aligned. So in the end, each element takes 8 bytes. Reordering changes the overall size only when a structure alternates between many long and short members. – prog-fh Apr 02 '22 at 07:38

1 Answer


For structs and enums declared without special layout modifiers, the Rust docs state:

> Nominal types without a repr attribute have the default representation. Informally, this representation is also called the rust representation.
>
> There are no guarantees of data layout made by this representation.

`Option` cannot be `repr(transparent)` since it is not a newtype struct, and we can check the source code and see that it is declared with neither `repr(C)` nor a primitive representation such as `repr(u8)`. So no guarantees are made about the layout of `Option<i32>`.
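Since the layout is unspecified, the practical way to find out what the compiler chose is to measure it. The following is a minimal sketch, not an authoritative statement about layout: `layout_of` is a helper name invented for this example, and the printed numbers depend on the compiler version and target (the question linked in the comments reports 4 bytes for `i32` and 8 bytes for `Option<i32>`).

```rust
use std::mem::{align_of, size_of};

/// Hypothetical helper: returns (size, alignment) in bytes for a type.
fn layout_of<T>() -> (usize, usize) {
    (size_of::<T>(), align_of::<T>())
}

fn main() {
    let (int_size, int_align) = layout_of::<i32>();
    let (opt_size, opt_align) = layout_of::<Option<i32>>();
    println!("i32:         size {int_size}, align {int_align}");
    println!("Option<i32>: size {opt_size}, align {opt_align}");

    // A type's size is always a multiple of its alignment, which is why a
    // 4-byte payload plus a small tag cannot end up as 5 bytes.
    assert_eq!(opt_size % opt_align, 0);

    // Array elements are laid out back to back with a stride equal to the
    // element size, so any padding is already part of each element.
    assert_eq!(size_of::<[Option<i32>; 4]>(), 4 * opt_size);
}
```

The two assertions are the point made in the comments on the question: whatever the tag costs, it gets rounded up to the alignment of the `i32`.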

If it were declared `repr(C)`, then we'd get the C representation, which is what you're envisioning. That needs one integer to indicate whether it's None or Some (which size of integer is implementation-defined), then enough space to store the actual i32, plus any padding needed to keep the i32 aligned.
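As a rough illustration of that C-style layout, here is a sketch using hand-written stand-ins. `TagFirst`, `TagLast`, and `COption` are names invented for this example and are not how the standard library defines `Option`. The sizes in the comments assume a target where `i32` has 4-byte alignment and a C `int` is 4 bytes, as on common desktop platforms.

```rust
use std::mem::size_of;

// One-byte tag placed before the payload: 3 padding bytes follow the tag
// so that `value` starts at an offset that is a multiple of 4.
#[allow(dead_code)]
#[repr(C)]
struct TagFirst {
    tag: u8,
    value: i32,
}

// One-byte tag placed after the payload: 3 padding bytes follow the tag
// so that the total size is a multiple of the 4-byte alignment.
#[allow(dead_code)]
#[repr(C)]
struct TagLast {
    value: i32,
    tag: u8,
}

// A C-style tagged union written as a `repr(C)` enum. Its tag has the
// size of a C enum, which is implementation-defined.
#[allow(dead_code)]
#[repr(C)]
enum COption {
    None,
    Some(i32),
}

fn main() {
    println!("TagFirst: {} bytes", size_of::<TagFirst>());
    println!("TagLast:  {} bytes", size_of::<TagLast>());
    println!("COption:  {} bytes", size_of::<COption>());
}
```

Under those assumptions all three come out at 8 bytes, so moving the tag after the data does not save anything: the 5 bytes of content are padded up to the next multiple of the alignment either way.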

In principle, since Rust is given a lot of leeway here, it is free to do clever things. If you have a variable which is only ever Some, the compiler needn't store the tag at all (and, again, no guarantees are made about layout, so it's free to make this change internally). Likewise, if you have an i32 that starts at 0 and only goes up to 10, it's provably never negative, so a compiler could in principle use, say, -1 to indicate None.
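A related optimization that can be observed today applies when the payload type itself has bit patterns that are never valid, so one of them can stand for None. The sketch below checks the cases that the standard library documentation for `Option` lists as having the same size as the wrapped type (references, `Box`, and the non-zero integer types). It does not demonstrate the value-range reasoning described above for a plain `i32`.

```rust
use std::mem::size_of;
use std::num::NonZeroI32;

fn main() {
    // Zero is not a valid NonZeroI32, so the all-zero pattern can encode None.
    assert_eq!(size_of::<Option<NonZeroI32>>(), size_of::<NonZeroI32>());

    // References and boxes are never null, so null can encode None.
    assert_eq!(size_of::<Option<&i32>>(), size_of::<&i32>());
    assert_eq!(size_of::<Option<Box<i32>>>(), size_of::<Box<i32>>());

    // Every bit pattern of a plain i32 is a valid value, so there is no
    // spare pattern left for None and the Option has to be larger.
    assert!(size_of::<Option<i32>>() > size_of::<i32>());

    println!("Option<NonZeroI32>: {} bytes", size_of::<Option<NonZeroI32>>());
    println!("Option<i32>:        {} bytes", size_of::<Option<i32>>());
}
```

If a value is known never to be zero, storing it as `NonZeroI32` is one way to get a 4-byte `Option` without relying on unspecified layout.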

Silvio Mayolo
  • The comment in the question indicated that it is *usually* 8 bytes. Do you mind including an explanation why it is an extra 4 bytes? – Test Apr 02 '22 at 02:40
  • 3
    @Test At the end of the day, the datatype is something like `(i32, u8)` where the `u8` stores which enum variant is active (i.e., it is the 'discriminant'). Padding rounds this up. Note that Rust _is_ smart about reusing this discriminant, so if you have `Option – GManNickG Apr 02 '22 at 05:21
  • `repr(C)` doesn't help because C has the same alignment requirement as Rust. The last paragraph should make it clear that it is only theoretical, not something that the compiler actually does (or is planned to do in the future). – user4815162342 Apr 02 '22 at 07:04