1

Using the following snippet

use std::mem;

fn main() {
   println!("size Option(bool): {} ({})", mem::size_of::<Option<bool>>(), mem::size_of::<bool>());
   println!("size Option(u8): {} ({})", mem::size_of::<Option<u8>>(), mem::size_of::<u8>());
   println!("size Option(u16): {} ({})", mem::size_of::<Option<u16>>(), mem::size_of::<u16>());
   println!("size Option(u32): {} ({})", mem::size_of::<Option<u32>>(), mem::size_of::<u32>());
   println!("size Option(u64): {} ({})", mem::size_of::<Option<u64>>(), mem::size_of::<u64>());
   println!("size Option(u128): {} ({})", mem::size_of::<Option<u128>>(), mem::size_of::<u128>());
}

On my 64-bit machine, I see:

size Option(bool): 1 (1)
size Option(u8): 2 (1)
size Option(u16): 4 (2)
size Option(u32): 8 (4)
size Option(u64): 16 (8)
size Option(u128): 24 (16)

So the overhead is not constant and goes up to 8 bytes. I wonder why the overhead is not just one byte to store the tag? I also wonder what representation is chosen by the compiler?

Saroupille
  • Why is it the case, and what is the representation chosen by the Rust compiler? – Saroupille Mar 19 '23 at 15:51
  • It's not memory overhead, since it uses only what is strictly needed to provide the feature. You can't do better by hand, hence it's zero-cost. Also, octet > byte (yes, I'm French). – Stargateur Mar 19 '23 at 18:14

2 Answers

5

The Rust Reference on type layouts comes into play here:

[...] The size of a value is always a multiple of its alignment. [...]

The only data layout guarantees made by [the default] representation are those required for soundness. They are: [...]

  1. The alignment of the type is at least the maximum alignment of its fields.

So the size of Option<T> must be rounded up to a multiple of the alignment of T, even if only one byte (or even one bit) is needed to store the information "a value is present".

The exception is types that allow the "null pointer optimization" (more generally, niche optimization), where Option<T> has the same size as T because None can be represented by one of the invalid states of T. For example, bool has only two valid states, so the compiler uses one of the remaining 254 one-byte bit patterns to represent None for Option<bool>. This works for bool, &U, &mut U, fn pointers, Box<U>, NonZero* and NonNull<U>.
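As a rough check of the paragraph above, the following sketch compares size_of::<Option<T>>() with size_of::<T>() for a few of the listed types. The helper name option_is_free is made up for this illustration. The equal sizes for references, Box, NonZero*, NonNull and function pointers are documented in std::option; the bool result is only what current compilers happen to produce.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;
use std::ptr::NonNull;

/// Hypothetical helper (not a std API): true when wrapping `T` in `Option`
/// does not increase its size, i.e. `None` fits in an invalid state of `T`.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // Documented cases: the all-zero (null) value is never valid for these types.
    println!("&u8:         {}", option_is_free::<&u8>());
    println!("Box<u8>:     {}", option_is_free::<Box<u8>>());
    println!("NonZeroU32:  {}", option_is_free::<NonZeroU32>());
    println!("NonNull<u8>: {}", option_is_free::<NonNull<u8>>());
    println!("fn():        {}", option_is_free::<fn()>());

    // Observed with current rustc, but not a documented guarantee.
    println!("bool:        {}", option_is_free::<bool>());

    // Every bit pattern of u32 is a valid value, so a separate tag is required.
    println!("u32:         {}", option_is_free::<u32>());
}
```

The u32 line is the contrast case: with no spare bit pattern available, the Option has to grow.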

Frxstrem
  • More general than NPO, the documentation section on [discriminant elision](https://rust-lang.github.io/unsafe-code-guidelines/layout/enums.html#discriminant-elision-on-option-like-enums) is useful. – cdhowie Mar 19 '23 at 22:35
4

Note: None of the below is guaranteed; it just happens to be the case today.


Generally, yes, we store the tag in one byte. This is why for u8 the size is two bytes.

However, there are two other considerations here:

  1. The data needs to be adequately aligned, and the size needs to be a multiple of the alignment (this rule holds for all Rust types). If we put the tag first and the data after it (this is currently the case), we need padding between the tag and the data so that the data is properly aligned. If we put the data first and the tag after it, we need padding after the tag so that the total size is a multiple of the alignment (which is the same as the data's alignment). Either way we need padding, and the size of the padding plus the tag equals the alignment of the data. With the toolchain used in the question, u16 has an alignment of two bytes, u32 of four, and u64 and u128 of eight, which explains their sizes. (Newer compilers align u128 to 16 bytes on x86-64, which makes Option<u128> 32 bytes there.)

  2. When we can, we prefer to pack the data and tag together, to save memory. bool has a niche, meaning it has some invalid values (all values except 0 and 1). So for Some we store the value directly, and for None we use one of the invalid values.
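The arithmetic in point 1 can be sketched as follows, assuming a one-byte tag and a payload without a niche. The helper tagged_size is hypothetical and only models the rounding rule; it is not how the compiler is implemented, and the actual layout remains unspecified.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical model (not a std API): the payload plus a one-byte tag,
/// rounded up to the next multiple of the payload's alignment.
fn tagged_size<T>() -> usize {
    let align = align_of::<T>();
    let unpadded = size_of::<T>() + 1; // payload + 1 tag byte
    (unpadded + align - 1) / align * align // round up to a multiple of `align`
}

fn main() {
    // Compare the model with what the compiler actually chose.
    println!("u8:   model {}, actual {}", tagged_size::<u8>(), size_of::<Option<u8>>());
    println!("u16:  model {}, actual {}", tagged_size::<u16>(), size_of::<Option<u16>>());
    println!("u32:  model {}, actual {}", tagged_size::<u32>(), size_of::<Option<u32>>());
    println!("u64:  model {}, actual {}", tagged_size::<u64>(), size_of::<Option<u64>>());
    println!("u128: model {}, actual {}", tagged_size::<u128>(), size_of::<Option<u128>>());
}
```

The model does not apply to types with a niche such as bool, which is the case covered by point 2.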

Peter Hall
Chayim Friedman