52

In Learning Rust With Entirely Too Many Linked Lists, the author mentions:

However, if we have a special kind of enum:

enum Foo {
    A,
    B(ContainsANonNullPtr),
}

the null pointer optimization kicks in, which eliminates the space needed for the tag. If the variant is A, the whole enum is set to all 0's. Otherwise, the variant is B. This works because B can never be all 0's, since it contains a non-zero pointer.

I guess the author is saying the following (assuming A is 4 bits and B is 4 bits):

let test = Foo::A

the memory layout is

0000 0000

but

let test = Foo::B

the memory layout is

some 8-bit non-zero value

What exactly is optimized here? Aren't both representations always 8 bits? What does it mean when the author claims:

It means &, &mut, Box, Rc, Arc, Vec, and several other important types in Rust have no overhead when put in an Option

Shepmaster
Jal

3 Answers

75

The null pointer optimization basically means that if you have an enum with two variants, where one variant has no associated data, and the other variant has associated data where the bit pattern of all zeros isn't a valid value, then the enum itself will take exactly the same amount of space as that associated value, using the all zeroes bit pattern to indicate that it's the other variant.

In other words, this means that Option<&T> is exactly the same size as &T instead of requiring an extra word.
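
The claim can be checked directly with `std::mem::size_of`. The following is a minimal sketch, not part of the original answer; the helper name `option_is_free` is made up for illustration, and the element type `u64` is an arbitrary choice.

```rust
use std::mem::size_of;
use std::rc::Rc;
use std::sync::Arc;

/// Hypothetical helper: true when wrapping `T` in `Option` adds no bytes.
fn option_is_free<T>() -> bool {
    size_of::<Option<T>>() == size_of::<T>()
}

fn main() {
    // Pointer-like types can never be all zeros, so `None` can use that pattern.
    assert!(option_is_free::<&u64>());
    assert!(option_is_free::<&mut u64>());
    assert!(option_is_free::<Box<u64>>());
    assert!(option_is_free::<Rc<u64>>());
    assert!(option_is_free::<Arc<u64>>());
    assert!(option_is_free::<Vec<u64>>());

    // Every bit pattern of a plain integer is a valid value,
    // so `Option<u64>` needs extra space for a tag.
    assert!(!option_is_free::<u64>());

    println!(
        "Option<&u64>: {} bytes, &u64: {} bytes",
        size_of::<Option<&u64>>(),
        size_of::<&u64>()
    );
}
```

The two numbers printed on the last line are equal on any target, since both values are a single pointer.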

Lily Ballard
  • I understand that. But how does the compiler know if all-zeros is an invalid value? I assume the optimization only kicks in for specific built-in types. If so, which ones? – Noel Widmer Dec 14 '19 at 13:22
  • 12
    The compiler has built-in knowledge about the memory layout of various types. For example, it knows that `&`-references can never be null. It also knows that `String` and `Vec` can never be all zeroes; following this down into the implementation, `String` is backed by `Vec`, which is backed by `RawVec`, which is backed by `Unique`, which contains a `*const T` but has a compiler attribute that declares that it can't be null. Similarly there's a stdlib `NonNull` type that acts like a `*mut T` that can never be null. – Lily Ballard Dec 15 '19 at 21:42
  • 8
    To elaborate, empty strings and vectors don't point to null, they point to a fixed nonnull address with zero capacity. Same thing with other containers like HashMaps that can be cheaply created. The Rust stdlib tries very hard to avoid nulls in pointers specifically so the all-zeroes value can be reserved for things like the null pointer optimization. – Lily Ballard Dec 15 '19 at 21:43
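
The points made in these comments can be sketched as follows. This is an illustration under assumptions, not code from the thread: `Handle` is a hypothetical user-defined type, included to show that the optimization is not limited to built-in types but applies to anything the compiler knows can never be all zeros.

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// Hypothetical user-defined handle; it inherits the "never null"
/// property from the `NonNull` it wraps.
#[allow(dead_code)]
#[repr(transparent)]
struct Handle(NonNull<u8>);

fn main() {
    // An empty Vec or String has not allocated anything,
    // yet its pointer is a dangling non-null address.
    let v: Vec<u32> = Vec::new();
    let s = String::new();
    assert!(!v.as_ptr().is_null());
    assert!(!s.as_ptr().is_null());

    // `NonNull<T>` acts like `*mut T` but promises it is never null,
    // so `Option` can reuse the null pattern for `None`.
    assert_eq!(size_of::<Option<NonNull<u8>>>(), size_of::<*mut u8>());

    // The same holds for a user-defined wrapper around it.
    assert_eq!(size_of::<Option<Handle>>(), size_of::<Handle>());

    // A raw pointer may legitimately be null, so `Option` needs extra room.
    assert!(size_of::<Option<*mut u8>>() > size_of::<*mut u8>());
}
```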
21

An enum is a tagged union. Without the optimization, it looks like:

Foo::A;    // tag 0x00 data 0xXX
Foo::B(2); // tag 0x01 data 0x02

The null pointer optimization removes the separate tag field.

Foo::A;    // tag+data 0x00
Foo::B(2); // tag+data 0x02
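
To see the merged "tag+data" representation on a type where it really applies, here is a rough sketch using `Option<&u32>` rather than the `Foo` above. The function names are hypothetical. It relies on the documented guarantees in `std::option` that all-zero bytes form a valid `None` for `Option<&T>` (with `T: Sized`), and that `Some(r)` has the same representation as `r`.

```rust
use std::mem::transmute;

/// Hypothetical helper: builds an `Option<&u32>` from a pointer-sized zero.
fn from_zero_bits() -> Option<&'static u32> {
    // SAFETY: the std::option docs guarantee that the all-zero bit
    // pattern is a valid `Option<&T>` for sized `T`, and that it is `None`.
    unsafe { transmute::<usize, Option<&'static u32>>(0) }
}

/// Hypothetical helper: reinterprets a `Some(r)` as the reference `r`.
fn some_bits_as_ref(opt: Option<&u32>) -> &u32 {
    assert!(opt.is_some());
    // SAFETY: `opt` is `Some`, and `Some(r)` is documented to have the
    // same representation as `r` itself for reference types.
    unsafe { transmute::<Option<&u32>, &u32>(opt) }
}

fn main() {
    // All zeros means "no value": there is no separate tag to inspect.
    assert!(from_zero_bits().is_none());

    // Any non-zero pattern is simply the reference itself.
    let x = 2u32;
    let r = some_bits_as_ref(Some(&x));
    assert!(std::ptr::eq(r, &x));
    assert_eq!(*r, 2);
}
```

In ordinary code there is no reason to use `transmute` here; it appears only to expose the bit-level representation.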
Shepmaster
red75prime
  • 14
    The second example seems to be a little bit inaccurate, taking into account that `0x00` is a valid bit pattern for an integer, thus it's ambiguous meaning both `Foo::A` and `Foo::B(0)`. – mvlabat Oct 31 '19 at 11:06
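
The ambiguity raised in this comment can be shown with a small sketch. The enum names are invented for illustration. With a plain `u32` payload, zero is a valid value, so the tag must stay. With `std::num::NonZeroU32`, zero is excluded and can stand for the data-less variant. The same-size guarantee is documented for `Option<NonZeroU32>`; for a hand-written two-variant enum it is what current compilers do in practice rather than a documented guarantee.

```rust
use std::mem::size_of;
use std::num::NonZeroU32;

/// Hypothetical enum: zero is a valid payload, so a separate tag is required.
#[allow(dead_code)]
enum WithPlainInt {
    A,
    B(u32),
}

/// Hypothetical enum: zero is never a valid payload, so it can encode `A`.
#[allow(dead_code)]
enum WithNonZeroInt {
    A,
    B(NonZeroU32),
}

fn main() {
    // Tag plus payload: larger than the payload alone.
    assert!(size_of::<WithPlainInt>() > size_of::<u32>());
    assert!(size_of::<Option<u32>>() > size_of::<u32>());

    // Documented guarantee for Option; same layout in practice for the enum.
    assert_eq!(size_of::<Option<NonZeroU32>>(), size_of::<u32>());
    assert_eq!(size_of::<WithNonZeroInt>(), size_of::<NonZeroU32>());

    // Zero cannot be stored at all, which is what frees up the bit pattern.
    assert!(NonZeroU32::new(0).is_none());
    assert_eq!(NonZeroU32::new(2).map(NonZeroU32::get), Some(2));
}
```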
8

I'm also working through Too Many Linked Lists; perhaps this code snippet can deepen your understanding:

pub enum WithNullPtrOptimization{
    A,
    B(String),
}

pub enum WithoutNullPtrOptimization{
    A,
    B(u32),
}

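fn main() {
    // On a typical 64-bit target this prints "24 24": the enum is no larger than String.
    println!("{} {}", std::mem::size_of::<WithNullPtrOptimization>(), std::mem::size_of::<String>());
    // Prints "8 4": every u32 bit pattern is valid, so a separate tag (plus padding) is needed.
    println!("{} {}", std::mem::size_of::<WithoutNullPtrOptimization>(), std::mem::size_of::<u32>());
}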
Steve Lau