7

Running this code in rust:

fn main() {
    println!("{:?}", std::mem::size_of::<[u8; 1024]>());
    println!("{:?}", std::mem::size_of::<[bool; 1024]>());
}

1024

1024

This is not what I expected. So I compiled and ran in release mode. But I got the same answer.

Why does the rust compiler seemingly allocate a whole byte for each single boolean? To me it seems to be a simple optimization to only allocate 128 bytes instead. This project implies I'm not the first to think this.

Is this a case of compilers being way harder than the seem? Or is this not optimized because it isn't a realistic scenario? Or am I not understanding something here?

andy boot
  • 11,355
  • 3
  • 53
  • 66
  • 4
    I don't have enough time to properly answer this, but consider the question of what would happen if you wanted to get a pointer or reference to an element in your boolean array. – fjh Feb 19 '18 at 22:47
  • 4
    Alignment. Use [bitflags](https://crates.io/crates/bitflags) instead. – Henri Menke Feb 19 '18 at 22:50
  • Right, we can't point to anything smaller than a byte. And we don't want the complexity of trying to cram 8 booleans into a byte. If you really need that kind of control use a library like bitflags. – andy boot Feb 19 '18 at 23:26
  • 3
    C++ famously made that design mistake for `vector` – CodesInChaos Feb 20 '18 at 17:11
  • 1
    [C++ version of this question, for reference](https://stackoverflow.com/questions/17794569/why-is-vectorbool-not-a-stl-container). I wonder though why this question is being downvoted to the depths. – rubdos Feb 21 '18 at 09:33

1 Answers1

8

Pointers and references.

  1. There is an assumption that you can always take a reference to an item of a slice, a field of a struct, etc...
  2. There is an assumption in the language that any reference to an instance of a statically sized type can transmuted to a type-erased pointer *mut ().

Those two assumptions together mean that:

  • due to (2), it is not possible to create a "bit-reference" which would allow sub-byte addressing,
  • due to (1), it is not possible not to have references.

This essentially means that any type must have a minimum alignment of one byte.


Note that this is not necessarily an issue. Opting in to a 128 bytes representation should be done cautiously, as it implies trading off speed (and convenience) for memory. It's not a pure win.

Prior art (in the name of std::vector<bool> in C++) is widely considered a mistake in hindsight.

Matthieu M.
  • 287,565
  • 48
  • 449
  • 722
  • 1
    Another assumption Rust makes is that two adjacent items in a slice can be modified from different thread without causing a race condition. Which would require use of atomic instructions whenever you access such a bit-packed element. – CodesInChaos Feb 20 '18 at 17:13
  • "unless this element is zero-sized" [Really?](http://play.integer32.com/?gist=81232bb27617cee4f8f08544039a12ee&version=stable). You definitely can take the address of a 0 sized-type. Although, the address would not be unique, eg. in the case of an array, all elements would have the same address. – mcarton Feb 20 '18 at 18:20
  • The argument that a packed bit vector would not be able to support references and slices with the same semantics as other types is wrong, because https://crates.io/crates/bitvec does precisely that. – Daira Hopwood Jul 23 '21 at 14:47
  • Also it's unclear that the packed representation would be any slower, since it requires less memory bandwidth and cache usage. – Daira Hopwood Jul 23 '21 at 14:55
  • @DairaHopwood: With regard to the API, `bitvec` as far as I can see does not return a `&Bit`, it instead uses a `BitRef` type, specifically because it is not possible to form references or pointers to non-byte addressable content. With regard to the performance, I agree. It will depend on the access patterns. – Matthieu M. Jul 23 '21 at 16:40
  • Right, but `BitRef` behaves in every way like a `&bool` hypothetically would, so that's not a reason why Rust couldn't have supported `&bool` for packed bit arrays at the language level. That is, it's not a matter of impossibility, as your answer said. It's more a matter of what the language designers found important. – Daira Hopwood Jul 24 '21 at 09:07
  • @DairaHopwood: I disagree that it behaves in every way like a `&bool`, and I maintain it's impossible for it to (without giving up something else). You cannot pass `BitRef` where `&T` is expected, you cannot convert a `BitRef` to `*const T`. And this is not JUST because the designers didn't want to. Doing so would break parametricity for generic code as you cannot pass `*const T` (where `T` is a bit) as `*const void_t` to a C function because C doesn't have sub-byte addressing. There is therefore a trade-off there. – Matthieu M. Jul 30 '21 at 12:33
  • "You cannot pass ``BitRef`` where ``&T`` is expected, you cannot convert a ``BitRef`` to ``*const T``." The former is easy, if there were an official ``&bool`` with implementation similar to ``BitRef``. The latter is more tricky, but on the architectures where Rust is supported, you actually could stuff 3 bits for the bit-within-byte index into unused bits of the pointer. You couldn't validly dereference it in C without helper functions, but I think that's okay. – Daira Hopwood Dec 03 '21 at 19:18