
It is known that C++ bools must be at least 1 byte in size so that pointers can be created for each [https://stackoverflow.com/a/2064565/7154924]. But there are no pointers to primitive types in Java. Yet, they still take up at least 1 byte [https://stackoverflow.com/a/383597/7154924].

Why is this the case - why can't Java booleans be 1 bit in size? Computation time aside, if one has a large boolean array, surely one could conceive a compiler that does the appropriate shifting to retrieve the individual bit corresponding to a boolean value?

flow2k
  • Well, the [general answer](https://stackoverflow.com/a/4626993/7366707) to this question should explain why. – Salem Dec 09 '17 at 20:34
  • @Mango Hmm...but that begs the question, as I allude to in the text: why must a `boolean` be addressable in the first place? Can't 8 `booleans` (in a `boolean` array for instance) share the same address, and then we shift appropriately depending on what the array index is? – flow2k Dec 09 '17 at 20:36
  • Forcing a packed representation would just slow things down when space isn't an issue. You can pack it yourself or use a library implementation if you need the space. – user2357112 Dec 09 '17 at 20:38
  • Possible duplicate of [Why is a boolean 1 byte and not 1 bit of size?](https://stackoverflow.com/questions/4626815/why-is-a-boolean-1-byte-and-not-1-bit-of-size) – Oleksandr Pyrohov Dec 09 '17 at 20:39
  • @flow2k [Yes, they could.](https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html) – Silvio Mayolo Dec 09 '17 at 20:39
  • @Oleksandr Not necessarily a duplicate, C++ is not Java – Salem Dec 09 '17 at 20:39
  • @flow2k A boolean array could be optimized ([and in C++](http://en.cppreference.com/w/cpp/container/vector_bool), it has been with `std::vector`, which was arguably a questionable design decision as dereferencing the iterator no longer works), but in Java of course this _could_ work. As to why it wasn't done, I don't know. You could use a [`BitSet`](https://docs.oracle.com/javase/7/docs/api/java/util/BitSet.html) if you need a packed boolean array and/or efficiency. – Salem Dec 09 '17 at 20:39
  • @Oleksandr ...which is not the point of _that specific question_. The _answers_ (not the question) refer to CPU limitations, however they do not necessarily apply equally to this question. The OP of this question even mentions that they are aware of this and instead asks _why_ such an optimization does not exist. – Salem Dec 09 '17 at 20:41
  • This question asserts they "can't," but I don't know that that's true. Where does it say they can't? If you're asking about why they're not in any existing JVMs (which is likely), then "computation time aside" doesn't make much sense; computation time is the reason to make any optimization, like this one would be. – yshavit Dec 09 '17 at 21:36
  • @Mango: An element of a `vector` is (as you say) no longer a `bool` object, it's a packed bit representing the value of a `bool`. It's a race for two threads to write adjacent elements in a `vector` but not in an array of `bool`. (Not that a packed bitmap is a bad data structure: it's great, but `vector` is probably not a good name for it. See Howard Hinnant's blog: https://isocpp.org/blog/2012/11/on-vectorbool). – Peter Cordes Dec 09 '17 at 21:45

1 Answer


There is no reason why a boolean must be one byte in size. In fact, it is likely that booleans already aren't 1 byte in (effective) size in some scenarios: when packed next to other elements larger than 1 byte on the stack or in an object, they are likely to be larger (i.e., adding them to an object may cause its size to grow by more than one byte).

Any JVM is free to implement booleans as 1 bit, but as far as I know none chooses to do so, probably largely because:

  • Accessing one bit is often more expensive than accessing a byte, particularly when writing.

    To read a bit, a CPU using a "classic RISC" instruction set would often need an additional `and` instruction to extract the relevant bit out of a packed byte (or larger word) of boolean bits. Some might even need an additional instruction to load the constant to `and` with. In the case of indexing an array of boolean, where the bit index isn't fixed at compile time, you'd need a variable shift. Some CPUs such as x86 have an easier time since they have memory-source `test` instructions, including specific bit-test instructions that take a variable position, such as `bt`. Such a CPU probably has similar read performance in both representations.

    Writing is worse: rather than a simple byte write to set a boolean value, you now need to read the value, modify the appropriate bit, and write it back. Some platforms such as x86 have memory source-and-destination RMW instructions such as `and` and `or` that help, but these are still significantly more expensive than plain writes. In the worst case, repeatedly writing the same element results in a dependency chain through memory that could slow your code down by an order of magnitude (a series of plain stores can't form a dependency chain).

    Even worse, the read-modify-write sequence above is totally thread-unsafe. Two threads working on "independent" booleans might clobber each other, so the runtime would have to use atomic update operations just to write a bit for any field where the object cannot be proven local to the thread.

  • The space savings outside of arrays are usually very small, and often zero: alignment concerns mean that a single bit will often end up taking the same space as a byte on the stack or in the layout of an object. Only if you had many primitive boolean values on the stack or in an object would you see any savings (for example, objects are typically aligned to 8-byte boundaries, so if you have an object whose non-boolean fields are int or larger, you'd need at least 4 boolean values to save any space, and often you'd need 8).

  • This leaves the last remaining "big win" for a bit representation: arrays of boolean, where you could get an asymptotic 8x space savings for large arrays. In fact, this case was motivating enough in the C++ world that vector<bool> there has a "special" implementation where each bool takes one bit - a never-ending source of headaches due to all the required special cases and non-intuitive behavior (and often used as an example of a mis-feature that can't be removed now).

    If it weren't for the memory model, I could imagine a world where Java implemented arrays of boolean in a bit-wise manner. Such arrays wouldn't have the same issues as vector<bool> (mostly because of the extra layer of abstraction provided by the JIT, and also because an array provides a simpler interface than vector), and it could be done efficiently, I think. There is that pesky memory model, though. That model requires writes to different array elements to be safe even when done by different threads (i.e., the elements act as independent variables for the purposes of the memory model). All common CPUs support this directly if you implement boolean as a byte, since they have independent byte accesses. No CPU offers independent bit access, though: you are stuck using atomic operations (x86 offers the `lock bt*` operations, but these are slow; other platforms have even worse options). That would destroy the performance of any boolean array implemented as a bit-array.
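
To make the extra shift/mask work and the thread-safety hazard concrete, here is a minimal sketch of a packed boolean array in plain Java; `PackedBooleans` is an invented name for illustration, not anything a JVM actually does internally:

```java
// Hypothetical packed boolean array: 64 booleans per long.
// Illustrates the extra work a bit representation needs compared
// to a plain byte load/store.
public final class PackedBooleans {
    private final long[] words;

    public PackedBooleans(int size) {
        words = new long[(size + 63) >>> 6]; // round up to whole words
    }

    // Read: load the word, variable shift, mask -- extra ALU work
    // versus a single byte load for a plain boolean[].
    public boolean get(int index) {
        return ((words[index >>> 6] >>> (index & 63)) & 1L) != 0;
    }

    // Write: read-modify-write of the whole 64-bit word. Note this is
    // NOT thread-safe: two threads setting "independent" bits that
    // share a word can clobber each other's updates, which is why a
    // JVM would need atomic RMW operations here.
    public void set(int index, boolean value) {
        int word = index >>> 6;
        long mask = 1L << (index & 63);
        if (value) {
            words[word] |= mask;
        } else {
            words[word] &= ~mask;
        }
    }
}
```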

Finally, as described above, implementing boolean as a bit has significant downsides - but what about the upside?

As it turns out, if the user really wants a bit-packed representation for boolean, they can do it themselves! They can pack 8 boolean values into a byte (or 32 values into an int, or whatever) in an object (this is common for flags, etc.), and the generated accessor code should be about as efficient as if the JVM natively supported boolean-as-bit. In fact, when you know you want an array-of-bits representation for a large number of booleans, you can simply use BitSet - this has the representation you want and sidesteps the atomicity issues by not offering any thread-safety guarantees. So by implementing boolean as a byte, you sidestep all the problems above, but still let the user "opt in" to a bit-level representation if they want, without much runtime penalty.
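
For example, opting in via `BitSet` is straightforward (the class name `BitSetDemo` is just for this sketch):

```java
import java.util.BitSet;

// BitSet is the stock packed representation: backed by a long[],
// so a million flags take roughly 125 KB rather than ~1 MB for
// boolean[]. It makes no thread-safety guarantees, which is exactly
// how it sidesteps the atomic read-modify-write problem.
public class BitSetDemo {
    public static void main(String[] args) {
        BitSet flags = new BitSet(1_000_000);
        flags.set(42);           // a read-modify-write on one long under the hood
        flags.set(100, 200);     // set the half-open range [100, 200)
        System.out.println(flags.get(42));       // true
        System.out.println(flags.get(200));      // false
        System.out.println(flags.cardinality()); // 101 bits set
    }
}
```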

BeeOnRope
  • I suppose, “very large” should be “very small”. Also, I’d use the term “atomic update” instead of “atomic operation”, as the latter is much broader and includes the not-so-expensive ordinary reads and writes. – Holger Dec 13 '17 at 10:25
  • @Holger - yes, good catch, it should be "very small". I left atomic operation in place as I think it is clear from the context that we are talking about hardware-level atomic/locked operations and not plain reads or writes that happen to be atomic on some platforms. This is, I think, conventional terminology in the area. In any case, atomic update doesn't really help, since writes are updates yet not necessarily expensive, right? Perhaps "atomic RMW" but that's even more obscure and a bit platform-specific. – BeeOnRope Dec 13 '17 at 17:33
  • You could argue that it's only an "update" if it depends on the old value. But I agree it's clear from context and "update" doesn't obviously / clearly have that meaning (for all readers), so I think the current wording is fine. – Peter Cordes Dec 13 '17 at 17:36
  • Even an “atomic update” does not have to be extraordinary expensive, if we don’t apply all other semantics of an update visible by other threads, but of course, for such a low level operation, every cycle counts… – Holger Dec 13 '17 at 17:54
  • @Holger - sure, and it depends on the hardware. On x86, for example, any atomic update or useful fence is pretty much as expensive as all the others, and they all imply a full fence/sequential consistency. Other platforms have weaker options, but none are much faster for an "atomic RMW" like you'd need here - in fact most won't provide a way to atomically set a bit at all, so you are back to using LL/SC or CAS depending on the platform, and these are generally expensive. None are "extraordinarily expensive" though: 20 - 50 cycles is par for the course. – BeeOnRope Dec 13 '17 at 18:00
  • @BeeOnRope Thanks for this answer - I'm still going through some of the finer details. Toward the beginning, you say "when packed next to other elements larger than 1 byte on the stack or in an object, they are likely to be larger". What causes them to take up more than 1 byte? Some kind of memory alignment constraint? – flow2k Dec 18 '17 at 01:29
  • @flow2k - correct. The `boolean` itself is still a byte (and accessed as a byte), but because objects are rounded up to the next alignment boundary, adding a `boolean` to an object often has the same effect as adding say an `int`. Java objects, for example, generally have 8-byte alignment on 64-bit platforms, so if you take an object that is exactly 16 bytes and add a single `boolean` it will be 24 bytes. – BeeOnRope Dec 18 '17 at 01:31