30

I have 8 bool variables, and I want to "merge" them into a byte.

Is there an easy/preferred method to do this?

How about the other way around, decoding a byte into 8 separate boolean values?

I come in assuming it's not an unreasonable question, but since I couldn't find relevant documentation via Google, it's probably another one of those "nonono all your intuition is wrong" cases.

phuclv
  • 37,963
  • 15
  • 156
  • 475
xcel
  • 377
  • 1
  • 3
  • 6
  • I am not sure what you mean exactly. A bool(ean) datatype in c++ is one byte, how do you want to convert a byte into byte ? – ScarletAmaranth Dec 11 '11 at 00:56
  • 1
    There is no way to pack 8 bool variables into one byte. There is a way packing 8 logical true/false states in a single byte using Bitmasking. http://en.wikipedia.org/wiki/Bitwise_operation – ScarletAmaranth Dec 11 '11 at 00:58
  • @ScarletAmaranth That should be an answer. – weltraumpirat Dec 11 '11 at 00:59
  • 1
    @weltraumpirat I was not sure what exactly the question was. – ScarletAmaranth Dec 11 '11 at 01:01
  • 3
    I just knew people were going to make this question harder than it is. Well, it's my fault for not knowing booleans are more than 1 bit in size. – xcel Dec 11 '11 at 01:33
  • Related: [How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD](https://stackoverflow.com/q/52098873) is nice for 16 bytes at once. – Peter Cordes Apr 21 '21 at 20:42

8 Answers8

26

The hard way:

unsigned char ToByte(bool b[8])
{
    unsigned char c = 0;
    for (int i=0; i < 8; ++i)
        if (b[i])
            c |= 1 << i;
    return c;
}

And:

void FromByte(unsigned char c, bool b[8])
{
    for (int i=0; i < 8; ++i)
        b[i] = (c & (1<<i)) != 0;
}

Or the cool way:

struct Bits
{
    unsigned b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};
union CBits
{
    Bits bits;
    unsigned char byte;
};

Then you can assign to one member of the union and read from another. But note that the order of the bits in Bits is implementation defined.

Note that reading one union member after writing another is well-defined in ISO C99, and as an extension in several major C++ implementations (including MSVC and GNU-compatible C++ compilers), but is Undefined Behaviour in ISO C++. memcpy or C++20 std::bit_cast are the safe ways to type-pun in portable C++.

(Also, the bit-order of bitfields within a char is implementation defined, as is possible padding between bitfield members.)

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
rodrigo
  • 94,151
  • 12
  • 143
  • 190
  • Can you loop on the `Bits` though? – Juicy Sep 24 '15 at 23:09
  • 1
    @Juicy Not with the union directly, if you need to loop you use `<<`. But you can mix both solitions. – rodrigo Sep 24 '15 at 23:13
  • The Bits struct can use `unsigned char` (not unsigned) to better match the CBits union. – Petr Vepřek Jun 17 '16 at 07:32
  • @sp2denny I believe no because this is `unsigned char` which means there's no trap representation. – iBug Jan 01 '18 at 01:32
  • 2
    @ibug I think type-punning through the union is undefined behavior in C++, regehr seems to say so here: https://blog.regehr.org/archives/959 I don't think the question of trap-representations is relevant. It's about the strict aliasing rule. The "cool" way shown above would be against coding standards at my company at least. – Chris Beck Jan 01 '18 at 01:46
  • 1
    @iBug it's UB to access a member of a union that wasn't last assigned to. – UKMonkey Jun 25 '18 at 14:41
  • 7
    Type punning through unions is UB; please remove that or explicitly state that this is an extension and which compilers provide it. – Baum mit Augen Jun 25 '18 at 14:42
  • 3
    This is definitively UB. – YSC Jun 25 '18 at 14:42
  • @UKMonkey How did this question get some attention again? It's been half a year since my comment! – iBug Jun 25 '18 at 15:16
  • @ibug - it was marked as the duplicate for another question- so naturally everyone jumped to take a look see what they thought of it – UKMonkey Jun 25 '18 at 15:18
  • doesnt FromByte create the bit array in reverse order? – Blue Print Mar 07 '22 at 20:13
15

The cool way (using the multiplication technique)

inline uint8_t pack8bools(bool* a)
{
    uint64_t t;
    memcpy(&t, a, sizeof t);         //  strict-aliasing & alignment safe load
    return 0x8040201008040201ULL*t >> 56;
       // bit order: a[0]<<7 | a[1]<<6 | ... | a[7]<<0  on little-endian
       // for a[0] => LSB, use 0x0102040810204080ULL    on little-endian
}

void unpack8bools(uint8_t b, bool* a)
{
       // on little-endian,  a[0] = (b>>7) & 1  like printing order
    auto MAGIC = 0x8040201008040201ULL;  // for opposite order, byte-reverse this
    auto MASK  = 0x8080808080808080ULL;
    uint64_t t = ((MAGIC*b) & MASK) >> 7;
    memcpy(a, &t, sizeof t);    // store 8 bytes without UB
}

Assuming sizeof(bool) == 1

To portably do LSB <-> a[0] (like the pext/pdep version below) instead of using the opposite of host endianness, use htole64(0x0102040810204080ULL) as the magic multiplier in both versions. (htole64 is from BSD / GNU <endian.h>). That arranges the multiplier bytes to match little-endian order for the bool array. htobe64 with the same constant gives the other order, MSB-first like you'd use for printing a number in base 2.

You may want to make sure that the bool array is 8-byte aligned (alignas(8)) for performance, and that the compiler knows this. memcpy is always safe for any alignment, but on ISAs that require alignment, a compiler can only inline memcpy as a single load or store instruction if it knows the pointer is sufficiently aligned. *(uint64_t*)a would promise alignment, but also violate the strict-aliasing rule. Even on ISAs that allow unaligned loads, they can be faster when naturally aligned. But the compiler can still inline memcpy without seeing that guarantee at compile time.


How they work

Suppose we have 8 bools b[0] to b[7] whose least significant bits are named a-h respectively that we want to pack into a single byte. Treating those 8 consecutive bools as one 64-bit word and load them we'll get the bits in reversed order in a little-endian machine. Now we'll do a multiplication (here dots are zero bits)

  |  b7  ||  b6  ||  b4  ||  b4  ||  b3  ||  b2  ||  b1  ||  b0  |
  .......h.......g.......f.......e.......d.......c.......b.......a
× 1000000001000000001000000001000000001000000001000000001000000001
  ────────────────────────────────────────────────────────────────
  ↑......h.↑.....g..↑....f...↑...e....↑..d.....↑.c......↑b.......a
  ↑.....g..↑....f...↑...e....↑..d.....↑.c......↑b.......a
  ↑....f...↑...e....↑..d.....↑.c......↑b.......a
+ ↑...e....↑..d.....↑.c......↑b.......a
  ↑..d.....↑.c......↑b.......a
  ↑.c......↑b.......a
  ↑b.......a
  a       
  ────────────────────────────────────────────────────────────────
= abcdefghxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

The arrows are added so it's easier to see the position of the set bits in the magic number. At this point 8 least significant bits has been put in the top byte, we'll just need to mask the remaining bits out

So the magic number for packing would be 0b1000000001000000001000000001000000001000000001000000001000000001 or 0x8040201008040201. If you're on a big endian machine you'll need to use the magic number 0x0102040810204080 which is calculated in a similar manner

For unpacking we can do a similar multiplication

  |  b7  ||  b6  ||  b4  ||  b4  ||  b3  ||  b2  ||  b1  ||  b0  |
                                                          abcdefgh
× 1000000001000000001000000001000000001000000001000000001000000001
  ────────────────────────────────────────────────────────────────
= h0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh0abcdefgh
& 1000000010000000100000001000000010000000100000001000000010000000
  ────────────────────────────────────────────────────────────────
= h0000000g0000000f0000000e0000000d0000000c0000000b0000000a0000000

After multiplying we have the needed bits at the most significant positions, so we need to mask out irrelevant bits and shift the remaining ones to the least significant positions. The output will be the bytes contain a to h in little endian.


The efficient way

On newer x86 CPUs with BMI2 there are PEXT and PDEP instructions for this purpose. The pack8bools function above can be replaced with

_pext_u64(*((uint64_t*)a), 0x0101010101010101ULL);

And the unpack8bools function can be implemented as

_pdep_u64(b, 0x0101010101010101ULL);

(This maps LSB -> LSB, like a 0x0102040810204080ULL multiplier constant, opposite of 0x8040201008040201ULL. x86 is little-endian: a[0] = (b>>0) & 1; after memcpy.)

Unfortunately those instructions are very slow on AMD before Zen 3 so you may need to compare with the multiplication method above to see which is better

The other fast way is SSE2

x86 SIMD has an operation that takes the high bit of every byte (or float or double) in a vector register, and gives it to you as an integer. The instruction for bytes is pmovmskb. This can of course do 16 bytes at a time with the same number of instructions, so it gets better than the multiply trick if you have lots of this to do.

#include <immintrin.h>

inline uint8_t pack8bools_SSE2(const bool* a)
{
    __m128i v = _mm_loadl_epi64( (const __m128i*)a );  // 8-byte load, despite the pointer type.
    // __m128 v = _mm_cvtsi64_si128( uint64 );  // alternative if you already have an 8-byte integer
    v = _mm_slli_epi32(v, 7);   // low bit of each byte becomes the highest
    return _mm_movemask_epi8(v);
}

There isn't a single instruction to unpack until AVX-512, which has mask-to-vector instructions. It is doable with SIMD, but likely not as efficiently as the multiply trick. See Convert 16 bits mask to 16 bytes mask and more generally is there an inverse instruction to the movemask instruction in intel avx2? for unpacking bitmaps to other element sizes.

How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD has some answers specifically for 8-bits -> 8-bytes, but if you can't do 16 bits at a time for that direction, the multiply trick is probably better, and pext certainly is (except on CPUs where it's disastrously slow, like AMD before Zen 3).

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
phuclv
  • 37,963
  • 15
  • 156
  • 475
  • I upvoted this because it's cute and well-illustrated, but most people who come across this question would be better off with a `std::bitset`. They probably even get the optimization for free if they use the `unsigned long long` constructor. – Baryons for Breakfast Jun 11 '19 at 08:23
  • @BaryonsforBreakfast no, you won't get the optimization for free because each bit in the bitset is still stored as one bit and not an 8-bit byte. You do have the ability to get each bit and do operations on it though, but not in parallel so it'll be slower. The 8 bytes -> 8 bits direction also doesn't work automatically with bitset – phuclv Jun 11 '19 at 08:34
  • Sorry, I meant that the packing and unpacking CPU instructions under "the efficient way" are probably free when they're available if you use the `unsigned long long` constructor (and `to_ullong` to unpack). Going between the array of `bool`s and the `unsigned long long`s would still not be free in that you'd still have to do the casts by hand. – Baryons for Breakfast Jun 11 '19 at 09:09
  • 1
    `to_ullong` and the `unsigned long long` constructor can't use PEXT and PDEP because they just copy the `unsigned long long` value directly to/from the internal array. The bits in bitset are stored as bits in the array and not one bit per byte. So `to_ullong` gives you 64 bits and not 8 – phuclv Jun 11 '19 at 10:31
  • Has anyone run into incorrect answers with `gcc` 8 and 9 when `-O2` is enabled? I get wrong answers after a packing and unpacking when `-O2` is used, while it gives correct answers without optimization. Note, using `clang`, it gives correct answers even when `-O2` is on. – Samuel Li Mar 11 '20 at 06:07
  • Add another comment to address my previous comment regarding incorrect answers with `gcc` when `-O2` is turned on: I found that using `std::memcpy()` instead of type punning would resolve the incorrect answer issue. – Samuel Li Mar 11 '20 at 06:19
  • 1
    @SamuelLi that's probably due to [strict aliasing](https://stackoverflow.com/q/98650/995714). Try `-fno-strict-aliasing` or a union. Or just change from `bool` to `char` because `char*` can alias other types – phuclv Mar 11 '20 at 16:20
  • @phuclv using `char` instead of `bool` would give me correct answers. But gcc still complains `dereferencing type-punned pointer will break strict-aliasing rules`. This is interesting behavior of gcc though. – Samuel Li Mar 11 '20 at 19:26
  • 1
    @SamuelLi: That's because using `uint64_t*` to read a `char[]` object violates strict-aliasing. ISO C only allows it the other way around: using `char *` to read an object that was originally a `uint64_t`. If the only access to the char object is through a `char*`, you're fine, but if there's any use of an automatic or static storage `char[]`, you have UB. This answer should really use `memcpy` or C++20 `std::bitcast`, or GNU C `typedef uint64_t unaligned_aliasing_u64 __attribute__((aligned(1), may_alias));` to do the type-punning; pointer-casting is never safe. – Peter Cordes Oct 29 '20 at 03:24
  • @PeterCordes Thanks for your explanation, but I don't quite get what you said about automatic or static storage. Could you elaborate what it means to be automatic or static storage? – Samuel Li Nov 01 '20 at 04:43
  • 1
    @SamuelLi: In C++, automatic storage = non-static locals; static storage = globals and static-anything. My point was that if there's some way to access the array object itself, especially as part of a struct, then it's possible to have a real problem with strict-aliasing. If the "array" is just how you're treating some memory you got from `new` or `malloc`, all the `char` accesses to it are going to be via `char*` anyway, and thus aliasing-safe. But if you have a real `char array[8]` variable declaration somewhere, access to it might not be through a `char*`. – Peter Cordes Nov 01 '20 at 04:50
  • 1
    @SamuelLi: Although I just realized you can have a problem even with dynamic storage. If you have `struct foo{ int a; char b[8];};` then you can do struct assignment of the whole struct even via pointers to it, like `foo *p=..., *q=...;` and `*p = *q`. Access to `char b[8]` is *not* through a `char*` in that case, so a compiler could certainly assume that some `uint64_t*` load or store was unrelated to it. (You might actually need this struct thing for UB to actually happen, or at least to be a problem in practice, not just a `char arr[8]` local var: `[]` access works by decay to char*) – Peter Cordes Nov 01 '20 at 04:55
  • 1
    @PeterCordes I got it, thanks! I'm convinced that `memcpy` is the safest way to go! – Samuel Li Nov 01 '20 at 05:37
  • If you want the low bit of the mask to map to the least significant byte of the uint64_t (the low byte of the bool array on little-endian x86) you need MAGIC = 0x0102040810204080ULL, opposite what you're showing. See https://godbolt.org/z/j1nGM4TPY which compares test cases for that vs. `pdep` vs. [How to efficiently convert an 8-bit bitmap to array of 0/1 integers with x86 SIMD](https://stackoverflow.com/a/52105856). (For my answer on [Convert 16 bits mask to 16 bytes mask](https://stackoverflow.com/a/67203617)) – Peter Cordes Apr 21 '21 at 23:18
  • (The order you have here will give you printing order, MSB at lowest address, on a little-endian machine. **Opposite of your `pdep` version.** I found the easiest way to think through which order is which is: `0x...40201 * 0x80` has 0x80 in the low byte, which the mask+shift will turn into `0x...01`. So high bit of the input maps to low byte of the output, and memcpy into bool[] uses native endianness.) – Peter Cordes May 14 '21 at 15:28
  • https://godbolt.org/z/ba8PK99d1 is a test-case for byte-order, giving it `bool a[] = {[7] = 1;}` highest bit set as a test input -> 1 or 0x80. I made an edit based on sorting all that out, hopefully making this more usable for future readers. Note that clang on Godbolt uses host headers, so big-endian targets like powerpc64 still use x86 bits/endianness.h and have backwards `htole64` / `be64`. (https://github.com/compiler-explorer/compiler-explorer/issues/1469#issuecomment-841416887). But GCC works, and does sometimes constant-propagate through a bool array init -> memcpy -> multiply. – Peter Cordes May 14 '21 at 21:00
  • The magic number for the LSB->LSB mapping doesn't seem to be correct. The mask 0x80402010 (or a mask 0x01010101...) can work to spread the original byte apart, because the separation of each bit in the mask is at least 8. But in the other multiplier 0x010204....80ull all the bits have a distance of 7 meaning that the input octets overlap. Out of the 256 combinations 64 fail. One cure is to `(a & 0x7f) * magic | (a << 56)`. – Aki Suihkonen Oct 27 '21 at 10:10
  • @AkiSuihkonen the magic number is correct, you can check the math I posted above. I've also done a [correctness check](https://stackoverflow.com/a/23875535/995714) previously and it works for all 256 combinations. You can also see my edit to see an old implementation with overlap and I had to split out a bit. After I changed to the latest implementation there's zero overlap – phuclv Oct 28 '21 at 02:18
  • @phuclv: See https://godbolt.org/z/eMfaso3n7, the source copied from the snipped by Peter Cordes. The problem appears in the LSB->LSB conversion, iff both the LSB and MSB of a byte are set. And thus the magic number can't be simply reversed for big endian architectures. I'm quite sure, the code works for LSB->MSB conversion. – Aki Suihkonen Oct 28 '21 at 07:16
  • @AkiSuihkonen the magic number 0x0102040810204080ULL is added by Peter Cordes. My number works for LSB->MSB conversion and you have to reverse endian to make it work for the same direction. I'll edit this when I have time – phuclv Oct 28 '21 at 15:49
12

You might want to look into std::bitset. It allows you to compactly store booleans as bits, with all of the operators you would expect.

No point fooling around with bit-flipping and whatnot when you can abstract away.

Baum mit Augen
  • 49,044
  • 25
  • 144
  • 182
6
#include <stdint.h>   // to get the uint8_t type

uint8_t GetByteFromBools(const bool eightBools[8])
{
   uint8_t ret = 0;
   for (int i=0; i<8; i++) if (eightBools[i] == true) ret |= (1<<i);
   return ret;
}

void DecodeByteIntoEightBools(uint8_t theByte, bool eightBools[8])
{
   for (int i=0; i<8; i++) eightBools[i] = ((theByte & (1<<i)) != 0);
}
Jeremy Friesner
  • 70,199
  • 15
  • 131
  • 234
  • 9
    Posting a code solution without any explanation might help OP, but does not provide good value to other users. You should consider adding comments and/or explain what you did. – weltraumpirat Dec 11 '11 at 01:02
  • +1 for using uint8_t. Exactly what the type was meant for, when you need exactly 8 bits. –  Dec 11 '11 at 01:02
  • 1
    I hope you realize that `eightBools[i]` is a `bool` and checking it with `== true` you can also just write `(eightBools[i] == true) == true` or `((eightBools[i] == true) == true) == true`, but where to stop? And yes, that's worth not up-voting the answer. – Christian Rau Dec 11 '11 at 01:06
  • weltraumpirat I think the code is straightforward enough that a separate explanation would be redundant and only get in the way. Christian I accept your criticism -- I wouldn't write it like that in my own code -- but I wrote it that way here to make the intent of the code clearer. You may keep your up-vote, I don't want it ;) – Jeremy Friesner Dec 11 '11 at 01:16
  • @JeremyFriesner You can have it anyway, since you say it's not your practice. Sorry for the harsh words ;) – Christian Rau Dec 11 '11 at 01:19
  • 4
    @JeremyFriesner For someone who is new to bitmasks, your code isn't straightforward, not one bit (pun intended). I stand by my earlier comment. – weltraumpirat Dec 11 '11 at 10:39
2

I'd like to note that type punning through unions is UB in C++ (as rodrigo does in his answer. The safest way to do that is memcpy()

struct Bits
{
    unsigned b0:1, b1:1, b2:1, b3:1, b4:1, b5:1, b6:1, b7:1;
};

unsigned char toByte(Bits b){
    unsigned char ret;
    memcpy(&ret, &b, 1);
    return ret;
}

As others have said, the compiler is smart enough to optimize out memcpy().

BTW, this is the way that Boost does type punning.

iBug
  • 35,554
  • 7
  • 89
  • 134
  • @Michi Is this question C++? – iBug Jan 07 '19 at 16:34
  • 1
    Though I like the simplicity of this solution, but it looks like the memory layout of bit fields are compiler dependent [link](https://stackoverflow.com/questions/15136426/memory-layout-of-struct-having-bitfields). That is undesired for some use cases... – Samuel Li Mar 05 '20 at 05:31
2
bool a,b,c,d,e,f,g,h;
//do stuff
char y= a<<7 | b<<6 | c<<5 | d<<4 | e <<3 | f<<2 | g<<1 | h;//merge

although you are probably better off using a bitset

http://www.cplusplus.com/reference/stl/bitset/bitset/

111111
  • 15,686
  • 6
  • 47
  • 62
1

There is no way to pack 8 bool variables into one byte. There is a way packing 8 logical true/false states in a single byte using Bitmasking.

weltraumpirat
  • 22,544
  • 5
  • 40
  • 54
ScarletAmaranth
  • 5,065
  • 2
  • 23
  • 34
  • You can pack the *values* of 8 `bool`s into one byte. Of course they're no longer separate C++ `bool` objects, but all 8 values are all there within the bits of one `char` or `uint8_t`. This answer seems unhelpful and pedantic when the desired behaviour was clear. – Peter Cordes Oct 29 '20 at 03:34
  • I couldn't read minds as to determine what the intent was, given my crystal ball was broken at the time of answering. I answered the question asked. – ScarletAmaranth Nov 04 '20 at 00:23
0

You would use the bitwise shift operation and casting to archive it. a function could work like this:

unsigned char toByte(bool *bools)
{
    unsigned char byte = \0;
    for(int i = 0; i < 8; ++i) byte |= ((unsigned char) bools[i]) << i;
    return byte;
}

Thanks Christian Rau for the correction s!

Community
  • 1
  • 1
Fabián Heredia Montiel
  • 1,687
  • 1
  • 16
  • 30