I want to access the bits in a char individually. There are several questions and answers on this topic here on SO, but they all suggest using bitwise (Boolean) operations. However, for my use it would be more convenient if I could simply name the bits separately. So I was thinking of just accessing the char through a bitfield, like so:

#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool _1 : 1, _2 : 1, _3 : 1, _4 : 1, _5 : 1, _6 : 1, _7 : 1, _8 : 1;
} bits;

int main() {
    char c = 0;
    bits *b = (bits *)&c;
    b->_3 = 1;
    printf("%s\n", c & 0x4 ? "true" : "false");
}

This compiles without errors or warnings with gcc -Wall -Wextra -Wpedantic test.c. When running the resulting executable with valgrind, it reports no memory faults. The assembly generated for the b->_3 = 1; assignment is or eax, 4, which looks correct.

Questions

  • Is this defined behaviour in C?
  • Is this defined behaviour in C++?

N.B.: I'm aware that it might cause trouble for mixed endianness but I only have little endian.

Henri Menke
  • What is your use case that `b->_3` is more convenient than `b & 4`? – melpomene Apr 05 '19 at 08:54
  • @melpomene For a small string I want to store flags in the last `char`, where the first 4 bits are the remaining size, the 5th bit is a flag for heap allocation and the remaining 3 are unused. Rather than having to think about all the extraction masks, it would be easiest to just cast the last byte to `struct { size_t remaining : 4; bool allocated : 1; bool unused1 : 1, unused2 : 1, unused3 : 1; }` and then access `remaining` and `allocated` directly. It also makes writing the bits easier. – Henri Menke Apr 05 '19 at 08:57
  • Endianness and bit ordering are not the same. `size_t remaining : 4` is implementation specific. Portable bitfields are only `int` and `bool`. – KamilCuk Apr 05 '19 at 09:04
  • @KamilCuk "portable bitfields" are a contradiction. They don't exist. – Andrew Henle Apr 05 '19 at 09:22

1 Answer

Is this defined behaviour in C?
Is this defined behaviour in C++?

TL;DR: no it is not.

The boolean bitfield itself is well-defined up to a point: bool is a valid type for bit-fields, so you are guaranteed to get 8 one-bit booleans allocated somewhere in memory. If you read boolean _1, you'll get the value you last stored in that member.

What is not defined is the bit order. The compiler may insert padding bits or padding bytes as it pleases. All of that is implementation-defined and non-portable. So you can't really know where _1 is located in memory or if it is the MSB or LSB. None of that is well-defined.
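
One way to see what a particular compiler actually does is to copy the struct into a byte buffer and look for the bit that changed. The following is only a sketch: `probe_bit_of_3` is a hypothetical helper name, and whatever index it reports holds for that one compiler and target, nothing more. The asker's `or eax, 4` suggests index 2 on their GCC build, but nothing in the standard promises that.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool _1 : 1, _2 : 1, _3 : 1, _4 : 1, _5 : 1, _6 : 1, _7 : 1, _8 : 1;
} bits;

/* Hypothetical helper: returns the position of the storage bit occupied by
   member _3, counted from the least significant bit of the first byte,
   or -1 if the struct's bytes do not contain exactly one set bit. */
int probe_bit_of_3(void) {
    bits b;
    unsigned char raw[sizeof b];

    memset(&b, 0, sizeof b);   /* start from all-zero storage */
    b._3 = 1;                  /* set only the member under test */
    memcpy(raw, &b, sizeof b); /* memcpy is a legal way to inspect the bytes */

    int found = -1;
    for (size_t i = 0; i < sizeof raw; i++) {
        for (int bit = 0; bit < CHAR_BIT; bit++) {
            if ((raw[i] >> bit) & 1u) {
                if (found != -1)
                    return -1; /* more than one bit set: padding got in the way */
                found = (int)(i * CHAR_BIT + bit);
            }
        }
    }
    return found;
}
```

Checking `sizeof(bits)` alongside the result also shows whether the compiler added padding beyond the single byte one might expect.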

The cast bits *b = (bits *)&c; is the real problem: accessing a char through a struct pointer is a strict aliasing violation and may also cause alignment problems. It is undefined behavior in both C and C++. You would need to at least put this struct into a union with a char to dodge strict aliasing, but you may still get alignment hiccups (and C++ frowns on type punning through unions).
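
For illustration, the union approach could look roughly like the sketch below. The names `byte_view` and `set_third_flag` are made up for this example. It is C only (reading a union member other than the one last written is not allowed in C++), and it removes the aliasing problem without making the bit position of `_3` any more portable.

```c
#include <stdbool.h>
#include <string.h>

typedef struct {
    bool _1 : 1, _2 : 1, _3 : 1, _4 : 1, _5 : 1, _6 : 1, _7 : 1, _8 : 1;
} bits;

/* Hypothetical union: both members share the same storage, so the compiler
   knows the byte and the bit-field struct may refer to the same object. */
typedef union {
    unsigned char raw;
    bits named;
} byte_view;

/* Sets member _3 in a copy of c and returns the resulting byte.
   Which bit of the byte changes is still implementation-defined. */
unsigned char set_third_flag(unsigned char c) {
    byte_view v;
    memset(&v, 0, sizeof v); /* zero everything in case bits is wider than one byte */
    v.raw = c;
    v.named._3 = 1;
    return v.raw;            /* union type punning: fine in C, not in C++ */
}
```

Because the union has the alignment of its strictest member, this also avoids pointing a struct pointer at a lone char.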

(And going from boolean type to character type can give some real crazy results too, see _Bool type and strict aliasing)


None of this is convenient at all - bitfields are very poorly defined. It is much better to simply do:

c |= 1u << n;     // set bit n
c &= ~(1u << n);  // clear bit n

This is portable, type-generic and endianness-independent.

(Though to dodge change of signedness due to implicit integer promotions, it is good practice to always cast the result of ~ back to the intended type: c &= (uint8_t) ~(1u << n);).

Note that the type char is entirely unsuitable for bitwise arithmetic since it may or may not be signed. Instead you should use unsigned char or preferably uint8_t.
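
To get named access without bit-fields, the masks can be wrapped in small functions. The sketch below uses hypothetical names throughout (`set_bit`, `tag_remaining` and so on) and assumes, for the small-string tag byte from the comments, that the "first 4 bits" are the low four bits and the allocation flag is bit 4. That layout is an assumption made for the example, not something stated in the question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Generic single-bit helpers on a byte; n must be in 0..7. */
uint8_t set_bit(uint8_t c, unsigned n)   { assert(n < 8); return (uint8_t)(c | (1u << n)); }
uint8_t clear_bit(uint8_t c, unsigned n) { assert(n < 8); return (uint8_t)(c & ~(1u << n)); }
bool    test_bit(uint8_t c, unsigned n)  { assert(n < 8); return (c >> n) & 1u; }

/* Assumed tag-byte layout: bits 0..3 = remaining size, bit 4 = heap flag. */
#define REMAINING_MASK 0x0Fu
#define ALLOCATED_BIT  4u

/* Returns the 4-bit remaining size stored in the tag. */
unsigned tag_remaining(uint8_t tag) { return tag & REMAINING_MASK; }

/* Returns tag with the remaining size replaced (values above 15 are truncated). */
uint8_t tag_set_remaining(uint8_t tag, unsigned remaining) {
    return (uint8_t)((tag & ~REMAINING_MASK) | (remaining & REMAINING_MASK));
}

/* Reads and writes the heap-allocation flag. */
bool tag_allocated(uint8_t tag) { return test_bit(tag, ALLOCATED_BIT); }

uint8_t tag_set_allocated(uint8_t tag, bool on) {
    return on ? set_bit(tag, ALLOCATED_BIT) : clear_bit(tag, ALLOCATED_BIT);
}
```

Call sites then read almost like member access, for example `tag = tag_set_allocated(tag, true);`, while the bit positions stay fixed by the source rather than by the compiler.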

Lundin
  • C++ also has `std::bitset` which works much better than bit-fields, though I don't know enough of how well that one plays out across various compilers to recommend it. It seems to work ok on mainstream PC compilers, but beyond that I don't know. – Lundin Apr 05 '19 at 09:26
  • “accessing a `char` through a struct pointer is a strict aliasing violation” That is exactly the answer I was looking for. Thank you! – Henri Menke Apr 05 '19 at 10:13
  • @HenriMenke But you have numerous other forms of poorly-specified behavior here too. The most severe is probably the possibility of misaligned access, depending on the CPU. – Lundin Apr 05 '19 at 12:01