How to properly represent arrays of non full byte numbers in c++?

Question

I wanted to implement the FAT12 specification in C++, where the FAT is an array of 12-Bit numbers.

Because types can only have full byte size I tried to use bitfields in structs to have a pair of two 12-Bit numbers which would fill 3 bytes:

struct TwoEntries {
    uint16_t first : 12;
    uint16_t second : 12;
};

But this struct has a size of four bytes because of padding explained in this question, and with the padding an array wouldn't properly fit the data.

So my question would be:
Is there some way to properly declare an array of 12-Bit numbers?

*"Is there some way to properly declare an array of 12-Bit numbers?"* Not really. I would use a struct of 3 bytes, and then add member functions to extract the first and second entry "manually" using bit shifts. — BoP, Jun 20 '23 at 17:44
The language standard doesn't even say if bit fields will straddle boundaries, so we don't know if `second` will use the 4 bits left over from `first`, or start over in a new storage unit, — BoP, Jun 20 '23 at 17:46
[You could use packing](https://godbolt.org/z/3Y6M37dYM). But, honestly, don't bother. Just use `uint16_t`. — LHLaurini, Jun 20 '23 at 18:57
I would use a class containing a ptr to the FAT32 char array and possibly it's length. Then add methods to access the appropriate nibble and byte constituting a 12 bit entry. — doug, Jun 20 '23 at 19:30
Note : bitfields may still align in memory as the compiler sees fit (implementation defined). The only thing C++ guarantees it would do operations on the bitfield as if it where a value with the specified number of bits. I would probably make a datastructure myself and use bitmasking and shifting to get individual values out. — Pepijn Kramer, Jun 20 '23 at 19:39
I would never use bitfields to map onto an external data type. These are notoriously non-portable. — Paul Sanders, Jun 20 '23 at 20:02
"Wasting" 4 bits for something in memory only matters if you have so many records that you might run out. Even 100,000 records is only 50K wasted if you use an unsigned int16. — Dave S, Jun 20 '23 at 20:36
`struct TwoEntries { uint8_t data[3]; uint16_t first() const { return data[0] + (data[1] & 0xF) << 8); } uint16_t second() const { return (data[1] >> 4) + (data[2] << 4); }` Adjust the bit twiddling to fit your data schema as necessary. — Eljay, Jun 20 '23 at 20:44
[std::bitset](https://en.cppreference.com/w/cpp/utility/bitset) — Ripi2, Jun 21 '23 at 19:01

user1806566 · Answer 1 · 2023-06-21T18:41:27.210

Creating an array of packed bytes is easy - you can just use an array (or vector) of uint8_t. What's tricky is treating 12 bits inside that array as a 12-bit int, since there is no C++ type for "12 bit integer".

We can, however, create a proxy type which approximates a reference to a 12-bit integer:

class TwelveBitInt {
public:
   // Our 12-bit int starts at byte offset "byte", bit offset "bit"
   TwelveBitInt( uint8_t* byte, int bit ) : ptr{byte}, start_position{bit} {}

   operator uint16_t() const {
      // return bit manipulation to extract your 12-bit number
   }

   TwelveBitInt& operator=( uint16_t new_value ) {
      // use bit manipulation to assign new_value to the underlying array
   }

private:
   uint8_t* ptr;
   int start_position;
};

This gives you a type that looks like a 12-bit integer as long as you don't look too closely. It's implicitly convertible to uint16_t, and it's assignable from uint16_t, which is good enough for most uses. As for whether it's portable, it depends on what assumptions you make in your bit manipulations, but that's up to you.

I would then write a container class for your array. For simplicity, I'll assume that the size of the array is known at construction time.

class Fat12Array {
public:
   Fat12Array( std::size_t n_elems )
     : bytes( (n_elems * 3 + 1) / 2, 0 ) {}

   TwelveBitInt operator[]( std::size_t idx ) {
      // Interpret underlying bytes as an array of 3-byte/2-elem

      // Get address of the 3-byte structure
      auto byte_ptr = bytes.data() + 3*(idx/2);

      if( idx % 2 ) {
          return TwelveBitInt{ byte_ptr + 1, 4 };
      } else {
          return TwelveBitInt{ byte_ptr, 0 };
      }
   }

private:
   std::vector<uint8_t> bytes;
};

Depending on how fancy you want to make this, you could deal with const TwelveBitInts, add more methods to the container, perhaps iterators, etc., but this is the basic idea.

I should note that is a similar idea to the other answer. The difference is that it buries the concept of "TwoEntries", so it doesn't make its way into the client interface. — user1806566, Jun 21 '23 at 15:58
I would probably avoid the mutable `TwelveBitInt operator[]` and simply have `void set(std::size_t idx, unsigned short value)` and `unsigned short value get(std::size_t idx)`. That's simpler, and probably runs faster. — Mooing Duck, Jun 21 '23 at 16:04
The compiler should be able to inline everything that TwelveBitInt does, so the performance should be the same either way. In this skeleton of an answer, I agree that get/set would be fine. If you extended this to add iterators to Fat12Array, however, the iterators would need a reference type, which is where you'd need something like TwelveBitInt. — user1806566, Jun 21 '23 at 16:15
I'm not convinced a compiler would notice that `TwelveBitInt::operator=` and `Fat12Array` were effectively duplicating the same equations. http://coliru.stacked-crooked.com/a/ae4414e9ff988d62 But you're definitely right about iterators — Mooing Duck, Jun 21 '23 at 16:17

score 1 · Accepted Answer · answered Jun 21 '23 at 00:31

Technically there is a way, but it's not portable:

#include <cstdint>

struct [[gnu::packed]] TwoEntries {
    std::uint16_t first : 12;
    std::uint16_t second : 12;
};

static_assert(sizeof(TwoEntries) == 3); // assertion passes

The size (in bytes) of bit-field members, the padding between them, and other properties are totally implementation-defined, so they make for a horrible tool when working with something like a file system where you must have a layout that is the same for all compilers.

Instead, consider creating a class which has a layout that you have total control over:

struct TwoEntries {
    std::uint8_t data[3];

    std::uint16_t get_first() const {
        return data[0] | ((data[1] & 0xf) << 8);
    }

    std::uint16_t get_second() const {
        return ((data[1] >> 4) & 0x0f) | (data[2] << 4);
    }

    void set_first(std::uint16_t x) {
        data[0] = x & 0xff;
        data[1] = (data[1] & 0xf0) | ((x >> 8) & 0x0f);
    }

    void set_second(std::uint16_t x) {
        data[1] = ((x & 0x0f) << 4) | (data[1] & 0xf);
        data[2] = (x >> 4) & 0xff;
    }
};

As you can see, this method is quite a bit more effort compared to using a bit-field, but we have total control over the memory layout, and our solution is portable across different compilers.

If you run into this pattern a lot, it might make sense to create a class template such as:

template <std::size_t BitWidth, std::size_t BitOffset>
struct uint_bitref {
    void* to;
    uint_bitref(void* to) : to{to} {}
    /* ... */
};

// and then implement TwoEntries by returning this reference
// which we can use to read and write an integer at a certain bit offset

struct TwoEntries {
    using first_t = uint_bitref<12, 0>;
    using second_t = uint_bitref<12, 4>;

    std::uint8_t data[3];
    
    first_t first() {
        return data;
    }

    second_t get_second() {
        return data + 1;
    }
};

How to properly represent arrays of non full byte numbers in c++?

2 Answers2