
I am taking binary input from a file to a buffer vector then casting the pointer of that buffer to be my struct type.

The goal is for the data to populate the struct perfectly.

I know the size of all the various fields and the order they're going to come in.

As a result my struct needs to be tightly packed and be 42 bytes long. My issue is that it is coming out at 44 bytes long when I test it.

Also, the first value lines up. After that, the data is incorrect.

Here's the struct:

#pragma pack(push, 1)
struct myStruct
{
    uint8_t ID;
    uint32_t size: 24;
    uint16_t value;
    char name[12];
    char description[4];
    char shoppingList[14];
    char otherValue[6];
};
#pragma pack(pop)
MaxA
  • `char* name[12]` is an array of pointers, is that what you really wanted? – NathanOliver Nov 04 '21 at 16:48
  • @NathanOliver No, sorry! Copy-pasting error. Was messing around with them being pointers. Ignore. – MaxA Nov 04 '21 at 16:50
  • It looks like 43 to me :) – Vlad Feinstein Nov 04 '21 at 16:51
  • `uint32_t size: 24;` is still going to take up 32 bits in size, even if you only use 24 of the bits. It has to, because it can't share space with `value` and there is no 24 bit wide data type (on modern machines at least) – NathanOliver Nov 04 '21 at 16:53
  • Cannot reproduce [godbolt](https://godbolt.org/z/YPersKboo). @NathanOliver But it might be able to share space with `ID`. – Quimby Nov 04 '21 at 16:54
  • @NathanOliver I have just tried using int size : 24 instead and am getting the same size (43 as Vlad has pointed out). How would I take input for a three byte value? – MaxA Nov 04 '21 at 16:56
  • With gcc it is indeed 42 bytes. [ideone](https://ideone.com/pT1Gdr) – simondvt Nov 04 '21 at 16:56
  • In addition to the bitfield issue mentioned by @NathanOliver, you're using a non-standard compiler feature, `#pragma pack`. You have to make sure that the compiler you're using supports it. An alternative solution is to use a structure containing a byte array; that way you're immune to issues like alignment requirements, padding, and endianness. – Lindydancer Nov 04 '21 at 16:59
  • @MaxA You should be able to use `uint32_t ID : 8; uint32_t size: 24;` to get it to optimize correctly. – NathanOliver Nov 04 '21 at 17:00
  • @Lindydancer -- in addition to making sure that the compiler **supports** `#pragma pack`, it's necessary to **understand** what the compiler does when it sees that pragma. Its effect is, after all, implementation specific. – Pete Becker Nov 04 '21 at 17:02
  • The only **portable** use of binary files is that you can write out the contents of a data object to a binary file and read it into a data object of the same type with code that was compiled by the same compiler. (And "the same compiler" means exactly the same options, too) – Pete Becker Nov 04 '21 at 17:05
  • @NathanOliver After removing the #pragma command and including JUST the values ID, Size and value, it should be 6 bytes long. When I call size on the struct it returns 8 (+2 bytes from what I expected). The same is mirrored with the actual struct. When I test the size of it I get 44 (+2 bytes from what I expected). Are these bytes added by the compiler? – MaxA Nov 04 '21 at 17:12
  • `6` is not a power of 2, and most if not all objects by default have a size that is divisible by the word size (typically 4 or 8 bytes on modern machines). For why this is, see: https://stackoverflow.com/questions/58435348/what-is-bit-padding-or-padding-bits-exactly/58436082#58436082 – NathanOliver Nov 04 '21 at 17:15
  • *How would I take input for a three byte value?* `std::byte size[3]; uint32_t get_size() const { return to_uint32(size[0]) | (to_uint32(size[1]) << 8) | (to_uint32(size[2]) << 16); }` assuming little endian, and provided you write a `to_uint32` routine. – Eljay Nov 04 '21 at 17:46
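The three-byte read discussed in the comments can be sketched as below. This is only an illustration: `read_u24_le` is a hypothetical helper name, and it assumes the file stores the value little-endian (least significant byte first). Each shift is parenthesized so that operator precedence cannot regroup the terms.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: decode a 3-byte little-endian unsigned value.
// Each byte is widened to 32 bits before shifting, then the parts are OR-ed.
inline std::uint32_t read_u24_le(const std::uint8_t* p)
{
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16);
}
```

Because the result is built from individual bytes, it does not depend on the host's endianness or on how the compiler lays out bitfields.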

1 Answer


Also, the first value lines up. After that, the data is incorrect.

uint32_t size: 24;

If you want to guarantee portably that this is three bytes with no padding before the next member, you're going to need to use a byte buffer and do the conversions yourself.

#pragma pack is an extension, and the packing of bitfield members is anyway implementation-defined.

FWIW both GCC and Clang do seem to do what you want in this case, but unless the layout is defined by a platform ABI, depending on it is still brittle.
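As a rough sketch of the byte-buffer approach, the 42-byte record from the question could be decoded field by field into an ordinary, unpacked struct. The names `Record`, `kRecordBytes`, and `parse_record` are hypothetical, and the code assumes the multi-byte integers are stored little-endian in the file; adjust the shifts if the format is big-endian.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical unpacked record: ordinary members, so the compiler may pad freely.
struct Record
{
    std::uint8_t  ID;
    std::uint32_t size;            // only 24 bits are stored in the file
    std::uint16_t value;
    char name[12];
    char description[4];
    char shoppingList[14];
    char otherValue[6];
};

constexpr std::size_t kRecordBytes = 42; // 1 + 3 + 2 + 12 + 4 + 14 + 6

// Decode one record from exactly kRecordBytes bytes.
// Assumption: multi-byte integers are little-endian in the file.
inline Record parse_record(const std::uint8_t* p)
{
    assert(p != nullptr);
    Record r{};
    r.ID    = p[0];
    r.size  =  static_cast<std::uint32_t>(p[1])
            | (static_cast<std::uint32_t>(p[2]) << 8)
            | (static_cast<std::uint32_t>(p[3]) << 16);
    r.value = static_cast<std::uint16_t>(p[4] | (p[5] << 8));
    std::memcpy(r.name,         p + 6,  sizeof r.name);
    std::memcpy(r.description,  p + 18, sizeof r.description);
    std::memcpy(r.shoppingList, p + 22, sizeof r.shoppingList);
    std::memcpy(r.otherValue,   p + 36, sizeof r.otherValue);
    return r;
}
```

With a `std::vector<std::uint8_t>` holding the file contents, a call such as `parse_record(buffer.data() + offset)` would decode the record at `offset`, provided at least `kRecordBytes` bytes remain. The in-memory `sizeof(Record)` no longer needs to be 42, since the file layout is encoded in the offsets rather than in the struct.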

Useless
  • Can it not share space with `ID` member? I know nothing about the rules for bitfields. – Quimby Nov 04 '21 at 16:57
  • Consecutive bitfield members can be packed together. The compiler is not _required_ to squish bitfield and non-bitfield members together, whatever pragmas you use. – Useless Nov 04 '21 at 16:59
  • @Quimby -- maybe. But pretty much everything about the effect of bitfield declarations is implementation defined. – Pete Becker Nov 04 '21 at 16:59
  • @PeteBecker Okay, thanks both of you. Apparently both [clang and gcc](https://godbolt.org/z/YPersKboo) can pack the struct to 42 bytes. – Quimby Nov 04 '21 at 17:01
  • You might get the compiler to join the two fields if you use the same base type, as in `uint32_t ID:8; uint32_t size:24`. However, if you plan to make the code portable you have to include some kind of selftest to trigger when this fails. – Lindydancer Nov 04 '21 at 17:01
  • IME (of having lots of `static_assert`s to confirm structs match various protocols) the only standard & portable way really is to have a byte buffer and generate the code to `memcpy` members out to properly-aligned variables. Everything else gets hairy one way or another. – Useless Nov 04 '21 at 17:03
  • @Useless -- not even `memcpy` will help you if the input file uses a different endianness than your machine. A better solution is to read the values byte by byte and use "shift" and "or" (and casts) to create machine-compatible values. – Lindydancer Nov 04 '21 at 17:06
  • I know that's portable, simple _and_ correct - but (again, IME) `memcpy` + byte-swap (`__builtin_bswapX` etc.) worked out faster. It doesn't always matter, of course, and benchmarks rise and fall. – Useless Nov 04 '21 at 17:23
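The self-test idea from these comments might look like the sketch below, which combines the same-base-type bitfields with `static_assert` checks. `PackedRecord` is a hypothetical name, and the layout still relies on the implementation-specific `#pragma pack`, so the assertions only turn a silent mismatch into a compile error; they do not make the struct portable.

```cpp
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct PackedRecord // hypothetical name for the struct in the question
{
    std::uint32_t ID   : 8;   // same base type as `size`, so the two
    std::uint32_t size : 24;  // bitfields can share one 32-bit unit
    std::uint16_t value;
    char name[12];
    char description[4];
    char shoppingList[14];
    char otherValue[6];
};
#pragma pack(pop)

// Fail the build if this compiler lays the struct out differently.
static_assert(sizeof(PackedRecord) == 42, "PackedRecord must be 42 bytes");
static_assert(offsetof(PackedRecord, value) == 4, "value must follow ID + size");
static_assert(offsetof(PackedRecord, name) == 6, "name must follow value");
```

These checks cover size and member offsets only. The order in which `ID` and `size` are allocated inside the shared 32-bit unit is also implementation-defined, so it would still need a runtime check against known input.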