Packing unions/structure to avoid padding

Question

I have a structure that looks like this:

struct vdata {
  static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
  union union_data {
    uint8_t * A; // 8 bytes
    uint8_t B[12]; // 12 bytes
  } u;
  int16_t C; // 2 bytes
  int16_t D; // 2 bytes
};

I would like to make this 16 bytes, but GCC is telling me it is 24, as the union is padding to 16 bytes.

I would like to put vdata into a large std::vector. From my understanding, there should be no issue with alignment if this were 16 bytes, since the pointer would always be 8 byte aligned.

I understand that I can force this to be packed using __attribute__((__packed__)) in GCC. But I would like to know if there is a portable and standard compliant way to get this to be 16 bytes?

Edit: Ideas

Idea 1: split up the B array.

struct vdata {
  union union_data {
    uint8_t * A; // 8 bytes
    uint8_t B[8]; // 8 bytes
  } u;
  uint8_t B2[4]; // 4 bytes
  int16_t C; // 2 bytes
  int16_t D; // 2 bytes
};

Could B2 elements be reliably accessed from a pointer of B? Is that defined behavior?

Idea 2: store pointer as byte array and memcpy as necessary (@Eljay)

struct vdata {
  union union_data {
    std::byte A[sizeof(uint8_t*)]; // 8 bytes
    uint8_t B[12]; // 12 bytes
  } u;
  int16_t C; // 2 bytes
  int16_t D; // 2 bytes
};

Would there be a performance penalty for accessing the pointer, or would it be optimized out? (Assuming GCC x86).

Short answer, No. The standard allows no control for how objects are padded. — NathanOliver, Aug 11 '21 at 19:07
@VictorEijkhout It's a union. Its size will be that of the largest member, plus possible padsding. — NathanOliver, Aug 11 '21 at 19:11
Welcome to the big umbrella of C++. On many platforms leaving out alignment packing would make the structure unusable, and it it's unusable, what would be the point? — user4581301, Aug 11 '21 at 19:16
You could change A to `std::byte A[sizeof(uint8_t*)];` and then `memcpy` the pointer into A and out of A. — Eljay, Aug 11 '21 at 19:20
You could write a method or function that copies the members to a `uint8_t` buffer. This way ensures that the data in the buffer is packed. — Thomas Matthews, Aug 11 '21 at 19:32
@Eljay I verified that a memcpy using std::byte produces the same assembly code as using a pointer directly. So if you want to write up that solution, I'll accept. — thc, Aug 11 '21 at 20:29
Not that this might only apply to your current target system and compiler. On another system the compiler may be forced to generate code that actually does something. But if it does, you'll be glad that it did because the alternative gets weird. — user4581301, Aug 11 '21 at 20:43
An alternative would be to have an union of 2 `struct`s say ACD and BCD. As far as the standard is concerned, there are no portable way to have a specific layout. — Phil1970, Aug 11 '21 at 20:56
@Phil1970 That's an interesting idea. Are there any type punning issues? First bit of C will tell me what state `u` is in, but that will now be in the union. — thc, Aug 12 '21 at 00:32
@Phil1970 CDA and CDB makes more sense - so called structure with common beginning, C and D will not be punned. And then just one can ditch CDA and add a getter function which would launder a pointer value from array B — Swift - Friday Pie, Aug 12 '21 at 15:43
@Swift-FridayPie I have made an answer mentionning that. When I wrote previous comment, it was not known how OP knew the active member... — Phil1970, Aug 12 '21 at 15:47

score 2 · Accepted Answer · answered Aug 11 '21 at 22:38

You could change A to std::byte A[sizeof(uint8_t*)]; and then std::memcpy the pointer into A and out of A.

Worth commenting as to what is going on, and that these extra hoops are to avoid padding bytes.

Also adding a set_A setter and get_A getter may be very helpful.

struct vdata {
  union union_data {
    std::byte A[sizeof(uint8_t*)]; // 8 bytes
    uint8_t B[12]; // 12 bytes
  } u;
  int16_t C; // 2 bytes
  int16_t D; // 2 bytes

  void set_A(uint8_t* p) {
    std::memcpy(u.A, &p, sizeof p); 
  }
  uint8_t* get_A() {
    uint8_t* result;
    std::memcpy(&result, u.A, sizeof result);
    return result;
  }
};

Nikos Athanasiou · Answer 2 · 2021-08-12T08:37:08.643

1

Store C+D in the union's array, and provide method access to them:

struct vdata {
  static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
 
   union union_data {
    uint8_t * A;   // 8 bytes
    uint8_t B[16]; // 12 + 2*2 bytes
  } u;
  
  int16_t& C() { 
    return *reinterpret_cast<int16_t*>(static_cast<void*>(&u.B[12])); 
  }
  int16_t& D() { 
    return *reinterpret_cast<int16_t*>(static_cast<void*>(&u.B[14])); 
  }
};

Demo (with zero warnings for strict aliasing violations and run-time address sanitization enabled)

Keep in mind that there's no strict aliasing violation when the buffer is char* i.e. single byte type like uint8_t - I mean thankfully because otherwise it would be impossible to create memory pools. If it makes things clearer/safer you can even have an explicit char array buffer:

struct vdata {
   union union_data {
    uint8_t * A;   // 8 bytes
    uint8_t B[12]; // 12 bytes
    char buf[16];  // 16 bytes - could be std::byte buf[16]
  } u;
  
  int16_t& C() { return *(int16_t*)(&u.buf[12]); }
  int16_t& D() { return *(int16_t*)(&u.buf[14]); }
};

Regarding alignment The array is 8-aligned due to the address of the union, so positions 12&14 are guaranteed to be 2-aligned which is the requirement for int16_t (even though the string u.B appears in the code).

Alternatively you can force align the structure. The C++ alignas specifier would not be valid here because you want to lower the alignment of your structure, put a pragma directive is possible to give you again 16 bytes:

#pragma pack(4)
struct vdata {
  static_assert(sizeof(uint8_t *) == 8L, "size of pointer must be 8");
  union union_data {
    uint8_t * A; // 8 bytes
    uint8_t B[12]; // 12 bytes
  } u;
  int16_t C; // 2 bytes
  int16_t D; // 2 bytes
};

Demo

I'm fairly certain that this one will cause problems.

edited Aug 12 '21 at 08:37

answered Aug 11 '21 at 19:39

Nikos Athanasiou

29,616
15
87
153

2

This is UB, `*reinterpret_cast` results in a strict aliasing violation. – rustyx Aug 11 '21 at 19:51
@rustyx Is there a case where positions 12 & 14 won't be at least 2-aligned ? I see this as similar to having a memory pool of `char*` and building your `T` objects by imposing restrictions to the alignment of the char array. – Nikos Athanasiou Aug 11 '21 at 19:54
@NikosAthanasiou OP ask for `portable and standard compliant way` so I think that `std::launder` (https://en.cppreference.com/w/cpp/utility/launder) would be required in your first code sample. While addresses will probably always be a multiple of 2 in that case, I recommand using `static_assert` for any such assumption we make that do matter including final size. – Phil1970 Aug 11 '21 at 21:06
1

@NikosAthanasiou alignment and strict aliasing are separate issues, it is also UB to read a char array as a larger integer – M.M Aug 11 '21 at 22:59
If `pragma pack` is used, then alignment should be restored immediatly afterward to avoid the risk of affecting other classes. – Phil1970 Aug 12 '21 at 00:46
1

@M.M If the buffer is `char*` (i.e. single byte type like `uint8_t`) there's no strict aliasing violation - that's the exception in the rule . I've enabled strict aliasing warnings in the Demo and it gives no positives. If you try to double the type sizes (use `uint16_t` buffer and make C&D `uint32_t`) you will get the aliasing violation warning. The case is too simple to have a false negative warning. – Nikos Athanasiou Aug 12 '21 at 08:24
1

@rustyx I addressed the strict aliasing hypothesis as best as I can in the answer. Please let me know if it looks better now or there's a counter example – Nikos Athanasiou Aug 12 '21 at 08:29
@NikosAthanasiou There's an exception for reading int as char -- there is not an exception for reading char as int – M.M Aug 12 '21 at 21:38

Phil1970 · Answer 3 · 2021-08-12T18:32:45.097

As far as I understand, the following code would be the most safe one.

The data that specify the type is in the Initial common sequence. Thus you can access it either way (by using cda.C or cdb.C) so it is perfect for determining the type.

Then putting everything in a struct for both cases allows to ensure that each struct layout is independant (thus B can start before next 8 bytes alignment).

#include <cstdint>
#include <iostream>

struct CDA
{
    int16_t C;      // 2 bytes
    int16_t D;      // 2 bytes
    uint8_t* A;     // 8 bytes
};

struct CDB
{
    int16_t C;      // 2 bytes
    int16_t D;      // 2 bytes
    uint8_t B[12];  // 12 bytes
};

struct vdata {
    union union_data {
        CDA cda;
        CDB cdb;
    } u;

};

static_assert(sizeof(uint8_t*) == 8);
static_assert(sizeof(CDA) == 16);
static_assert(sizeof(CDB) == 16);
static_assert(offsetof(vdata::union_data, cda) == offsetof(vdata::union_data, cdb));
static_assert(offsetof(CDA, C) == offsetof(CDB, C));
static_assert(offsetof(CDA, C) == 0);
static_assert(sizeof(vdata) == 16);

int main()
{
    std::cout << "sizeof(CDA) : " << sizeof(CDA) << std::endl;
    std::cout << "sizeof(CDB) : " << sizeof(CDB) << std::endl;
    std::cout << "sizeof(vdata) : " << sizeof(vdata) << std::endl;
}

Usefull source of information:

How to decide?

If the size optimization is not that important, I would recommend to use std::variant.
If the size is important but the order is not, then the current solution might be the best choice.
If portability is not so important, then pragma pack solution might be appropriate (remember to reset alignment after the struct definition).
Otherwise, if you really need layout control, then either use:
- std::byte array and memcpy (access data with functions)
- placement new and std::launder.

In all cases, be sure to have appropriate assertion that verify assumptions you make. I have put many in my sample code but you can adjust depending on your need.

Also, unless you have millions of vdata items or you are on an embedded device, then using 24 bytes instead of 16 might not be a big deal.

You might also use conditionnal define to optimize only for your current compiler. This could be useful to ensure that you have working code (though maybe less optimal) for every target or it can allows to depend on behavior that is undefined from the standard but might be defined on your compiler.

Packing unions/structure to avoid padding

3 Answers3