Optimally packing a recursively templated struct without loss of alignment

Question

I have a struct of 4 fields of types that come from template parameters:

template <typename T1, typename T2, typename T3, typename T4>
struct __attribute__((aligned(8))) four_tuple {
  typedef struct {
    T1 t1;
    T2 t2;
    T3 t3;
    T4 t4;
  } payload;
  payload p;
};

Each type T1, T2, T3, and T4, is guaranteed to be a primitive type or a four_tuple<...>::payload type. The guarantees are recursive - you can think of the struct as encoding a quadtree whose leaf nodes are primitive types.

My goal is for the struct to have minimum possible sizeof, subject to the condition that all of the leaf nodes are properly aligned. The tools allowed for the optimization are class template specializations using:

reordering of fields t1, t2, t3, t4
addition of filler fields
gcc attribute packed on payload
maybe others?

I feel like there is a clever solution to this problem using enable_if and SFINAE. Can anyone find it?

To illustrate the problem, if we use the above implementation as-is using Foo = four_tuple<char,double,char,double>, we'll have a size of 32 for the payload and overall. If we simply declare the payload packed, the double's will not be well-aligned. A template specialization that reorders the fields in decreasing order (here, double, double, char, char) will give a payload and overall size of 24. But the extra 6 bytes it uses are wasteful, as can be seen by considering using Bar = four_tuple<Foo::payload,int,int,int>. With optimal packing Bar could fit in 32 bytes, but with this scheme it would require 40. Bluntly applying field-reordering with packed will result in misaligned int's in Bar - some filler is needed.

I know that in general restructuring the memory layout of a struct's fields can have performance implications due to cache considerations, and that in general those implications will be at least as significant as any potential gains from better packing. I'd like to explore the tradeoffs, though, and I can't really do that properly in my context without solving this problem.

I'm not seeing it. If you reorder the way you described, but don't add `packed`, what's the problem? Using it in another type would not give any misaligned fields, since the compiler would fix that with padding already. — , Jan 08 '16 at 07:38
BTW, why do you force alignment on your struct? If your types are four `char`s, you want a size of 4, don't you? That can't require an alignment of 8. — , Jan 08 '16 at 07:40
@hvd `four_tuple` can achieve a size of 32 with no misalignment if properly packed. If you don't add `packed`, it will have a size of 40. — dshin, Jan 08 '16 at 08:00
@hvd Solving my problem guarantees only that the leaves will be aligned relative to the `four_tuple`. If a particular instance of the `four_tuple` is misaligned, all bets are off. That's why I align 8. — dshin, Jan 08 '16 at 08:04
https://stackoverflow.com/questions/18975130/compile-time-re-arrangement-of-data-members has a detailed discussion of compile-time re-ordering to optimize packing. The answer also refers to https://rmf.io/cxx11/optimal-tuple-i/ which discusses tuple data layout. — Jens, Jan 08 '16 at 08:20
@dshin No, `four_tuple` can't achieve a size of 32. `Foo::payload` can't have a size smaller than 24 (using your platform's sizes) if you want to guarantee alignment, and you need three more `int`s. There's no way to re-use the padding in a member. — , Jan 08 '16 at 08:21
@dshin Consider by the way that if you make `Foo` packed, then an array `Foo[2]` will have at least one misaligned element, and you say you want to avoid misaligned elements, right? — , Jan 08 '16 at 08:24
@hvd I'm referring to the reordered `Foo` (double, double, char, char). Its (packed) payload can fit in 18 without misalignment. I say "without misalignment" assuming it is used only within the context of a template parameter of `four_tuple` - the `payload` struct is not meant to be publicly visible (I should make this clearer in my post). Also, `Foo` is not declared packed precisely so things like `Foo[2]` will work. — dshin, Jan 08 '16 at 08:43
@Jens Thanks for the links. I will take a look. I think the fact that my payload inner class is not well-aligned "publicly" will make it so I can't use the link's solution out-of-the-box, but hopefully it will not take great effort to adapt it. — dshin, Jan 08 '16 at 08:47

score 1 · Accepted Answer · answered Jan 08 '16 at 23:35

The big problem in your nested tuple case is that you want to have a field of type four_tuple<char,double,char,double>::payload, aligned as if it is a four_tuple<char,double,char,double>, but without requiring the container type to inherit its alignment. This is complicated. Doing this is possible, but it makes your code highly unportable to anything other than GCC. I guess that's okay since you suggest a GCC extension in your question already. The basic idea is that bit-fields can be used to insert padding to ensure an alignment:

struct __attribute__((packed)) S {
  char c; // at offset 0
  int i; // at offset 1, not aligned
  int : 0;
  int j; // at offset 8, aligned
  int : 0;
  int k; // at offset 12, no extra padding between j and k
};

int is of course a very specific type with a very specific alignment, and you need a dynamically determined alignment. Luckily, GCC allows bit-fields of type char, which generally only enforce byte alignment, to be combined with alignas, ensuring arbitrary alignment.

With that done, you can check all 24 possible field orderings and select the payload that gives the lowest total size. I made the payload a global type and gave it an additional template parameter to indicate the field order. This allows tuple4<T1, T2, T3, T4> to check tuple4_payload<T1, T2, T3, T4, 1234>, tuple4_payload<T1, T2, T3, T4, 1243>, etc. in order and pick whichever is best.

template <typename...> struct smallest;
template <typename...T> using smallest_t = typename smallest<T...>::type;

template <typename T> struct smallest<T> { using type = T; };
template <typename T, typename...Ts> struct smallest<T, Ts...> { using type = std::conditional_t<sizeof(T) <= sizeof(smallest_t<Ts...>), T, smallest_t<Ts...>>; };

template <typename T1, typename T2, typename T3, typename T4> struct tuple4;
template <typename T1, typename T2, typename T3, typename T4, int fieldOrder> struct tuple4_payload;
template <typename T1, typename T2, typename T3, typename T4> struct tuple4_simple { T1 t1; T2 t2; T3 t3; T4 t4; };

template <typename T> struct extract_payload { using type = T; };
template <typename...T> struct extract_payload<tuple4<T...>> { using type = typename tuple4<T...>::payload; };
template <typename T> using extract_payload_t = typename extract_payload<T>::type;

#define PERMS \
  PERM(1,2,3,4) PERM(1,2,4,3) PERM(1,3,2,4) PERM(1,3,4,2) PERM(1,4,2,3) PERM(1,4,3,2) \
  PERM(2,1,3,4) PERM(2,1,4,3) PERM(2,3,1,4) PERM(2,3,4,1) PERM(2,4,1,3) PERM(2,4,3,1) \
  PERM(3,1,2,4) PERM(3,1,4,2) PERM(3,2,1,4) PERM(3,2,4,1) PERM(3,4,1,2) PERM(3,4,2,1) \
  PERM(4,1,2,3) PERM(4,1,3,2) PERM(4,2,1,3) PERM(4,2,3,1) PERM(4,3,1,2) PERM(4,3,2,1)

#define PERM(a,b,c,d) \
  template <typename T1, typename T2, typename T3, typename T4> \
  struct __attribute__((packed)) tuple4_payload<T1, T2, T3, T4, a##b##c##d> { \
    char : 0 alignas(T##a); extract_payload_t<T##a> t##a; \
    char : 0 alignas(T##b); extract_payload_t<T##b> t##b; \
    char : 0 alignas(T##c); extract_payload_t<T##c> t##c; \
    char : 0 alignas(T##d); extract_payload_t<T##d> t##d; \
  };
PERMS
#undef PERM

#define PERM(a,b,c,d) , tuple4_payload<T1, T2, T3, T4, a##b##c##d>
template <typename, typename...T> using tuple4_smallest_payload_t = smallest_t<T...>;
template <typename T1, typename T2, typename T3, typename T4>
struct alignas(tuple4_simple<T1, T2, T3, T4>) tuple4 : tuple4_smallest_payload_t<void PERMS> {
  using payload = tuple4_smallest_payload_t<void PERMS>;
};
#undef PERM

In your case, you would use this as tuple4<int, tuple4<char, double, char, double>, int, int>. Note that even though the payload type is not mentioned explicitly here, it will still be used for the t2 member.

Thanks for the work. This doesn't compile for me but I think I get the idea. If I understand it correctly, it still won't be optimal in the sense that `tuple4, tuple4<2,4,8,8>, 8, 8>` will use filler space, while a naive packed implementation without reordering will not use filler and satisfy the requirement of no misalignment on the primitive types. (Here, `2=int16_t`, `4=int32_t`, `8=int64_t`). But it should work better than anything else I have. — dshin, Jan 11 '16 at 16:02
@dshin It compiles with GCC 4.9 and higher, 4.8 if you change `conditional_t<...>` to `conditional<...>::type`. Yes, you're right, it won't be optimal in your example. Since `packed` can only drop trailing padding if you want to maintain alignment to the start of the object, `tuple4<2,4,8,8>` gets re-ordered to `tuple4<8,8,4,2>`, at which point your preferred layout stops being possible. I don't see a way around that, sorry. — , Jan 11 '16 at 19:12
I can imagine in theory if the `tuple4::payload` type has 8 static bools corresponding to whether it can be offset by {0, 1, 2, 3, 4, 5, 6, 7} bytes, then a meta-algorithm can incorporate that in an exhaustive search. But some cleverness would be needed to trim the search space. — dshin, Jan 11 '16 at 19:23
@dshin Basically, you'd end up with an alignment requirement of 8*N+2, which is tricky and won't play well with any compiler built-in alignment functionality. But what else may be problematic is that the ideal layout of a tuple can't be determined without considering its containing tuple (if any): given `tuple4,tuple4<2,8,8,8>,8,8>`: you'd ideally want that to be ordered as `8,8,8,2,2,8,8,8,8,8`, but that means the first `tuple4<2,8,8,8>` and the second `tuple4<2,8,8,8>` would not have the same layout. — , Jan 11 '16 at 19:35
You're right, payload layout must be fixed regardless of context. So this seems like the best solution. — dshin, Jan 11 '16 at 23:38

Optimally packing a recursively templated struct without loss of alignment

1 Answers1