5

Looking for a tool that takes a C structure as input and outputs a structure with minimal size.

For example, given an initial structure with only 3 members

struct Book {
   char  title[50];
   char  author[25];
   int   book_id;
};

there are 6 permutations

struct Book1 {
   char  title[50];
   char  author[25];
   int   book_id;
};

struct Book2 {
   char  title[50];
   int   book_id;
   char  author[25];   
};

struct Book3 {
   char  author[25];     
   char  title[50];
   int   book_id; 
};

struct Book4 {
   char  author[25];     
   int   book_id; 
   char  title[50];   
};

struct Book5 {
   int   book_id; 
   char  author[25];     
   char  title[50];   
};

struct Book6 {
   int   book_id; 
   char  title[50];     
   char  author[25];     
};

The output shows that 80 bytes is the minimal size.

Book1 = 80
Book2 = 84
Book3 = 80
Book4 = 84
Book5 = 80
Book6 = 80

Several projects I work on contain structures with 10+ members (3628800 permutations) and are continually appended with new members by coders not familiar with structure packing complexities.

Question

Is it possible to have a tool to refactor the structure into the best minimal size?

vengy
  • 1,548
  • 10
  • 18
  • Related question: https://stackoverflow.com/questions/867471/automated-field-re-ordering-in-c-structs-to-avoid-padding – Aziz Jun 03 '22 at 16:28
  • 3
    While there are are `3,628,800` permutations for the members, many of them are equivalent as far as memory layout goes, because many of your members have the same memory layout (size, alignment, stride). So the actual number of permutations is not `numberOfMembers!`, it's `numberOfMembersWithUniqueLayout!`. Further, a brute-force approach isn't necessary. You can use a lazy algorithm that orders members from largest to smallest, and reliably get optimal results – Alexander Jun 03 '22 at 16:32
  • It looks like a [bin packing](https://en.wikipedia.org/wiki/Bin_packing_problem) problem or the basic optimal allocation policy problem (known to be hard). Packing is nearly trivial with power of two values like native types. For char arrays, it is a bit more complex but I am not sure this cause a lot of trouble since array do not need to be aligned. You should expect performance issue though if you align the first byte at the end of a cache line so to earn some space. If you add constraints about performance for array, then the problem is certainly NP-Hard. – Jérôme Richard Jun 03 '22 at 16:41
  • @JérômeRichard, i think that this problem can be solved by with dynamic programming by computing a function mapping a subset of members (encoded as bits of integer) to a minimal offset where this subset fits – tstanisl Jun 03 '22 at 16:56
  • 2
    See also [The Lost Art of Structure Packing](http://www.catb.org/esr/structure-packing/) – Ouroborus Jun 03 '22 at 17:13
  • 1
    Actually, the problem becomes really interesting if bitfields are considered. – tstanisl Jun 03 '22 at 17:29
  • This question seems very similar to https://stackoverflow.com/questions/4306186/structure-padding-and-packing – Dmytro Jun 03 '22 at 17:51

2 Answers2

3

Assuming that the size of any member is multiplicity of its alignment requirements which are powers of 2 then the optimal layout can be found by placing the members with the strictest alignment first. There will be no internal padding between members. The total size of the struct would be a sum of its members rounded to alignment of the first member with the strictest alignment, which is a lower bound anyway.

tstanisl
  • 13,520
  • 2
  • 25
  • 40
0

As long as your structure contain native types and there are no composite structures, then there is a very good heuristic (certainly optimal) to solve this problem: sort the fields by decreasing order based on their alignment constraint. The alignment constraint of a native type should be its size while for arrays it is the size of the item type (eg. 1 for a char array). I think this heuristic is also certainly optimal for composite structure having a power of two size or alternatively if all your sub-structures have a size that is a multiple of 8 on 64-bit platforms (except the last one that does not matter). For example, assuming int are aligned on 4 bytes, then it will be put first, then the two arrays whatvever the number of items (both results in the same overall size).

For composite structures, I think the problem is a bit harder to solve. It is similar to what allocator algorithm uses to pack data in the heap (so to minimize the space overhead). Allocators have the same constraints: the allocated type must follow alignment constraints while minimizing the overall space though they often also have an additional constraint: be fast. A good example is the aligned_alloc function. Many algorithm use a bucket strategy to solve this problem efficiently though the solution may not be optimal.

Note that compilers like GCC have extensions to pack data structures but they are not compliant to the C standard.

Jérôme Richard
  • 41,678
  • 6
  • 29
  • 59