3

sizeof can be used to get the size of a struct or class. offsetof can be used to get the byte offset of a field within a struct or class.

Similarly, is there a way to get the size of the trailing padding of a struct or class? I'm looking for a way that doesn't depend on the layout of the struct, e.g. requiring the last field to have a certain name.

For background: I'm writing out a struct to disk but I don't want to write out the trailing padding, so the number of bytes I need to write is the sizeof minus the size of the trailing padding.

CLARIFICATION: The motivation for not writing the trailing padding is to save output bytes. I'm not trying to save the internal padding as I'm not sure about the performance impact of non-aligned access and I want the writing code to be low-maintenance such that it doesn't need to change if the struct definition changes.

Joshua Chia
  • 3
    sizeof(struct) - offsetof last member - sizeof last member – Mohit Jain Feb 19 '16 at 08:04
  • 9
    The padding doesn't have to be trailing. `struct x {char a; long long b;};` might have 7 bytes padding in the middle. – Bo Persson Feb 19 '16 at 08:07
  • Possible duplicate of [struct padding influence in C struct serialization ( saving to file )](http://stackoverflow.com/questions/5557083/struct-padding-influence-in-c-struct-serialization-saving-to-file) – 2501 Feb 19 '16 at 08:11
  • 1
    There's no way of finding that out without referring to the last member by name. – molbdnilo Feb 19 '16 at 08:11
  • 4
    Don't write raw memory (like with `fwrite(thing, sizeof(thing), 1, file)`) unless you know exactly what you're doing. Write proper serialisation code instead. – Daniel Jour Feb 19 '16 at 08:13
  • If you just want to write the binary contents to disc, then `sizeof` should do the trick; it will give the actual size of the object. If you need to space the objects out, then `alignof` would also be needed; possibly then `alignof() - sizeof()`? Frankly though, binary serialisation isn't always a good idea; it isn't portable, so it could be better to use a json/xml/yaml implementation. – Niall Feb 19 '16 at 09:14

4 Answers

3

How a compiler pads the fields of a structure is not strictly defined by the standard, so it is implementation dependent. If a data aggregate has to be interchanged, the only solution is to avoid any padding.
This is normally accomplished with #pragma pack(1), which instructs the compiler to pack all fields together on a 1-byte boundary. It will slow access on some processors, but it makes the structure compact and well defined on any system and, of course, free of padding.
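As a rough illustration of the effect, the sketch below declares the same members twice, once with the natural layout and once under `#pragma pack(push, 1)`. The type and function names are hypothetical, and the pragma is a compiler extension (accepted by GCC, Clang and MSVC), not standard C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Natural layout: the compiler may insert padding to satisfy alignment. */
struct natural_rec {
    int32_t id;
    int32_t count;
    char    tag;
};

/* Packed layout: #pragma pack is a compiler extension, not standard C. */
#pragma pack(push, 1)
struct packed_rec {
    int32_t id;
    int32_t count;
    char    tag;
};
#pragma pack(pop)

/* Bytes saved per record by packing (0 if the compiler added no padding). */
static size_t bytes_saved_by_packing(void)
{
    assert(sizeof(struct packed_rec) <= sizeof(struct natural_rec));
    return sizeof(struct natural_rec) - sizeof(struct packed_rec);
}
```

With packing honoured, `sizeof(struct packed_rec)` is the plain sum of the member sizes; the natural size depends on the platform's alignment rules.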

Frankie_C
  • I'm unsure of the impact of non-aligned access on my platform (x86_64), but I want to save what output bytes I can by simply not writing the trailing padding. – Joshua Chia Feb 19 '16 at 09:17
1

#pragma pack or an equivalent is the canonical way to do that. Apart from that, I can only think of a macro, provided the number of members is fixed or the maximum number is low, like

$ cat struct-macro.c && echo
#include <stddef.h>
#include <stdio.h>

/* Defines struct Sname with three members, plus Sname_trailingbytes(),
   which returns the number of bytes after the last member c. */
#define ASSEMBLE_STRUCT3(Sname, at, a, bt, b, ct, c) struct Sname {at a; bt b; ct c; }; \
       size_t Sname##_trailingbytes(void) { return sizeof(struct Sname) - offsetof(struct Sname, c) - sizeof(ct); }

ASSEMBLE_STRUCT3(S, int, i1, int, i2, char, c)

int main(void)
{
        printf("%zu\n", S_trailingbytes());
}
$ g++ -Wall -o struct-macro struct-macro.c && ./struct-macro
3
$

I wonder if something fancy can be done with a variadic template class in C++. But I can't quite see how the class/structure can be defined and the offset function/constant be provided without a macro again, which would defeat the purpose.

Peter - Reinstate Monica
1

This is a possibility:

#define BYTES_AFTER(st, last) (sizeof (st) - offsetof(st, last) - sizeof ((st*)0)->last)

As is this (C99) approach:

#define BYTES_AFTER(st, last) (sizeof (st) - offsetof(st, last) - sizeof (st){0}.last)
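A minimal sketch of how such a macro might be used when writing to a file. The `struct rec` type, the `REC_STORED_SIZE` macro and the `rec_write_trimmed` function are hypothetical names introduced for illustration, and the macro is written here with extra parentheses around the `sizeof` operand for readability.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define BYTES_AFTER(st, last) \
    (sizeof(st) - offsetof(st, last) - sizeof(((st *)0)->last))

/* Hypothetical record type; `tag` is its last member. */
struct rec {
    int32_t id;
    int32_t count;
    char    tag;
};

/* Number of bytes from the start of the struct through the end of `tag`. */
#define REC_STORED_SIZE (sizeof(struct rec) - BYTES_AFTER(struct rec, tag))

/* Writes one record without its trailing padding.
   Returns the number of bytes written (REC_STORED_SIZE on success). */
static size_t rec_write_trimmed(const struct rec *r, FILE *out)
{
    assert(r != NULL && out != NULL);
    return fwrite(r, 1, REC_STORED_SIZE, out);
}
```

The last member still has to be named once, in `REC_STORED_SIZE`, so this does not remove the dependency on the struct definition; it only confines it to one place.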

Another way is just declaring your structs packed via some non-standard #pragma or similar. This would also take care of padding in the middle.

Neither of those two is pretty, though. Sharing between different systems might not work because of different alignment requirements. And using non-standard extensions is, well, non-standard.

Just do the serialization yourself. Maybe something like this:

unsigned char buf[64];
mempcpy(mempcpy(mempcpy(buf,
   &st.member_1, sizeof st.member_1),
   &st.member_2, sizeof st.member_2),
   &st.member_3, sizeof st.member_3);

mempcpy is a GNU extension; if it's not available, just define it yourself:

static inline void * mempcpy (void *dest, const void *src, size_t len) {
   return (char*)memcpy(dest, src, len) + len;
}

IMO, it makes code like that easier to read.

a3f
  • The "quite common" option does not involve undefined behaviour, since `sizeof` does not evaluate its operands. In both cases, you've left out a trailing `)`, so code which uses those macros would not compile. If someone is worrying about padding in a struct, odds are they are doing something which is non-portable between systems anyway. – Peter Feb 19 '16 at 09:08
  • @Lundin then you wrap the 100 lines in a function and call that instead, just like you did in your answer. – a3f Feb 19 '16 at 09:44
  • @Peter I am not commenting on the dereference but on the `->`. This seems to be less clear cut. `->` is used on pointers to objects and the null pointer constant isn't one. I believe that's the reason e.g. GCC defines `offsetof` using a builtin, but I am not sure. Also I counted the parentheses and can't find the one I miss, could you point it out for me? – a3f Feb 19 '16 at 09:50
  • @a3f I can't find that either. Might be that he doesn't know that `sizeof unary_expression` is valid C. Should work just fine for compound literals etc. – Lundin Feb 19 '16 at 10:26
  • Looks like an error on my part with the brackets. Yes, I do know that `sizeof` does not require braces on operands. My comment about the "common approach" not involving undefined behaviour stands. `->` is a dereferencing operator and, in `sizeof`'s unary_expression, can be applied to a NULL, since the expression is not evaluated. – Peter Feb 20 '16 at 12:10
  • @a3f - that answer doesn't apply here, since it was responding to a question about what happens if the expression `&(((struct name *)NULL)->b)` is evaluated. `sizeof unary_expression` does not evaluate `unary_expression`. – Peter Feb 20 '16 at 12:27
  • @Peter Oh, sorry. I missed that. Hmm, I am still not fully sure that it's ok but I rephrased that part. Thanks for pointing it out. – a3f Feb 20 '16 at 13:27
  • I'm 99% sure that this code assumes that the last field in the struct doesn't have the trailing padding, which is generally false. – Dmitrii Demenev Mar 02 '22 at 21:50
  • @DmitriiDemenev why do you think so? `sizeof` of the struct will contain trailing padding and if we subtract everything coming before it, you get the trailing padding. – a3f Mar 03 '22 at 07:49
  • @a3f because the last field might have trailing padding too. And the solution doesn't account for that – Dmitrii Demenev Mar 05 '22 at 02:37
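The point raised in the last comments can be illustrated with a struct whose last member is itself a struct. The sketch below uses hypothetical types `inner` and `outer` and a parenthesized variant of `BYTES_AFTER`; it assumes only one level of nesting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BYTES_AFTER(st, last) \
    (sizeof(st) - offsetof(st, last) - sizeof(((st *)0)->last))

struct inner {
    int32_t value;
    char    flag;       /* typically followed by padding inside `inner` */
};

struct outer {
    int64_t      key;
    struct inner tail;  /* last member is itself a struct */
};

/* Padding after `tail` in `outer`; does not see padding inside `tail`. */
static size_t outer_trailing(void)
{
    return BYTES_AFTER(struct outer, tail);
}

/* Unused bytes at the end of `outer`, counting padding inside `tail` too. */
static size_t outer_trailing_deep(void)
{
    size_t deep = BYTES_AFTER(struct outer, tail)
                + BYTES_AFTER(struct inner, flag);
    assert(deep < sizeof(struct outer));
    return deep;
}
```

On a typical x86_64 ABI one would expect `outer_trailing()` to report no padding while `outer_trailing_deep()` reports the bytes after `flag`, though the exact numbers are implementation dependent.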
1

The padding could be anywhere inside the struct, except at the very beginning. There is no standard way to disable padding, although some flavour of #pragma pack is a common non-standard extension.
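To make the point about internal padding concrete, here is a small sketch using the struct from Bo Persson's comment on the question. The helper name `x_internal_padding` is hypothetical, and the amount of padding it reports depends on the implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Padding may sit between `a` and `b` so that `b` is suitably aligned. */
struct x {
    char      a;
    long long b;
};

/* Bytes of padding between the end of `a` and the start of `b`. */
static size_t x_internal_padding(void)
{
    /* The first member always sits at offset 0. */
    assert(offsetof(struct x, a) == 0);
    return offsetof(struct x, b) - sizeof(char);
}
```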

What you actually should do if you want a robust, portable solution, is to write a serialization/de-serialization routine for your struct.

Something like this:

#include <stdint.h>
#include <string.h>

typedef struct
{
  int x;
  int y;
  ...
} mytype_t;

/* Copies each member in turn into dest, leaving out all padding. */
void mytype_serialize (uint8_t* restrict dest, const mytype_t* restrict src)
{
  memcpy(dest, &src->x, sizeof(src->x)); dest += sizeof(src->x);
  memcpy(dest, &src->y, sizeof(src->y)); dest += sizeof(src->y);
  ...
}

And similarly for the other way around.
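A minimal sketch of both directions, assuming a hypothetical two-member type `pair_t` in place of `mytype_t`. The function names are illustrative, and the bytes are stored in the host's native byte order, so this is not portable across machines with different endianness or `int` sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical two-member type standing in for mytype_t. */
typedef struct {
    int x;
    int y;
} pair_t;

/* Size of the serialized form: the members only, no padding. */
#define PAIR_WIRE_SIZE (sizeof(int) + sizeof(int))

/* Copies the members, one by one, from `src` into the byte buffer `dest`. */
static void pair_serialize(uint8_t *restrict dest, const pair_t *restrict src)
{
    assert(dest != NULL && src != NULL);
    memcpy(dest, &src->x, sizeof src->x); dest += sizeof src->x;
    memcpy(dest, &src->y, sizeof src->y);
}

/* Inverse operation: reads the members back in the same order. */
static void pair_deserialize(pair_t *restrict dest, const uint8_t *restrict src)
{
    assert(dest != NULL && src != NULL);
    memcpy(&dest->x, src, sizeof dest->x); src += sizeof dest->x;
    memcpy(&dest->y, src, sizeof dest->y);
}
```

The caller is expected to provide a buffer of at least `PAIR_WIRE_SIZE` bytes.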

Please note that padding is there for a reason. If you get rid of it, you sacrifice execution speed in favour of memory size.

EDIT

The weird way to do it, just by skipping trailing padding:

size_t mytype_serialize (uint8_t* restrict dest, const mytype_t* restrict src)
{
  size_t size = offsetof(mytype_t, y); // assuming y is last object
  memcpy(dest, src, size); 
  memcpy(dest+size, &src->y, sizeof(src->y));
  size += sizeof(src->y);
  return size;
}

You need to know the size and do something meaningful with it, because otherwise you can't know the size of the stored data when you need to read it back.
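One possible way to do something meaningful with the size is to store it in front of the data and check it when reading back. The sketch below assumes a hypothetical `rec_t` whose last member is `y` and a one-byte length prefix; the names and the format are illustrative only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical record type; `y` is assumed to be the last member. */
typedef struct {
    int32_t x;
    char    y;
} rec_t;

/* Bytes from the start of rec_t through the end of `y`. */
#define REC_STORED_SIZE (offsetof(rec_t, y) + sizeof(char))

/* Writes a one-byte length followed by the record without trailing padding.
   Returns the total number of bytes placed in `dest`. */
static size_t rec_store(uint8_t *dest, const rec_t *src)
{
    assert(REC_STORED_SIZE <= UINT8_MAX);
    dest[0] = (uint8_t)REC_STORED_SIZE;
    memcpy(dest + 1, src, REC_STORED_SIZE);
    return 1 + REC_STORED_SIZE;
}

/* Reads a record back; returns 0 on success, or -1 if the stored length
   does not match what this build expects. */
static int rec_load(rec_t *dest, const uint8_t *src)
{
    if (src[0] != REC_STORED_SIZE)
        return -1;
    memset(dest, 0, sizeof *dest);  /* give the padding a defined value */
    memcpy(dest, src + 1, REC_STORED_SIZE);
    return 0;
}
```

A mismatch in the length byte signals that the data was written by a build with a different layout, which is the case the reader would otherwise have no way to detect.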

Lundin
  • I'm talking about trailing padding, the part immediately following the last field. Also, I'm looking for a simple, low-maintenance approach that is independent of the definition of the struct. I also want to avoid `#pragma pack` because I'm unsure whether there will be a performance impact on my platform (x86_64) from non-alignment. – Joshua Chia Feb 19 '16 at 09:12
  • @Syncopated You can use the very same approach as above. Just memcpy everything from the start until offsetof the last member, then memcpy the last member. It sounds odd to save some padding while skipping other padding though. How do you portably know the size of the stored data? – Lundin Feb 19 '16 at 09:16
  • Thanks, but I was looking for something that doesn't depend on the definition of the struct. – Joshua Chia Feb 19 '16 at 10:13
  • @Syncopated No matter what you do, you need to state the last element. If you believe it would be more generic, you could rewrite my function as an icky macro and pass the last object as a parameter. – Lundin Feb 19 '16 at 10:30