
If I have a struct in C++, is there no way to safely read/write it to a file that is cross-platform/compiler compatible?

As I understand it, every compiler 'pads' structs differently depending on the target platform.

Peter Mortensen
Baruch
  • The efficiency (performance) gained by performing binary I/O often does not justify the money spent in research, design, development and especially debugging and maintenance. Source code should be simple to understand, but no simpler. – Thomas Matthews Mar 22 '11 at 22:24

4 Answers


No. That is not possible, because C++ is not standardized at the binary level.

Don Box writes (quoting from his book Essential COM, chapter "COM As A Better C++"):

C++ and Portability


Once the decision is made to distribute a C++ class as a DLL, one is faced with one of the fundamental weaknesses of C++, that is, lack of standardization at the binary level. Although the ISO/ANSI C++ Draft Working Paper attempts to codify which programs will compile and what the semantic effects of running them will be, it makes no attempt to standardize the binary runtime model of C++. The first time this problem will become evident is when a client tries to link against the FastString DLL's import library from a C++ development environment other than the one used to build the FastString DLL.

Struct padding is done differently by different compilers. Even with the same compiler, the packing alignment of a struct can differ depending on which `#pragma pack` setting is in effect.

Not only that: if you write two structs whose members are exactly the same, differing only in the order in which they're declared, their sizes can be (and often are) different.

For example, consider this:

#include <iostream>

struct A
{
   char c;
   char d;
   int i;
};

struct B
{
   char c;
   int i;
   char d;
};

int main() {
   std::cout << sizeof(A) << std::endl;
   std::cout << sizeof(B) << std::endl;
}

Compile it with gcc-4.3.4, and you get this output:

8
12

That is, sizes are different even though both structs have the same members!

The bottom line is that the standard doesn't specify how padding is done, so compilers are free to make their own decisions, and you cannot assume that all compilers make the same decision.

John Doe
Nawaz
  • There is `__attribute__((packed))`, which I use for shared-memory structures as well as for ones used to map network data. It does affect performance (see http://digitalvampire.org/blog/index.php/2006/07/31/why-you-shouldnt-use-__attribute__packed/), but it's a useful feature for network-related structs. (It's not standard as far as I know, so the answer still holds.) – Pijusn Jun 08 '15 at 07:11
  • I don't understand why the size of struct A is 8 and not more: `{ char c; /* what about this? */ char d; /* size 1 + padding of 3 */ int i; /* size 4 */ };` – Dchris Mar 03 '17 at 08:01
  • @Dchris - the compiler is probably being careful to ensure that each field is aligned based on its own natural alignment. c and d are one byte each and thus aligned no matter where you put them, as far as single-byte CPU instructions are concerned. The int, however, needs to be aligned on a 4-byte boundary, which requires two bytes of padding after d. This gets you to 8. – hoodaticus May 25 '17 at 20:58
  • It seems like most compilers would align members in the same way. Are there really compilers out there that would put padding between `A::c` and `A::d`? If there aren't, am I correct in saying that the only problem is that the standard doesn't make any guarantees, even though every compiler seems to do the same thing (much like `reinterpret_cast`)? – Indiana Kernick Jan 10 '20 at 00:54

If you have the opportunity to design the struct yourself, it should be possible. The basic idea is to design it so that there is no need to insert pad bytes into it. The second trick is that you must handle differences in endianness.

I'll describe how to construct the struct using scalars, but you should be able to use nested structs as well, as long as you apply the same design to each included struct.

First, a basic fact in C and C++ is that the alignment of a type cannot exceed the size of the type. If it did, it would not be possible to allocate an array using malloc(N*sizeof(the_type)).

Lay out the struct, starting with the largest types.

 struct
 {
   uint64_t alpha;
   uint32_t beta;
   uint32_t gamma;
   uint8_t  delta;

Next, pad out the struct manually, so that every member stays aligned and the total size ends up a multiple of the largest type's size:

   uint8_t  pad8[3];    // Pad delta up to a 4-byte boundary
   uint32_t pad32;      // Round the total size up to a multiple of 8 (sizeof(uint64_t))
 };

The next step is to decide whether the struct should be stored in little-endian or big-endian format. The best way is to byte-swap all the elements in situ before writing or after reading the struct, if the storage format does not match the endianness of the host system.

Lindydancer
  • This sounds interesting, but can you go into more detail? Why do you order the members by size, descending, and why did you pad it so that you have an even number of uint32_t? – Phil Feb 21 '15 at 22:52
  • @Phil, a basic type like `uint32_t` can (potentially) have an alignment requirement that matches its size, in this case four bytes. A compiler may insert padding to achieve this. By doing this manually, there will be no need for the compiler to do it, as the alignment will always be correct. The drawback is that on systems with less strict alignment requirements, a manually padded struct will be larger than one padded by the compiler. You can do this in ascending or descending order, but you will need to insert more pads in the middle of the struct if you do it in ascending order... – Lindydancer Feb 22 '15 at 08:34
  • ... Padding at the end of the struct is only needed if you plan to use it in arrays. – Lindydancer Feb 22 '15 at 08:35
  • I'm not an expert - this seems like a 'heuristic' which might work, but definitely does not guarantee that the same padding would be used. Is that the case? Can you explain why your answer is the complete opposite of the other highly voted answers here? – jwg Aug 13 '15 at 06:52
  • @jwg, in the general case (like when you use a struct someone else has designed), padding can be inserted to ensure that no field ends up at a location the hardware can't read (as explained in the other answers). However, when you design the struct yourself, you can, with some care, ensure that no padding is needed. These two facts do not, in any way, oppose each other! I believe that this heuristic will hold for all possible architectures (given that a type doesn't have an alignment requirement greater than its size, which isn't legal in C anyway). – Lindydancer Aug 13 '15 at 09:48
  • @Lindydancer - padding is needed if you intend to composite them into a contiguous memory block of random stuff, not necessarily just a homogeneous array. Padding can make you self-aligning on arbitrary boundaries such as sizeof(void*) or the size of an SIMD register. – hoodaticus May 25 '17 at 21:02
  • @TimSeguine, the statement "the alignment of a type can not exceed the size of the type" is true. Otherwise, `malloc(2*sizeof(a_type))` (or `new[]`) would not return an array where both elements could be accessed. On a given system, `std::max_align_t` is a typedef of the highest aligned scalar (like `long double`). If it were a typedef to a scalar with lower alignment, then the C++ implementation would be broken. However, please prove me wrong; all you need to do is come up with a single example where `sizeof(type) < alignof(type)` would hold. – Lindydancer Feb 19 '18 at 15:45
  • @Lindydancer I was under the impression that the `alignas` specifier doesn't affect the `sizeof`. I was also under the impression that `sizeof(long double)` was 10 on Intel architectures with gcc. Both are apparently not true. – Tim Seguine Feb 21 '18 at 16:32
  • The alignment bumps the size of types. For example, a `struct` with a `short` (on a machine where it is 2 bytes and 2-byte aligned) and a `char` (1 byte) has size 4. – Lindydancer Feb 23 '18 at 15:14
  • Yes, I see now that I was just wrong, by experimentation and just pure logic. I think my mistake was thinking sizeof was an intrinsic constant of the type. Since alignment can be changed arbitrarily, it seemed like the two could not be related. – Tim Seguine Mar 18 '18 at 21:29

No, there's no safe way. In addition to padding, you have to deal with different byte ordering, and different sizes of builtin types.

You need to define a file format, and convert your struct to and from that format. Serialization libraries (e.g. boost::serialization, or Google's Protocol Buffers) can help with this.

Erik

Long story short, no. There is no platform-independent, Standard-conformant way to deal with padding.

Padding is a consequence of what the Standard calls "alignment", which it begins discussing in 3.9/5:

Object types have alignment requirements (3.9.1, 3.9.2). The alignment of a complete object type is an implementation-defined integer value representing a number of bytes; an object is allocated at an address that meets the alignment requirements of its object type.

But it goes on from there and winds off into many dark corners of the Standard. Alignment is "implementation-defined", meaning it can differ between compilers, or even between address models (i.e. 32-bit vs. 64-bit) under the same compiler.

Unless you have truly harsh performance requirements, you might consider storing your data on disk in a different format, such as character strings. Many high-performance protocols send everything as strings even when the natural format might be something else. For example, a low-latency exchange feed I recently worked on sends dates as strings formatted like "20110321", and times similarly, like "141055.200". Even though this exchange feed sends 5 million messages per second all day long, it still uses strings for everything, because that way it avoids endianness and other issues.

John Dibling