C++ how to handle alignment padding when casting raw data to class objects

Question

I read big sections of a file as "blobs" of data to char arrays. I know how these blobs are structured, and have created classes for the different structures. Then I want to cast the read char arrays to arrays of appropriate class objects.

This has worked well for certain cases, but I have gotten to a case where alignment / padding of the class members is an issue.

Here is a minimal example. Instead of getting data from a file, I define the data in data_i1, data_d1 and data_i2, then copy it into c_data. c_data represents the data read from the file and contains data_i1, data_d1 and data_i2 twice.

If alignment were not a problem, casting c_data to an array of Data should give me the initial data in Data[0] and Data[1].

#include <cstdio>   // printf
#include <cstring>  // memcpy

class Data {
public:
    int     i1[2];
    double  d1[3];
    int     i2[3];
};


int main()
{
    //Setting some data for the example:
    int     data_i1[2] = {  1,   100};          //2 * 4 =  8 bytes
    double  data_d1[3] = {0.1, 100.2, 200.3 };  //3 * 8 = 24 bytes
    int     data_i2[3] = {  2,   200, 305   };  //3 * 4 = 12 bytes
                                                //total = 44 bytes

    //As arrays the data is 44 bytes, but size of Data is 48 bytes:
    printf("sizeof(data_i1) = %zu\n",    sizeof(data_i1));
    printf("sizeof(data_d1) = %zu\n",    sizeof(data_d1));
    printf("sizeof(data_i2) = %zu\n",    sizeof(data_i1));
    printf("total size      = %zu\n\n",  sizeof(data_i1) + sizeof(data_d1) + sizeof(data_i2));
    printf("sizeof(Data)    = %zu\n",    sizeof(Data));


    //This can hold the above 44 bytes of data, twice:
    char c_data[88];

    //Copying the data from the arrays to a char array
    //In reality the data is read from a binary file to the char array
    memcpy(c_data +  0, data_i1,  8);
    memcpy(c_data +  8, data_d1, 24);
    memcpy(c_data + 32, data_i2, 12); //c_data contains data_i1, data_d1, data_i2
    memcpy(c_data + 44,  c_data, 44); //c_data contains data_i1, data_d1, data_i2 repeated twice

    //Casting the char array to a Data array:
    Data* data = (Data*)c_data;

    //The first Data object in the Data array gets the correct values:
    Data data1 = data[0];
    //The second Data object gets bad data:
    Data data2 = data[1];

    printf("data1 : [%4d, %4d] [%4.1f, %4.1f, %4.1f] [%4d, %4d, %4d]\n", data1.i1[0], data1.i1[1], data1.d1[0], data1.d1[1], data1.d1[2], data1.i2[0], data1.i2[1], data1.i2[2]);
    printf("data2 : [%4d, %4d] [%4.1f, %4.1f, %4.1f] [%4d, %4d, %4d]\n", data2.i1[0], data2.i1[1], data2.d1[0], data2.d1[1], data2.d1[2], data2.i2[0], data2.i2[1], data2.i2[2]);

    return 0;
}

The code output is:

sizeof(data_i1) = 8
sizeof(data_d1) = 24
sizeof(data_i2) = 8
total size      = 44

sizeof(Data)    = 48
data1 : [   1,  100] [ 0.1, 100.2, 200.3] [   2,  200,  305]
data2 : [ 100, -1717986918] [-92559653364574087271962722384372548731666605007261414794985472.0, -0.0,  0.0] [-390597128,  100, -858993460]

How should I handle this correctly? Can I somehow disable this padding/alignment (if that is the right term)? Is it possible to add a member function to the class that specifies how the cast is done?

@Jarod42 That duplicate post has some useful information, but it is not a duplicate. I'm not asking why the sizes are different, or what alignment is. I'm asking, how should I be handling this given that alignment is an issue in this case. — remi, Apr 11 '20 at 16:42
It seems you are looking for `__attribute__((packed))` or `pragma pack`. — Jarod42, Apr 11 '20 at 16:51
Just don't cast bytes to other types, not even POD structs. C++ makes no guarantees it will ever work. Define an actual `Data[2]` and name its members in the usual way like `Data[0].i1[0]`. — aschepler, Apr 11 '20 at 16:55
Your `c_data` array has 48 bytes in it, but you copy in 88 and try to read 96. This results in Undefined behavior. `data[1]` will read from this out-of-bounds data. — 1201ProgramAlarm, Apr 11 '20 at 17:39

walnut · Accepted Answer · 2020-04-11T19:26:21.797

Before C++20, you are not allowed to just cast a pointer to a different type and use it if you haven't actually created an object of the destination type.

Since C++20 this is allowed in your specific case: when a char array starts its lifetime, objects of implicit-lifetime type are created implicitly inside it, and your Data happens to be such a type.

But even in C++20, you have no guarantee that there won't be any padding between members of the struct and therefore it is not safe to just cast the pointer or memcpy the whole struct. Even if you verify that there is no padding issue, you need to additionally provide correct alignment to the storage array with alignas:

alignas(alignof(Data)) char c_data[sizeof(Data)*2];

and probably you will also need to call std::launder on the pointer to make it point to the implicitly-created Data object:

Data* data = std::launder(reinterpret_cast<Data*>(c_data));
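
To make the two steps above concrete, here is a minimal sketch of how they might fit together. It assumes C++17 or later for std::launder and relies on the C++20 implicit-object-creation rule described above. Sample, SampleBuffer, fill and view_as_samples are hypothetical names that are not in the question; Sample is deliberately laid out so that it has no padding on common platforms, which the static_assert checks. The question's Data would fail that check (48 bytes against 44 bytes of members).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <new>  // std::launder (C++17)

// Hypothetical record type, ordered so that no padding is needed on common platforms.
struct Sample {
    double d[3];
    int    i[2];
};

// Refuse to compile if the compiler inserted padding anywhere in Sample.
static_assert(sizeof(Sample) == sizeof(Sample::d) + sizeof(Sample::i),
              "Sample has padding; raw bytes cannot be viewed as Sample objects");

constexpr std::size_t kCount = 2;

// Storage that is aligned for Sample and sized with sizeof instead of literals.
struct SampleBuffer {
    alignas(Sample) char bytes[sizeof(Sample) * kCount];
};

// Copies raw bytes (for example, bytes read from a file) into the aligned buffer.
void fill(SampleBuffer& buf, const void* raw, std::size_t n) {
    assert(n <= sizeof(buf.bytes));
    std::memcpy(buf.bytes, raw, n);
}

// Returns a pointer to the Sample objects living in the buffer.
Sample* view_as_samples(SampleBuffer& buf) {
    assert(reinterpret_cast<std::uintptr_t>(buf.bytes) % alignof(Sample) == 0);
    return std::launder(reinterpret_cast<Sample*>(buf.bytes));
}
```

After fill(buf, raw, sizeof(buf.bytes)), view_as_samples(buf)[1] would refer to the second record. This only gives meaningful values if the bytes were produced with the same type sizes and byte order as the machine reading them.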

Instead of doing all of that, create an object of type Data (or array thereof) directly (this also resolves the alignment issue) and memcpy the individual members one-by-one to avoid padding issues:

Data data[2];

// Loop through array and `memcpy` each member individually
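
A minimal sketch of that loop could look like the following. The helper names read_member, unpack_records and kRecordSize are made up for illustration. It assumes the file stores the three members back to back with no padding, and that int and double in the file have the same size and byte order as on the machine running the code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

struct Data {
    int    i1[2];
    double d1[3];
    int    i2[3];
};

// Size of one record as stored in the file: the members back to back, no padding.
constexpr std::size_t kRecordSize =
    sizeof(Data::i1) + sizeof(Data::d1) + sizeof(Data::i2);

// Copies one member out of the raw buffer and returns the advanced read position.
template <typename Member>
const char* read_member(const char* src, Member& dst) {
    std::memcpy(&dst, src, sizeof(dst));
    return src + sizeof(dst);
}

// Unpacks `count` tightly packed records from `raw` into properly aligned Data objects.
void unpack_records(const char* raw, std::size_t raw_size,
                    Data* out, std::size_t count) {
    assert(raw_size >= count * kRecordSize);
    for (std::size_t i = 0; i < count; ++i) {
        const char* p = raw + i * kRecordSize;
        p = read_member(p, out[i].i1);
        p = read_member(p, out[i].d1);
        p = read_member(p, out[i].i2);
    }
}
```

With the question's buffer, the call would be unpack_records(c_data, sizeof(c_data), data, 2). All sizes and offsets come from sizeof, so the second record starts at byte 44 of the buffer regardless of how large sizeof(Data) is.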

Also, do not use explicit number constants for sizes and offsets. Always use sizeof on the correct types to make sure that you don't accidentally cause a mismatch, which you already have in your code, causing access to the storage array out-of-bounds.

As a non-portable alternative, compilers usually offer attributes to force class members to be packed without any padding, see this question. However, this may come with a significant performance loss: CPUs usually assume a certain alignment for each type, and if the data isn't aligned that way, operations either take longer or are not allowed at all, depending on the architecture.
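
As a rough illustration of packing, the sketch below uses #pragma pack, which GCC, Clang and MSVC accept. The names PackedData and read_d1 are hypothetical. The static_asserts only confirm that the padding is gone on the compiler at hand; they do not make the layout portable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

#pragma pack(push, 1)   // non-portable: request 1-byte packing for what follows
struct PackedData {
    int    i1[2];
    double d1[3];
    int    i2[3];
};
#pragma pack(pop)       // restore the previous packing setting

// The packed struct should be exactly as large as its members combined.
static_assert(sizeof(PackedData) ==
                  sizeof(PackedData::i1) + sizeof(PackedData::d1) + sizeof(PackedData::i2),
              "PackedData still contains padding");
static_assert(offsetof(PackedData, d1) == sizeof(PackedData::i1),
              "d1 should directly follow i1");

// Returns d1[k] by value. Copying with memcpy avoids forming a double* or
// double& to a member that may now sit at a misaligned address.
double read_d1(const PackedData& rec, std::size_t k) {
    assert(k < sizeof(rec.d1) / sizeof(double));
    double value;
    std::memcpy(&value,
                reinterpret_cast<const char*>(&rec) + offsetof(PackedData, d1) +
                    k * sizeof(double),
                sizeof(value));
    return value;
}
```

The memcpy in read_d1 is one way to stay clear of the misaligned-access problem described above: the packed members are only ever copied out, never pointed at.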

Also, even if you pack your Data struct, the points I made above about the casting still apply. However, packing might allow you to just declare

Data data[2];

from the start and directly read from the file into this data. (The cast reinterpret_cast<char*>(data) and writing through that pointer is allowed if Data is trivially-copyable, which it is here, and assuming that the data you read actually has the proper layout for Data.)
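
A sketch of that direct read, under the same assumptions, might look like this. PackedData and read_records are hypothetical names, the packing pragma is compiler-specific, and the file is assumed to use the same type sizes and byte order as the reading machine.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>      // std::ifstream, one stream type that can be passed in
#include <istream>
#include <sstream>      // std::istringstream, handy for in-memory tests
#include <type_traits>

#pragma pack(push, 1)   // non-portable packing request
struct PackedData {
    int    i1[2];
    double d1[3];
    int    i2[3];
};
#pragma pack(pop)

// Writing bytes into the objects is only allowed for trivially copyable types.
static_assert(std::is_trivially_copyable<PackedData>::value,
              "PackedData must be trivially copyable");
static_assert(sizeof(PackedData) ==
                  sizeof(PackedData::i1) + sizeof(PackedData::d1) + sizeof(PackedData::i2),
              "PackedData still contains padding");

// Reads `count` records from `in` directly into `out`.
// Returns true only if every requested byte was read.
bool read_records(std::istream& in, PackedData* out, std::size_t count) {
    assert(out != nullptr || count == 0);
    const std::size_t bytes = count * sizeof(PackedData);
    in.read(reinterpret_cast<char*>(out), static_cast<std::streamsize>(bytes));
    return static_cast<std::size_t>(in.gcount()) == bytes;
}
```

With a file, the stream would be something like std::ifstream opened with std::ios::binary, followed by PackedData data[2]; read_records(file, data, 2);. The buffer size comes from sizeof(PackedData), so no hand-computed byte counts are involved.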
