Alignment and padding of data inside a blob

Question

I'm using a large blob (allocated memory) to store data contiguously in memory.

I want data inside the blob to be organized like this:

| data1 type | data1 | data2 type | data2 | dataN type | dataN |

dataN type is an int that I use in a switch to convert the dataN to the appropriate type.

The problem is that I want to keep the data properly aligned. To do so, I want to force all data inside the blob to be packed on 8-byte boundaries (I chose 8 bytes because it will probably keep all data properly aligned?). This way the data will be tightly packed (there won't be holes between data and data types because of alignment).

I tried this:

#pragma pack(8)
class A
{
public:
    short b;
    int x;
    char v;
};

But it doesn't work because using sizeof(A) I get 12 bytes instead of the expected 16 bytes.

P.S: Is there any data type larger than 8 bytes in either x86 or x64 architectures?

The C/C++ standards don't specify the actual sizes of the data types (except for sizeof(char) == 1 and that char <= short <= int <= long <= long long). In (L)LP64 you can't get a primitive data type bigger than 64 bits (long long)... — MFH, Jul 27 '12 at 21:42
There are compiler-specific extensions for vectorized data that may be larger than 8 bytes. Also, if you have no control on the data, how can you ensure — kriss, Jul 27 '12 at 21:46
*long double* on my compiler is 16 bytes long. And obviously struct data types can be larger than 8 bytes. — kriss, Jul 27 '12 at 21:53

score 1 · Answer 1 · answered Jul 27 '12 at 21:49

It looks like in this case #pragma pack(8) has no effect.

In the MS compiler documentation, the parameter of pack is described in the following way: "Specifies the value, in bytes, to be used for packing. The default value for n is 8. Valid values are 1, 2, 4, 8, and 16. The alignment of a member will be on a boundary that is either a multiple of n or a multiple of the size of the member, whichever is smaller."

Thus, the #pragma pack directive cannot increase the alignment of a member; it can only decrease it (using #pragma pack(1), for example). In your case, the alignment of the whole structure is chosen so that its largest member is naturally aligned (int, which is usually 4 bytes on both 32-bit and 64-bit CPUs). As a result, each member ends up occupying a 4-byte slot (short plus 2 bytes of padding, int, char plus 3 bytes of padding), and the total size is 4 * 3 = 12 bytes.
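A minimal sketch of that rule, assuming a compiler that accepts `#pragma pack(push, n)` / `#pragma pack(pop)` (MSVC, GCC and Clang do) and C++11 for `alignas`. The struct names and `check_pack_layouts` are made up for illustration; the members are the ones from the question.

```cpp
#include <cassert>

// Default packing: members sit at their natural alignment.
struct DefaultLayout {
    short b;
    int x;
    char v;
};

// pack(8) cannot raise any member above its natural alignment,
// so this layout is expected to be identical to DefaultLayout.
#pragma pack(push, 8)
struct Pack8Layout {
    short b;
    int x;
    char v;
};
#pragma pack(pop)

// pack(1) lowers every member's alignment to 1: no padding at all.
#pragma pack(push, 1)
struct Pack1Layout {
    short b;
    int x;
    char v;
};
#pragma pack(pop)

// alignas is the standard way to *raise* the alignment of a type;
// sizeof is then rounded up to a multiple of that alignment.
struct alignas(8) Align8Layout {
    short b;
    int x;
    char v;
};

// Checks the relations described above (they do not depend on exact sizes).
void check_pack_layouts() {
    assert(sizeof(Pack8Layout) == sizeof(DefaultLayout));
    assert(sizeof(Pack1Layout) == sizeof(short) + sizeof(int) + sizeof(char));
    assert(alignof(Align8Layout) == 8);
    assert(sizeof(Align8Layout) % 8 == 0);
}
```

On a typical x86/x64 compiler with a 2-byte short and a 4-byte int, the four sizes should come out as 12, 12, 7 and 16, so `alignas(8)` rather than `#pragma pack(8)` is what produces the 16 bytes the question expected.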

score 1 · Accepted Answer · edited May 23 '17 at 11:55

This answer assumes two things:

You want the binary blob to be packed tightly (no holes).
You don't want the data members to be accessed in an unaligned fashion (which is slow compared to accessing data members that are aligned the way the compiler wants by default).

If this is the case, then you should consider a design where you treat the large "blob" as a byte-oriented stream. In this stream, you marshall/demarshall tag/value pairs that populate objects having natural alignment.

With this scheme, you get the best of both worlds. You get a tightly packed blob, but once you extract objects from the blob, accessing object members is fast because of the natural alignment. It is also portable¹ and does not rely on compiler extensions. The disadvantage is the boilerplate code that you need to write for every type that can be put in the blob.

Rudimentary example:

#include <cassert>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <stdint.h>
#include <vector>

enum BlobKey
{
    kBlobKey_Widget,
    kBlobKey_Gadget
};

class Blob
{
public:
    Blob() : cursor_(0) {}

    // Extract a value from the blob. The key associated with this value should
    // already have been extracted.
    template <typename T>
    Blob& operator>>(T& value)
    {
        // The whole value, not just its first byte, must lie inside the blob.
        assert(cursor_ + sizeof(T) <= bytes_.size());
        char* dest = reinterpret_cast<char*>(&value);
        for (size_t i=0; i<sizeof(T); ++i)
            dest[i] = bytes_[cursor_++];
        return *this;
    }

    // Insert a value into the blob
    template <typename T>
    Blob& operator<<(const T& value)
    {
        const char* src = reinterpret_cast<const char*>(&value);
        for (size_t i=0; i<sizeof(T); ++i)
            bytes_.push_back(src[i]);
        return *this;
    }

    // Overloads of << and >> for std::string might be useful

    bool atEnd() const {return cursor_ >= bytes_.size();}

    void rewind() {cursor_ = 0;}

    void clear() {bytes_.clear(); rewind();}

    void print() const
    {
        using namespace std;
        for (size_t i=0; i<bytes_.size(); ++i)
            cout << setfill('0') << setw(2) << hex << int(bytes_[i]) << " ";
        std::cout << "\n" << dec << bytes_.size() << " bytes\n";
    }

private:
    std::vector<uint8_t> bytes_;
    size_t cursor_;
};

class Widget
{
public:
    // Parameter types match the member types they initialize.
    explicit Widget(int a=0, short b=0, long c=0) : a_(a), b_(b), c_(c) {}
    void print() const
    {
        std::cout << "Widget: a_=" << a_ << " b_=" << b_
                  << " c_=" << c_ << "\n";
    }
private:
    int a_;
    short b_;
    long c_;
    friend Blob& operator>>(Blob& blob, Widget& widget)
    {
        // Demarshall members from blob
        blob >> widget.a_;
        blob >> widget.b_;
        blob >> widget.c_;
        return blob;
    }
    friend Blob& operator<<(Blob& blob, const Widget& widget)
    {
        // Marshall members to blob
        blob << kBlobKey_Widget;
        blob << widget.a_;
        blob << widget.b_;
        blob << widget.c_;
        return blob;
    }
};

class Gadget
{
public:
    // Parameter types match the member types they initialize.
    explicit Gadget(long a=0, int b=0, short c=0) : a_(a), b_(b), c_(c) {}
    void print() const
    {
        std::cout << "Gadget: a_=" << a_ << " b_=" << b_
                  << " c_=" << c_ << "\n";
    }
private:
    long a_;
    int b_;
    short c_;
    friend Blob& operator>>(Blob& blob, Gadget& gadget)
    {
        // Demarshall members from blob
        blob >> gadget.a_;
        blob >> gadget.b_;
        blob >> gadget.c_;
        return blob;
    }
    friend Blob& operator<<(Blob& blob, const Gadget& gadget)
    {
        // Marshall members to blob
        blob << kBlobKey_Gadget;
        blob << gadget.a_;
        blob << gadget.b_;
        blob << gadget.c_;
        return blob;
    }
};

int main()
{
    Widget w1(1,2,3), w2(4,5,6);
    Gadget g1(7,8,9), g2(10,11,12);

    // Fill blob with widgets and gadgets
    Blob blob;
    blob << w1 << g1 << w2 << g2;
    blob.print();

    // Retrieve widgets and gadgets from blob
    BlobKey key;
    while (!blob.atEnd())
    {
        blob >> key;
        switch (key)
        {
            case kBlobKey_Widget:
                {
                    Widget w;
                    blob >> w;
                    w.print();
                }
                break;

            case kBlobKey_Gadget:
                {
                    Gadget g;
                    blob >> g;
                    g.print();
                }
                break;

            default:
                std::cout << "Unknown object type in blob\n";
                assert(false);
        }
    }
}

If you can use Boost, you might want to use Boost.Serialization with a binary memory stream, as in this answer.

¹ Portable means that the source code should compile anywhere. The resulting binary blob will not be portable if transferred to other machines with different endianness and integer sizes.

Your code looks useful but not to my problem. I need extremely fast access to the objects inside the blob so I can't be copying objects around. In your code, objects inside the blob are not properly aligned to be read directly from the blob. By the way is there any reason for not using memcpy() instead of the loops? — Tiago Costa, Jul 27 '12 at 23:38
@TiagoCosta : `memcpy` might be slower when `sizeof(T)` is small. With the explicit loop, the compiler can unroll and inline the code. This is the kind of thing that would need to be benchmarked (or the assembly analyzed) to know for sure. — Emile Cormier, Jul 28 '12 at 00:03
@TiagoCosta, you must realize that it's slow to access members that are not aligned the way the compiler wants. By demarshalling from packed to "loose", you pay a performance hit once, but will gain it back (and more) if you access the members frequently enough. Your question seems contradictory -- you want the blob to be packed tightly (with no holes), yet you want your struct members to be aligned naturally for fast access. You can't have it both ways, unless you do some kind of marshalling/demarshalling like I've shown above. — Emile Cormier, Jul 28 '12 at 00:06
I did the converse of this in a project a year ago. I wrote a template class for getting and setting locations in the blob. When I created the instance I queried the blob for an index, and subsequently I memcpy()'ed at INDEX*8. I will only note additionally that I was not space constrained. — David, Jul 28 '12 at 00:54
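The trade-off discussed in these comments (every record on an 8-byte boundary, copied in and out with memcpy, at the cost of padding) can be sketched as follows. This is only an illustration under assumptions, not the code from the answer or from David's project: `AlignedBlob`, `align_up`, `append` and `read` are hypothetical names, and it assumes C++11 and trivially copyable types whose alignment is at most 8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Round offset up to the next multiple of alignment (must be a power of two).
inline std::size_t align_up(std::size_t offset, std::size_t alignment) {
    return (offset + alignment - 1) & ~(alignment - 1);
}

// Hypothetical append-only blob in which every record starts at an
// offset that is a multiple of 8, trading space for alignment.
class AlignedBlob {
public:
    // Copies value into the blob and returns the offset it starts at.
    template <typename T>
    std::size_t append(const T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only trivially copyable types can be memcpy'ed");
        static_assert(alignof(T) <= 8, "slots are only 8-byte aligned");
        const std::size_t offset = align_up(bytes_.size(), 8);
        bytes_.resize(offset + sizeof(T));  // padding bytes are zero-filled
        std::memcpy(&bytes_[offset], &value, sizeof(T));
        return offset;
    }

    // Copies the value stored at offset back out of the blob.
    template <typename T>
    T read(std::size_t offset) const {
        assert(offset + sizeof(T) <= bytes_.size());
        T value;
        std::memcpy(&value, &bytes_[offset], sizeof(T));
        return value;
    }

    std::size_t size() const { return bytes_.size(); }

private:
    std::vector<unsigned char> bytes_;
};
```

Appending an int, a double and a short in that order would place them at offsets 0, 8 and 16. The offsets are relative to the start of the buffer, so reading objects in place instead of copying them would additionally require the buffer itself to start on an 8-byte boundary.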

score 0 · Answer 3 · answered Jul 27 '12 at 22:08

@Negai explained why you get the observed size.

You should also reconsider your assumptions about "tightly packed" data. With the above structure, there are holes in the structure. Assuming a 32-bit int and a 16-bit short, there is a 2-byte hole after the short and a 3-byte hole after the char. But it does not matter, as this space is inside the structure.

In other words either you get a tightly packed data structure, or you get an aligned data structure, but not both.

Typically, you don't have to do anything special to get the "aligned" behavior: that is what the compiler does by default. #pragma pack is useful if you want your data "packed" instead of aligned, that is, if you want to remove some of the holes the compiler introduces to keep data aligned.
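To see the holes for yourself, here is a small sketch using `offsetof`. `Reordered` and `print_layout` are names invented for this illustration; `A` is the class from the question.

```cpp
#include <cassert>
#include <cstddef>
#include <iostream>

// The class from the question, with default packing.
class A {
public:
    short b;
    int x;
    char v;
};

// Same members, ordered from largest to smallest.
struct Reordered {
    int x;
    short b;
    char v;
};

// Prints where each member starts. A jump larger than the size of the
// previous member is padding inserted by the compiler.
void print_layout() {
    // The compiler keeps the int on a boundary suitable for int.
    assert(offsetof(A, x) % alignof(int) == 0);

    std::cout << "A:         b@" << offsetof(A, b)
              << " x@" << offsetof(A, x)
              << " v@" << offsetof(A, v)
              << " size=" << sizeof(A) << "\n";
    std::cout << "Reordered: x@" << offsetof(Reordered, x)
              << " b@" << offsetof(Reordered, b)
              << " v@" << offsetof(Reordered, v)
              << " size=" << sizeof(Reordered) << "\n";
}
```

With a 16-bit short and a 32-bit int, `A` should report offsets 0, 4 and 8 with a size of 12, while `Reordered` typically fits in 8 bytes: the members stay aligned, and only the padding shrinks.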

That's right, I don't care about holes inside the structure because those affect the result of sizeof(). So `#pragma pack()` should only be used to change the internal alignment of a structure. — Tiago Costa, Jul 27 '12 at 22:17

score 0 · Answer 4 · answered Jul 28 '12 at 01:09

Did you try this?

#include <stdint.h>

class A {
public:
    union {
        uint64_t dummy;

        int data;
    };
};

Instances of A and its data member will always be aligned to 8 bytes now. Of course, this is pointless if you squeeze a 4-byte data type in front of it in the blob; that one has to take up 8 bytes too.
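If C++11 is available, `alignas` states the same intent without a dummy member. The sketch below puts the two side by side; `UnionAligned`, `ExplicitAligned`, `is_aligned` and `check_alignment` are illustrative names, not part of the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// The union trick: the struct inherits the alignment of uint64_t.
struct UnionAligned {
    union {
        std::uint64_t dummy;  // never used; only there to raise the alignment
        int data;
    };
};

// C++11 alternative: request the alignment explicitly.
struct alignas(8) ExplicitAligned {
    int data;
};

// True if address p is a multiple of alignment bytes.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// An automatic object of an alignas(8) type starts on an 8-byte boundary,
// and so does its first member.
void check_alignment() {
    ExplicitAligned e{};
    assert(is_aligned(&e, 8));
    assert(is_aligned(&e.data, 8));
}
```

One caveat: the union version is only as aligned as `std::uint64_t` itself, which is 8 bytes on mainstream 64-bit ABIs but is not guaranteed to be 8 everywhere. `alignas(8)` does not depend on that.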
