1

Let's consider the following task: my C++ module, part of an embedded system, receives 8 bytes of data, e.g. uint8_t data[8]. The value of the first byte determines the layout of the rest (there are 20-30 different layouts). To get at the data efficiently, I would create a struct for each layout, put them all into a union, and read the data directly from the address of my input through a pointer, like this:

struct Interpretation_1 {
    uint8_t multiplexer;
    uint8_t timestamp;
    uint32_t position;
    uint16_t speed;
};
// and a lot of other structs like this (with bitfields, etc..., layout is not defined by me :( )

union DataInterpreter {
    Interpretation_1 movement;
    //Interpretation_2 temperatures;
    //etc...
};

...
uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
std::cout << "position: " << +interpreter->movement.position << "\n";

The problem is that the compiler can insert padding bytes into the interpretation structs, and this kills my idea. I know I can use:

  • with gcc: struct MyStruct{} __attribute__((__packed__));
  • with MSVC: #pragma pack(push, 1) struct MyStruct{}; #pragma pack(pop)
  • with clang: ? (I could check it)

But is there any portable way to achieve this? I know C++11 has e.g. alignas for alignment control, but can I use it for this? I have to use C++11, but I would also be interested in whether a later version of C++ offers a better solution.
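To see where the padding goes on a particular compiler, the member offsets can be printed with offsetof. The sketch below repeats the question's Interpretation_1 with std::-qualified types; print_layout is a hypothetical helper. The two static_asserts state only what the language guarantees (each member is suitably aligned, and the struct is at least as large as its 8 bytes of members); the actual offsets and size are implementation-defined. On a typical platform where alignof(std::uint32_t) is 4, position starts at offset 4 rather than 2 and the struct is 12 bytes, the figure mentioned in the comments.

```cpp
#include <cstddef>   // offsetof, std::size_t
#include <cstdint>
#include <ostream>

// Same members as the question's Interpretation_1.
struct Interpretation_1 {
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
    std::uint32_t position;
    std::uint16_t speed;
};

// Guaranteed: a member's offset is a multiple of its alignment...
static_assert(offsetof(Interpretation_1, position) % alignof(std::uint32_t) == 0,
              "position is aligned for std::uint32_t");
// ...and the object is at least as large as the members it contains.
static_assert(sizeof(Interpretation_1) >= 8, "members need at least 8 bytes");

// Prints the compiler's actual layout. An offset that jumps by more than the
// previous member's size means padding was inserted before that member.
inline void print_layout(std::ostream& os) {
    os << "multiplexer @ " << offsetof(Interpretation_1, multiplexer) << '\n'
       << "timestamp   @ " << offsetof(Interpretation_1, timestamp) << '\n'
       << "position    @ " << offsetof(Interpretation_1, position) << '\n'
       << "speed       @ " << offsetof(Interpretation_1, speed) << '\n'
       << "sizeof      = " << sizeof(Interpretation_1) << '\n';
}
```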

curiousguy
user12345
  • What I personally do is place a `static_assert(sizeof(...) == ...)` directly after the struct. – ezegoing Jan 19 '20 at 02:07
  • @ezegoing, thanks for the reply. This is a good idea as a final check, but it doesn't really help. I mentioned this example because here the struct size is already 12 bytes instead of 8. – user12345 Jan 19 '20 at 02:16
  • Sort the members by size (biggest first)? That, at least, avoids _unnecessary_ padding. – Darklighter Jan 19 '20 at 02:21
  • Reading from unaligned memory is undefined behaviour in C++. (Although it will probably work on most architectures.) – Bernard Jan 19 '20 at 02:23
  • I think the methods you mentioned are enough: either `__attribute__((__packed__))` or `#pragma pack`. If you want a single solution, `#pragma pack` is enough; both are simple. – superK Jan 19 '20 at 02:45
  • *But is there any portable way to achieve this?* -- No. Look at the many existing codebases that need to use `#ifdef` or similar constructs for "portability". If it were possible, you would see examples of it, but none probably exist. – PaulMcKenzie Jan 19 '20 at 02:54
  • @Bernard no, it won't work on most RISC architectures, although some systems may handle the unaligned exception in software and return the correctly loaded value – phuclv Jan 19 '20 at 03:16

2 Answers


But is there any portable way to achieve this?

No, there is no standard way in C++ to make a type that would otherwise have padding not have it. Every object is aligned at least as strictly as its type requires, and if a member's required alignment doesn't line up with the end of the previous subobject, the compiler inserts padding; that is unavoidable.

Furthermore, there is another problem: you're accessing the data through a reinterpreted pointer that doesn't point to an object of a compatible type, so the behaviour of the program is undefined.

We can conclude that classes are not generally useful for representing arbitrary binary data. Packed structures are non-standard, and they also aren't portable across systems with different integer representations (e.g. byte endianness).
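A common portable alternative (a swapped-in technique, not taken from either answer) is to skip the struct overlay entirely and assemble each field from individual bytes with shifts. Because the shifts operate on values rather than on memory, the result does not depend on padding, alignment, or the host's byte order. The sketch assumes the wire format is little-endian, which is consistent with the question's sample bytes if 20, 0, 0, 0 is meant to be position 20. Movement, read_u16_le, read_u32_le and decode_movement are hypothetical names; the functions are C++11 constexpr so they can also be checked at compile time.

```cpp
#include <cstdint>

// Hypothetical decoded form of layout 1: an ordinary struct whose padding no longer matters.
struct Movement {
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
    std::uint32_t position;
    std::uint16_t speed;
};

// Little-endian 16-bit read from any byte address (no alignment requirement).
constexpr std::uint16_t read_u16_le(const std::uint8_t* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Little-endian 32-bit read; each byte is widened to 32 bits before shifting.
constexpr std::uint32_t read_u32_le(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Field offsets follow the question's wire layout: 1 + 1 + 4 + 2 = 8 bytes, no gaps.
constexpr Movement decode_movement(const std::uint8_t* data) {
    return Movement{data[0], data[1], read_u32_le(data + 2), read_u16_le(data + 6)};
}
```

With the question's exampleData, decode_movement(exampleData).position is 20 and .speed is 5 on any host, because the byte order is fixed by the shifts rather than by the CPU. The trade-off is one small function per field, which the other layouts would need as well.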


There is a way to check whether a type contains padding: compare the sum of the sizes of its subobjects with the size of the complete object, and do this recursively for each member. If the sizes don't match, there is padding. This is quite tricky, however, because C++ has minimal reflection capabilities, so you need to resort to either hard-coding or metaprogramming.

Given such a check, you can make compilation fail on systems where the assumption doesn't hold.
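A minimal sketch of the hard-coded variant: the member sizes are summed by hand and compared with sizeof the whole type, so the list has to be kept in sync manually (the limitation mentioned above). All members here are scalars, so no recursion is needed. SortedMembers and the constant names are hypothetical; SortedMembers reorders the question's members largest-first, as suggested in the comments, only to show a layout that usually has no padding, and it does not match the question's wire format. The static_assert encodes a platform assumption rather than a language guarantee: it holds on common ABIs where alignof(std::uint32_t) is 4, and elsewhere the build stops, which is the point.

```cpp
#include <cstddef>
#include <cstdint>

// Question's layout: 1 + 1 + 4 + 2 = 8 bytes of members.
struct Interpretation_1 {
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
    std::uint32_t position;
    std::uint16_t speed;
};

// Same members sorted largest first; NOT the question's wire format, shown for contrast.
struct SortedMembers {
    std::uint32_t position;
    std::uint16_t speed;
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
};

// Hard-coded sum of the member sizes; must be updated by hand if the structs change.
constexpr std::size_t members_size =
    sizeof(std::uint8_t) + sizeof(std::uint8_t) + sizeof(std::uint32_t) + sizeof(std::uint16_t);

constexpr bool interpretation_1_has_padding = sizeof(Interpretation_1) != members_size;
constexpr bool sorted_members_has_padding = sizeof(SortedMembers) != members_size;

// The guard: compilation fails on any platform where this layout assumption is wrong.
static_assert(!sorted_members_has_padding, "SortedMembers unexpectedly contains padding");
```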

Another handy tool is std::has_unique_object_representations (since C++17), which is always false for types that have padding. Note, however, that it is also typically false for types that contain floating-point members, for example. Only types for which it is true can be meaningfully compared for equality with std::memcmp.
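A short C++17 sketch of the trait in use. TwoWords and bytewise_equal are hypothetical names; bytewise_equal refuses to compile for any type whose equal values are not guaranteed to have identical bytes, which is exactly the memcmp rule stated above. The question's padded Interpretation_1 is rejected, while a struct of two 32-bit words (no padding on common platforms) is accepted.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

// Same members as the question's struct; its padding bytes have unspecified values.
struct Interpretation_1 {
    std::uint8_t multiplexer;
    std::uint8_t timestamp;
    std::uint32_t position;
    std::uint16_t speed;
};

// Two 32-bit members: no padding on common platforms.
struct TwoWords {
    std::uint32_t a;
    std::uint32_t b;
};

// Hypothetical helper: byte-wise equality, only allowed when the trait guarantees
// that equal values have identical object representations.
template <typename T>
bool bytewise_equal(const T& x, const T& y) {
    static_assert(std::has_unique_object_representations_v<T>,
                  "memcmp would compare padding or non-unique representations");
    return std::memcmp(&x, &y, sizeof(T)) == 0;
}
```

Calling bytewise_equal on two Interpretation_1 objects triggers the static_assert, whereas two TwoWords objects compile and compare as expected.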

eerorika

Reading from unaligned memory is undefined behaviour in C++. In other words, the compiler is allowed to assume that every uint32_t is located at an alignof(uint32_t)-byte boundary and every uint16_t at an alignof(uint16_t)-byte boundary. This means that even if you somehow managed to pack your structs portably, accessing interpreter->movement.position would still be undefined behaviour.

(In practice, unaligned memory access still works on many architectures, such as x86, albeit with a performance penalty.)

You could, however, write a proxy-style wrapper, similar in spirit to the object returned by std::vector<bool>::operator[]:

#include <cstdint>
#include <cstring>
#include <iostream>
#include <type_traits>

// Stores a T as raw bytes with alignment 1 and converts on every access.
template <typename T>
struct unaligned_wrapper {
    static_assert(std::is_trivial<T>::value, "T must be a trivial type");
    std::aligned_storage_t<sizeof(T), 1> buf;  // C++14; alignment 1, so no padding is needed
    // Read: copy the bytes into a properly aligned T.
    operator T() const noexcept {
        T ret;
        std::memcpy(&ret, &buf, sizeof(T));
        return ret;
    }
    // Write: copy the bytes of t into the buffer.
    unaligned_wrapper& operator=(T t) noexcept {
        std::memcpy(&buf, &t, sizeof(T));
        return *this;
    }
};

struct Interpretation_1 {
    unaligned_wrapper<std::uint8_t> multiplexer;
    unaligned_wrapper<std::uint8_t> timestamp;
    unaligned_wrapper<std::uint32_t> position;
    unaligned_wrapper<std::uint16_t> speed;
};
static_assert(sizeof(Interpretation_1) == 8, "Interpretation_1 must not contain padding");
// and a lot of other structs like this (with bitfields, etc..., layout is not defined by me :( )

union DataInterpreter {
    Interpretation_1 movement;
    //Interpretation_2 temperatures;
    //etc...
};

int main(){
    std::uint8_t exampleData[8] {1u, 10u, 20u,0u,0u,0u, 5u,0u};
    DataInterpreter* interpreter = reinterpret_cast<DataInterpreter*>(&exampleData);
    // Converts through operator std::uint32_t(); prints 20 on a little-endian host.
    std::cout << "position: " << interpreter->movement.position << "\n";
}

This ensures that every read or write of the unaligned integer is turned into a bytewise memcpy, which has no alignment requirement. There might be a performance penalty on architectures that can access unaligned memory quickly, but it works on any conforming compiler.
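Since the question is restricted to C++11, here is a sketch of the same wrapper that compiles as C++11: it stores the bytes in a plain unsigned char array (alignment 1), because std::aligned_storage_t is C++14 and std::aligned_storage is deprecated in C++23, and it gives static_assert a message. It also copies the input into a real Interpretation_1 object with std::memcpy instead of using reinterpret_cast, which avoids the aliasing concern raised in the first answer. parse_movement is a hypothetical name, and since the wrapper copies bytes verbatim, the integers come out in the host's byte order, so the wire format is assumed to match it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>

template <typename T>
struct unaligned_wrapper {
    static_assert(std::is_trivially_copyable<T>::value,
                  "unaligned_wrapper needs a trivially copyable type");
    unsigned char buf[sizeof(T)];  // alignment 1, so no padding is needed before it

    // Read: copy the bytes out into a properly aligned local.
    operator T() const noexcept {
        T ret;
        std::memcpy(&ret, buf, sizeof(T));
        return ret;
    }
    // Write: copy the value's bytes in.
    unaligned_wrapper& operator=(T t) noexcept {
        std::memcpy(buf, &t, sizeof(T));
        return *this;
    }
};

struct Interpretation_1 {
    unaligned_wrapper<std::uint8_t> multiplexer;
    unaligned_wrapper<std::uint8_t> timestamp;
    unaligned_wrapper<std::uint32_t> position;
    unaligned_wrapper<std::uint16_t> speed;
};
static_assert(sizeof(Interpretation_1) == 8, "unexpected padding in Interpretation_1");

// Hypothetical entry point: copy raw input bytes into a genuine Interpretation_1 object.
inline Interpretation_1 parse_movement(const std::uint8_t* data, std::size_t len) {
    assert(len >= sizeof(Interpretation_1));
    Interpretation_1 result;
    std::memcpy(&result, data, sizeof result);
    return result;
}
```

Usage is then `Interpretation_1 m = parse_movement(exampleData, sizeof exampleData);` followed by reads such as `std::uint32_t pos = m.position;`, which yields 20 for the question's sample bytes on a little-endian host.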

Bernard
  • "*the compiler is allowed to assume that every uint32_t is located at a 4-byte boundary and every uint16_t is located at a 2-byte boundary.*" Incorrect. The compiler is allowed to assume that every `uint32_t` is located on an `alignof(uint32_t)` boundary. That *may* be 4 bytes, but it may also be 1. It's implementation dependent, and there is *no requirement* in the C++ standard that the alignment have any relationship to the size, save that the alignment has to be less than or equal to the size. – Nicol Bolas Jan 19 '20 at 03:00
  • Do you have anything to cite about the UB for unaligned access? Afaik it's not UB, but can cause errors on certain platforms. – NathanOliver Jan 19 '20 at 03:00
  • @NathanOliver: If you create an object in a piece of storage, the standard requires that the address of that storage satisfies the alignment requirements of the object. UB occurs if it does not. It's not a matter of "addressing"; it's a matter of creating the object in the first place. – Nicol Bolas Jan 19 '20 at 03:03
  • @TedLyngmo It should be `std::aligned_storage_t` instead of `std::aligned_storage`, I made a mistake there. – Bernard Jan 19 '20 at 08:35
  • @NicolBolas Yeah, I overlooked that. – Bernard Jan 19 '20 at 08:37
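Nicol Bolas's point can be made concrete: alignof(std::uint32_t) is implementation-defined, but because sizeof(T) is always a multiple of alignof(T) (array elements are laid out contiguously), the alignment can never exceed the size. The sketch below records that guarantee at compile time and reports the actual values for the target at hand; print_alignments is a hypothetical helper.

```cpp
#include <cstdint>
#include <ostream>

// Always true: sizeof(T) is a multiple of alignof(T), so the alignment is at most the size.
static_assert(alignof(std::uint16_t) <= sizeof(std::uint16_t), "alignment exceeds size");
static_assert(alignof(std::uint32_t) <= sizeof(std::uint32_t), "alignment exceeds size");
static_assert(sizeof(std::uint32_t) % alignof(std::uint32_t) == 0,
              "size must be a multiple of alignment");

// The actual alignments are implementation-defined; print them for the current target.
inline void print_alignments(std::ostream& os) {
    os << "alignof(uint16_t) = " << alignof(std::uint16_t) << '\n'
       << "alignof(uint32_t) = " << alignof(std::uint32_t) << '\n';
}
```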