27

Consider the following simple struct:

struct A
{
    float data[16];
};

My question is:

Assuming a platform where float is a 32-bit IEEE754 floating point number (if that matters at all), does the C++ standard guarantee the expected memory layout for struct A? If not, what does it guarantee and/or what are the ways to enforce the guarantees?

By the expected memory layout I mean that the struct takes up 16*4=64 bytes in memory, each consecutive 4 bytes occupied by a single float from the data array. In other words, expected memory layout means the following test passes:

static_assert(sizeof(A) == 16 * sizeof(float));
static_assert(offsetof(A, data[0]) == 0 * sizeof(float));
static_assert(offsetof(A, data[1]) == 1 * sizeof(float));
...
static_assert(offsetof(A, data[15]) == 15 * sizeof(float));

(offsetof is legal here since A is standard layout, see below)

In case you are wondering, the test does pass on Wandbox with gcc 9 HEAD. I have never encountered a platform and compiler combination on which this test fails, and I would love to learn about one if it exists.
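
For reference, the same property can be probed at run time without listing every offset by hand. This is only a minimal sketch: the helper name has_expected_layout is made up for illustration, and the check assumes that std::uintptr_t (an optional type) maps addresses to integers linearly, which the standard does not promise but mainstream platforms do.

```cpp
#include <cassert>  // assert, for callers as in the note below
#include <cstddef>
#include <cstdint>

struct A
{
    float data[16];
};

// Hypothetical helper: true if data[i] sits exactly i * sizeof(float) bytes
// from the start of the struct for every i, and A has no trailing padding.
// Assumption: converting pointers to std::uintptr_t preserves byte distances.
bool has_expected_layout()
{
    A a{};
    const auto base = reinterpret_cast<std::uintptr_t>(&a);
    for (std::size_t i = 0; i < 16; ++i)
    {
        const auto addr = reinterpret_cast<std::uintptr_t>(&a.data[i]);
        if (addr - base != i * sizeof(float))
            return false;
    }
    return sizeof(A) == 16 * sizeof(float);
}
```

A caller could run assert(has_expected_layout()); once at startup. This only reports what the current implementation does; it says nothing about what the standard requires.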

Why would one even care:

  • SSE-like optimizations require a certain memory layout (and alignment, which I ignore in this question, since it can be dealt with using the standard alignas specifier).
  • Serialization of such a struct would simply boil down to a nice and portable write_bytes(&x, sizeof(A)).
  • Some APIs (e.g. OpenGL, specifically, say, glUniformMatrix4fv) expect this exact memory layout. Of course, one could just pass the pointer to data array to pass a single object of this type, but for a sequence of these (say, for uploading matrix-type vertex attributes) a specific memory layout is still needed.
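
To make the last point concrete, here is a hedged sketch of staging a sequence of such structs for an API that expects consecutive floats. The function name flatten is hypothetical and no real graphics call is made; the static_assert turns the "no trailing padding" assumption into a compile-time requirement instead of a hope.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

struct A
{
    float data[16];
};

// Refuse to compile on an implementation that adds trailing padding to A.
static_assert(sizeof(A) == 16 * sizeof(float), "A must be exactly 16 packed floats");

// Hypothetical helper: copies `count` objects of type A into one flat buffer
// of 16 * count floats, relying on the layout checked above.
std::vector<float> flatten(const A* matrices, std::size_t count)
{
    assert(matrices != nullptr || count == 0);
    std::vector<float> out(count * 16);
    if (count != 0)
        std::memcpy(out.data(), matrices, count * sizeof(A));
    return out;
}
```

If the static_assert ever fires, the fallback would be to copy each data array separately, which needs only the array contiguity guarantee.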

What is actually guaranteed:

These are the things that, to my knowledge, can be expected from struct A:

  • It is standard layout
  • As a consequence of being standard-layout, a pointer to A can be reinterpret_cast to a pointer to its first data member (which is, presumably, data[0] ?), i.e. there is no padding before the first member.

The two remaining guarantees that are not (to my knowledge) provided by the standard are:

  • There is no padding between elements of an array of primitive type (I am fairly sure this one is in fact guaranteed, but I failed to find a confirming reference),
  • There is no padding after the data array inside struct A.
lisyarus
  • 3
    The first of your two remaining guarantees is guaranteed by C++ 2017 (draft n4659) 11.3.4, “Arrays” [dcl.array]: “An object of array type contains a contiguously allocated non-empty set of `N` subobjects of type `T`.” 1998 edition has identical text except with hyphenated “sub-objects” in 8.3.4. – Eric Postpischil Apr 12 '19 at 11:50
  • @EricPostpischil Thank you for clarification! What exactly does "contiguously allocated" mean in this context? – lisyarus Apr 12 '19 at 11:53
  • @lisyarus: It is “plain English,” or at least English as used by practitioners in the field—it is not formally defined in the standard. I am quite sure it means the bytes of the elements in the array are laid out in memory one after the other with no padding between elements. – Eric Postpischil Apr 12 '19 at 11:55
  • 5
    In C, the second of the remaining guarantees is not guaranteed, and there are some reasons a “difficult” C implementation might pad a structure containing a single array. For example, we can imagine an implementation would pad `struct { char x[2]; }` to four bytes if its target hardware had a strong bias toward four-byte word addressing of memory, and the implementation had decided to make all structures at least four-byte-aligned to satisfy the C standard’s requirement of one representation for all structure pointers. I expect C++ is similar but cannot speak confidently to it… – Eric Postpischil Apr 12 '19 at 11:59
  • 2
    … and note that is something of a “theoretical” possibility. Most likely, `struct { float data[16]; }` would not be given any trailing padding by any normal C or C++ implementation—there is no reason for it in any normal target platform. But, in the absence of an explicit specification in the C++ standard, the only way to guarantee it is for the project to require that any C++ implementation used to compile it satisfy this property. It could be tested with an assertion. – Eric Postpischil Apr 12 '19 at 12:00
  • The 1st one is obvious, but it doesn't help much if the 2nd one isn't guaranteed, and it's not easy to find any info either way. The closest thing I've found is in [class.mem](http://eel.is/c++draft/class.mem#26). But it talks about padding between data members and at the beginning, not at the end. Though, the `sizeof` asserts should pass for the array. The `sizeof` should include any padding at the end. – luk32 Apr 12 '19 at 12:01
  • @EricPostpischil Thank you. As I've said, the platforms & compilers I usually work with all have the expected memory layout for this example struct, so my question is more of a theoretical nature, too. – lisyarus Apr 12 '19 at 12:04
  • The trivial memory layout of a C array is implied by the way pointer arithmetic is done. – curiousguy Apr 14 '19 at 12:03

2 Answers

13

One thing that is not guaranteed about the layout is endianness i.e. the order of bytes within a multi-byte object. write_bytes(&x, sizeof(A)) is not portable serialisation across systems with different endianness.
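
A byte-order-independent alternative could look roughly like the following. It is a sketch under assumptions: float is 32 bits wide and shares its byte order with std::uint32_t, the wire format is arbitrarily chosen as little-endian, and serialize_le / deserialize_le are illustrative names, not an existing API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct A
{
    float data[16];
};

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Writes each float as four bytes, least significant byte first,
// regardless of the byte order of the host.
std::vector<unsigned char> serialize_le(const A& a)
{
    std::vector<unsigned char> out;
    out.reserve(16 * 4);
    for (float f : a.data)
    {
        std::uint32_t bits;
        std::memcpy(&bits, &f, sizeof bits);  // well-defined way to read the bit pattern
        for (int shift = 0; shift < 32; shift += 8)
            out.push_back(static_cast<unsigned char>((bits >> shift) & 0xFFu));
    }
    return out;
}

// Inverse of serialize_le: rebuilds each float from four little-endian bytes.
A deserialize_le(const std::vector<unsigned char>& bytes)
{
    assert(bytes.size() == 16 * 4);
    A a{};
    for (std::size_t i = 0; i < 16; ++i)
    {
        std::uint32_t bits = 0;
        for (int b = 0; b < 4; ++b)
            bits |= static_cast<std::uint32_t>(bytes[4 * i + b]) << (8 * b);
        std::memcpy(&a.data[i], &bits, sizeof bits);
    }
    return a;
}
```

Because the bytes are produced with shifts on the integer value, the output does not depend on how the host stores multi-byte objects, and it does not depend on padding inside A either.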

A can be reinterpret_cast to a pointer to its first data member (which is, presumably, data[0] ?)

Correction: The first data member is data, and that is what you can reinterpret_cast to. Crucially, an array is not pointer-interconvertible with its first element, so you cannot reinterpret_cast between them. The address, however, is guaranteed to be the same, so reinterpreting as data[0] should be fine after std::launder, as far as I understand.
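
The options could be sketched as follows (C++17 is needed for std::launder). The function names are made up for illustration, and the third variant reflects one reading of the std::launder rules rather than a settled interpretation; when in doubt, plain member access sidesteps the question entirely.

```cpp
#include <cassert>
#include <new>  // std::launder (C++17)

struct A
{
    float data[16];
};

// Route 1: plain member access, no casts involved.
float* first_via_member(A& a)
{
    return a.data;
}

// Route 2: A* -> float(*)[16] is covered by pointer-interconvertibility with
// the first member; the array then decays to a pointer to its first element.
float* first_via_array(A& a)
{
    auto* arr = reinterpret_cast<float (*)[16]>(&a);
    return *arr;
}

// Route 3: cast straight to float* and launder the result.
// Assumption: this use of std::launder is valid, as argued in the text above.
float* first_via_launder(A& a)
{
    return std::launder(reinterpret_cast<float*>(&a));
}

// All three routes are expected to yield the address of data[0].
void check_routes(A& a)
{
    assert(first_via_member(a) == first_via_array(a));
    assert(first_via_member(a) == first_via_launder(a));
}
```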

There is no padding in between elements of an array of primitive type

Arrays are guaranteed to be contiguous. The sizeof of an object is specified in terms of the padding required to place elements into an array: sizeof(T[10]) is exactly sizeof(T) * 10. If there is padding between the non-padding bits of adjacent elements, then that padding is at the end of the element itself.
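
Expressed as compile-time checks, this could look like the sketch below. The first three assertions should hold on any conforming implementation; the constant trailing_padding is a hypothetical name for the one quantity the standard leaves open.

```cpp
#include <cstddef>

struct A
{
    float data[16];
};

// An array of N objects of type T occupies exactly N * sizeof(T) bytes.
static_assert(sizeof(float[16]) == 16 * sizeof(float), "arrays have no gaps between elements");
static_assert(sizeof(A[10]) == 10 * sizeof(A), "the same holds for arrays of A");

// The member array must fit inside the struct.
static_assert(sizeof(A) >= sizeof(float[16]), "A contains the whole array");

// Bytes the implementation placed after the array inside A.
// Zero on mainstream ABIs, but not guaranteed by the standard.
constexpr std::size_t trailing_padding = sizeof(A) - sizeof(float[16]);
```

A project that depends on the packed layout can add static_assert(trailing_padding == 0, "...") to make that requirement explicit.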

Primitive types are not guaranteed to be free of padding in general. For example, the x86 extended precision long double is 80 bits wide, which the System V ABIs pad to 128 bits on x86-64 (96 bits on 32-bit x86).

char, signed char and unsigned char are guaranteed to not have padding bits. The C standard (to which C++ delegates the specification in this case) guarantees that the fixed-width intN_t and uintN_t aliases do not have padding bits. On systems where that is not possible, these fixed-width types are not provided.

Björn Lindqvist
eerorika
  • Just to be absolutely clear: is your last paragraph a direct counterexample to the 2nd unanswered question? I am asking from a compound type perspective, so for example struct `S {char a,b,c;};`, if padded to `4*sizeof(char)`, could have padding at the end. And for that matter we cannot tell the relative address of any member other than `a`; I think they can be reordered and padded as the compiler sees fit. Yup? – luk32 Apr 12 '19 at 12:20
  • 1
    @luk32 There cannot possibly be a *need* for padding between `char` elements, as they have alignment of 1. Any sensible ABI would place the padding (if there is any) of `S` at the end. But indeed, I don't know of an explicit guarantee about that in the C++ standard. – eerorika Apr 12 '19 at 12:26
  • Could you, please, elaborate on this usage of `std::launder`? – lisyarus Apr 12 '19 at 15:31
  • @eerorika I apologize, but I struggle to understand the reason why `std::launder` is needed here based on the cppreference article. – lisyarus Apr 12 '19 at 16:18
  • @lisyarus A pointer to `A` can be reinterpret casted to a pointer to `float[16]` because the type of the first member (`data`) of the standard layout class `A` is `float[16]` http://eel.is/c++draft/basic.compound#4. If a pointer to `float[16]` were pointer-interconvertible with a pointer to `float` (`data[0]`), then a pointer to `A` would be transitively convertible to `float` ... – eerorika Apr 12 '19 at 16:40
  • ... But an array *is not* pointer-interconvertible with the first element, so the premise doesn't hold. `std::launder` should make the conversion work as far as I understand. There is an example in the cppreference page doing the conversion in the other direction (from first element to array type) using laundering with comment "OK". – eerorika Apr 12 '19 at 16:42
  • @eerorika Thank you for clarification. Guess I'll have to dig a bit more into `std::launder`. – lisyarus Apr 13 '19 at 11:41
  • 1
    @lisyarus In C *and* C++, a pointer doesn't point to a memory location, it points to a designated object: pointers are high level types, not low level addresses as in assembly. **Pointing to an object is not the same as pointing to a different object at the exact same address**. I have asked many Q related to pointers (nearly all were extremely badly received), calling that the "semantic value" of a pointer as opposed to the numeric value. Obv. if you cross comp. boundaries and call function compiled by a diff comp. only the numeric value (the state) of a pointer object matters (descr by ABI). – curiousguy Apr 14 '19 at 12:12
  • (...) C++ pretends pointers are trivial types. There is no polite way to explain it away. **That's a lie.** Pointers can't be trivial types, as the semantic value of a trivial type is a function of its bit pattern, period. If two objects of a trivial type, neither of which is uninitialized, have the same bit pattern, they have the same semantic value, period; it means any operation valid on one has the same validity on the other. If one trivial type ptr can be dereferenced then any other pointer object of the same type and bit pattern can be, and you get the same stuff. – curiousguy Apr 14 '19 at 12:21
  • (...) So if an out of bound ptr to array object has the same value as the ptr to another obj there, you should be able to dereference it. *That will not work in practice, compilers don't allow that.* **`int a[1], b[1]; a[1] = 2;` is not a legal way to access `b[0]` even when its address is the same as the one past the end `a+1` ptr.** Proving that ptr have a numeric value and a semantic value, and that the way they are obtained (their "origin") determines their semantic value. This is not a clear concept in most ppl's mind and I took awful flack by simply raising the Q here. – curiousguy Apr 14 '19 at 12:23
  • Many Q (other than mine, notably [Are pointer variables just integers with some operators or are they “symbolic”?](https://stackoverflow.com/q/32045888/963864) and [Dereferencing an out of bound pointer that contains the address of an object (array of array)](https://stackoverflow.com/q/32043795/963864) ) deal with the semantic value of a pointer: [Is a pointer with the right address and type still always a valid pointer since C++17?](https://stackoverflow.com/q/48062346/963864) and [Pointer interconvertibility vs having the same address](https://stackoverflow.com/q/47924103/963864) – curiousguy Apr 14 '19 at 12:55
2

If a standard-layout class object has any non-static data members, its address is the same as the address of its first non-static data member. Otherwise, its address is the same as the address of its first base class subobject (if any). [Note: There might therefore be unnamed padding within a standard-layout struct object, but not at its beginning, as necessary to achieve appropriate alignment. — end note]

Hence, the standard guarantees that

static_assert(offsetof(A, data[0]) == 0 * sizeof(float));

An object of array type contains a contiguously allocated non-empty set of N subobjects of type T.

Hence, the following are true

static_assert(offsetof(A, data[0]) == 0 * sizeof(float));
static_assert(offsetof(A, data[1]) == 1 * sizeof(float));
...
static_assert(offsetof(A, data[15]) == 15 * sizeof(float));
Yashas