
Setting aside that the ints in the following example might not be laid out the way they would be in a normal array: is this illegal aliasing in C++?

struct S
{
    int a, b;
};

void fn( S &s )
{
    (&s.b)[-1] = 123;
}
Enlico
  • 9
    it is illegal but not aliasing. – molbdnilo Sep 26 '22 at 05:44
  • 4
    Yes, it's illegal. Single objects are treated as arrays of size 1 for indexing purposes, and you're going out of bounds. – HolyBlackCat Sep 26 '22 at 05:45
  • 1
    I'd argue with that. This can easily be a language-lawyer thing, for `*(&s.b - 1)` is legal in this case and, by definition, `a[b] == *((a) + (b))`. The reason is that `S` is a POD type (and thus a StandardLayoutType), whose layout is explicitly specified by the standard. Therefore, `&s.b - &s.a == 1`. – lorro Sep 26 '22 at 07:15
  • 1
    _"Is this an illegal aliasing?"_ Where's the aliasing? – mada Sep 26 '22 at 07:40
  • It is illegal. It's not a safely derived pointer. Negative indexes `i` are defined behaviour (and only defined behaviour) from a pointer to element j>=i of some array of the pointer type (singletons are arrays of size 1). On all known platforms the naïve result of the pointer arithmetic will be as expected, but aggressive optimisation may disregard the result as 'undefined behaviour'. It also does not obey 'strict pointer safety'. (I've never heard of a platform that would pad that `struct`, but it's not unequivocally ruled out by the standard.) – Persixty Sep 26 '22 at 09:36

1 Answer


It is illegal (undefined behavior, see @user17732522's comment and this question pointed out by @Özgür Murat Sağdıçoğlu for why exactly).

Even for POD (plain old data) types, compilers are allowed to include padding between members. In the case of

struct S
{
    int a, b;
};

there is very likely no padding, as both members have the same alignment requirements (I could, however, find no reference saying that the standard requires this).

In another case like

struct S
{
    char a;
    int b;
};

there will be an implementation-defined amount of padding between the members, and pointer magic like the one in your question will be non-portable at the very least.
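The padding in both cases can be inspected with `offsetof` and `sizeof`. The following is only a minimal sketch: the struct names `Same` and `Mixed` and the helper functions are hypothetical, chosen for illustration, and the concrete padding values are deliberately not asserted because they depend on the implementation.

```cpp
#include <cassert>
#include <cstddef>   // offsetof, std::size_t

// Hypothetical names for illustration; they do not appear in the question.
struct Same  { int a, b; };
struct Mixed { char a; int b; };

// Number of padding bytes between member a and member b.
// The first member of a standard-layout struct sits at offset 0.
constexpr std::size_t padding_same()  { return offsetof(Same, b)  - sizeof(int);  }
constexpr std::size_t padding_mixed() { return offsetof(Mixed, b) - sizeof(char); }

// Whatever the padding is, b must end up suitably aligned for an int.
static_assert(offsetof(Mixed, b) % alignof(int) == 0,
              "b is aligned for int");

// Runtime check of the only relationship that does not depend on the platform.
inline void check_padding()
{
    assert(offsetof(Mixed, b) == sizeof(char) + padding_mixed());
    assert(offsetof(Same, b)  == sizeof(int)  + padding_same());
}
```

On many common platforms `padding_same()` is 0 and `padding_mixed()` is 3, but that is an observation about those platforms, not something this sketch relies on.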

Generally speaking the standard gives you the following guarantees about the memory layout of POD types [1]:

  • You can safely convert a pointer to the first member into a pointer to the whole struct and vice versa.
  • You can use the offsetof macro [2] to get the offsets of the different members.

Thus when working with pointers to data members, try to use those facilities and avoid relying on false assumptions about the data layout.
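As a rough sketch of the two guarantees above, using the `S` from the question: the helper names `first_member`, `enclosing_struct` and `offset_of_b` are made up for illustration, and `fn` is rewritten here to name the member directly, which is the well-defined way to get the effect the question was after.

```cpp
#include <cassert>
#include <cstddef>      // offsetof, std::size_t
#include <type_traits>  // std::is_standard_layout

struct S
{
    int a, b;
};

static_assert(std::is_standard_layout<S>::value,
              "the guarantees below apply to standard-layout types");

// Guarantee 1: a pointer to the struct and a pointer to its first member
// can be converted into each other with reinterpret_cast.
inline int* first_member(S* s)        { return reinterpret_cast<int*>(s); }
inline S*   enclosing_struct(int* a)  { return reinterpret_cast<S*>(a); }

// Guarantee 2: offsetof reports where a member lives inside the struct.
constexpr std::size_t offset_of_b() { return offsetof(S, b); }

// Well-defined replacement for (&s.b)[-1] = 123: name the member directly.
inline void fn(S& s)
{
    s.a = 123;
}
```

Note that `enclosing_struct` is only valid when handed a pointer to `s.a`; passing `&s.b` would not be covered by the first guarantee.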

Jakob Stark
  • "When a class or struct is both trivial and standard-layout, it is a POD (Plain Old Data) type. The memory layout of POD types is therefore contiguous and each member has a higher address than the member that was declared before it, so that byte for byte copies and binary I/O can be performed on these types." - https://learn.microsoft.com/en-us/cpp/cpp/trivial-standard-layout-and-pod-types?view=msvc-170 – lorro Sep 26 '22 at 08:25
  • 2
    The behavior is undefined, not implementation-defined. `&s.b` is not a pointer to an array element at index `1` or higher, so adding `-1` to it as such has undefined behavior according to the pointer arithmetic rules. Whether compilers make any use of that in practice is a different matter. – user17732522 Sep 26 '22 at 08:25
  • @user17732522 no, `a[b]` is defined as `*((a) + (b))` and we have rules to deduce `*((a) + -1)` as it's a POD type. – lorro Sep 26 '22 at 08:26
  • 2
    @lorro There is no such rule. The standard is quite clear that pointer arithmetic is only allowed in arrays (and other objects considered as arrays of size 1): https://eel.is/c++draft/expr.add#4. This is the same in C as well and independent of whether the type is POD or not. – user17732522 Sep 26 '22 at 08:28
  • @user17732522 The longer I think about it, the surer I get that it is actually UB. I will change my phrasing. Thanks for the hint. – Jakob Stark Sep 26 '22 at 08:31
  • With the expansive `constexpr` rules we have now, it is also possible to simply ask the compiler: https://godbolt.org/z/eWxsY8ba7 Clang points out what is wrong here (undefined behavior means that the expression is not a constant expression). MSVC produces a similar message. Only GCC is not doing it properly (it tends to not diagnose UB in constant expressions as well as the other two.) – user17732522 Sep 26 '22 at 08:37
  • @user17732522 thanks for the link to the standard. I wonder now, what you actually are supposed to do with the results of the `offsetof` macro. Is there a well defined non-trivial use case besides just printing the result of it? – Jakob Stark Sep 26 '22 at 08:40
  • 2
    @JakobStark If you have e.g. an `unsigned char` pointer to the object representation of the structure, you can add the offset, cast back to the member type and `std::launder` the pointer. That would give you a pointer in a defined way. (Technically that is/was also not well-defined, but it is addressed by a current proposal as a defect which I think might have already been accepted.) It is however impossible to get from a member (except the first one) to any other member. The rules seem to be specifically written to not allow that, although I have yet to see a compiler using that for optimization. – user17732522 Sep 26 '22 at 08:46
  • @user17732522 can't you use `offsetof` both ways? member1 -> superobject -> member2 – Caleth Sep 26 '22 at 08:47
  • 1
    @Caleth Only the first member (of a standard-layout class) is pointer-interconvertible with the class object. So simply `reinterpret_cast` doesn't work. And the preconditions on `std::launder` are specifically so that it is impossible to make memory that wouldn't be accessible through the original pointer accessible. I have been wondering myself whether this is a defect in the standard or whether it is intentional that it is impossible to cast back to the class object. Practice seems to point to the former, but the way it is specifically written to the latter. – user17732522 Sep 26 '22 at 08:49
  • @Caleth apparently not. The standard passage that @user17732522 linked states in a note: "Adding a value other than 0 or 1 to a pointer to a base class subobject, a member subobject, or a complete object results in undefined behavior." This also includes values returned by `offsetof`. – Jakob Stark Sep 26 '22 at 08:51
  • @Caleth And the paper https://open-std.org/jtc1/sc22/wg21/docs/papers/2022/p1839r5.pdf which I hinted at and which properly defines access to the object representation was also written in such a way that it wouldn't be allowed (at least in the last revision I read). – user17732522 Sep 26 '22 at 08:53
  • @JakobStark you are adding to a `char*` that points into an array at that point. @user17732522 is saying you go out-of-bounds in the member->super step – Caleth Sep 26 '22 at 08:56