
Here's an example:

#include <cstddef>
#include <iostream>

struct A
{
    char padding[7];
    int x;
};
constexpr int offset = offsetof(A, x);

int main()
{
    A a;
    a.x = 42;
    char *ptr = (char *)&a;
    std::cout << *(int *)(ptr + offset) << '\n'; // Well-defined or not?
}

I always assumed that it's well-defined (otherwise what would be the point of offsetof), but wasn't sure.

Recently I was told that it's in fact UB, so I want to figure it out once and for all.

Does the example above cause UB or not? If you modify the class to not be standard-layout, does it affect the result?

And if it's UB, are there any workarounds for it (e.g. applying std::launder)?


This entire topic seems to be moot and underspecified.

Here's some information I was able to find:

HolyBlackCat
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/215784/discussion-on-question-by-holyblackcat-is-it-ub-to-access-a-member-by-casting-an). – Samuel Liew Jun 12 '20 at 01:24
  • Plz stop referring to CWG1314 as an indisputable permission to examine the representation of objects, its NAD status was called into question by CWG1701. – Language Lawyer Jul 01 '20 at 14:33
  • @LanguageLawyer I assume you refer to my comments [here](https://stackoverflow.com/questions/62676422/is-reinterpret-castcharmytypeptr-assumed-to-point-to-an-array#comment110838543_62676422)... I don't think CWG1314 alone is an indisputable permission, but in combination with what M.M pointed out, I *think* it's allowed. – HolyBlackCat Jul 01 '20 at 15:46

2 Answers


I refer to the C++20 draft wording here, because one relevant editorial issue was fixed between C++17 and C++20, and because the HTML version of the C++20 draft makes it possible to link to specific sentences. Otherwise there is nothing new compared to C++17.

First, the definition of pointer values in [basic.compound]/3:

Every value of pointer type is one of the following:
— a pointer to an object or function (the pointer is said to point to the object or function), or
— a pointer past the end of an object ([expr.add]), or
— the null pointer value for that type, or
— an invalid pointer value.

Now, let's see what happens in the (char *)&a expression.

I will take it as given, without proof, that a is an lvalue denoting an object of type A, and I will say «the object a» to refer to this object.

The meaning of the &a subexpression is covered in [expr.unary.op]/(3.2):

if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object

So, &a is a prvalue of type A* with the value «pointer to (the object) a».

Now, the cast in (char *)&a is equivalent to reinterpret_cast<char*>(&a), which is defined as static_cast<char*>(static_cast<void*>(&a)) ([expr.reinterpret.cast]/7).

Cast to void* doesn't change the pointer value ([conv.ptr]/2):

A prvalue of type “pointer to cv T”, where T is an object type, can be converted to a prvalue of type “pointer to cv void”. The pointer value ([basic.compound]) is unchanged by this conversion.

i.e. it is still «pointer to (the object) a».

[expr.static.cast]/13 covers the outer static_cast<char*>(...):

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”, where T is an object type and cv2 is the same cv-qualification as, or greater cv-qualification than, cv1. If the original pointer value represents the address A of a byte in memory and A does not satisfy the alignment requirement of T, then the resulting pointer value is unspecified. Otherwise, if the original pointer value points to an object a, and there is an object b of type T (ignoring cv-qualification) that is pointer-interconvertible with a, the result is a pointer to b. Otherwise, the pointer value is unchanged by the conversion.

There is no object of type char which is pointer-interconvertible with the object a ([basic.compound]/4):

Two objects a and b are pointer-interconvertible if:
— they are the same object, or
— one is a union object and the other is a non-static data member of that object ([class.union]), or
— one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object ([class.mem]), or
— there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.

which means that the static_cast<char*>(...) doesn't change the pointer value and it is the same as in its operand, namely: «pointer to a».
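To make the rule concrete, here is a minimal sketch using the struct A from the question. It only illustrates the standard-layout bullet of the definition: the object of type A is pointer-interconvertible with its first member padding (an array of type char[7], not a char), while x sits at a later offset and is not covered. The helper names first_member_shares_address and offset_of_x are hypothetical, introduced only for this illustration.

```cpp
#include <cstddef>
#include <type_traits>

struct A
{
    char padding[7];
    int x;
};

// The "first non-static data member" bullet only applies to standard-layout classes.
static_assert(std::is_standard_layout_v<A>);
// The first member of a standard-layout class lives at offset 0.
static_assert(offsetof(A, padding) == 0);

// Hypothetical helper: the A object and its first member share one address.
bool first_member_shares_address(const A &a)
{
    return static_cast<const void *>(&a) == static_cast<const void *>(&a.padding);
}

// Hypothetical helper: x is not the first member, so its offset is nonzero.
std::size_t offset_of_x()
{
    return offsetof(A, x);
}
```

The exact value returned by offset_of_x() depends on the platform's alignment of int; the only thing the sketch relies on is that it is at least 7, because padding comes first.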

So, (char *)&a is a prvalue of type char* whose value is «pointer to a». This value is stored in the char* variable ptr. Then, when you try to do pointer arithmetic with such a value, namely ptr + offset, you step into [expr.add]/6:

For addition or subtraction, if the expressions P or Q have type “pointer to cv T”, where T and the array element type are not similar, the behavior is undefined.

For the purposes of pointer arithmetic, the object a is considered to be an element of an array A[1] ([basic.compound]/3), so the array element type is A. The type of the pointer expression P is «pointer to char», and char and A are not similar types (see [conv.qual]/2), so the behavior is undefined.
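As for the workaround the question asks about, one approach that is often suggested sidesteps the arithmetic on a pointer derived from &a altogether. The sketch below is not taken from this answer. It assumes A is trivially copyable, so that [basic.types]/2 lets its bytes be copied into an array of unsigned char, and it assumes that copying the bytes of a member out of that array yields the member's value. The helper name read_x_via_bytes is hypothetical.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

struct A
{
    char padding[7];
    int x;
};

// Copying the underlying bytes is only specified for trivially copyable types.
static_assert(std::is_trivially_copyable_v<A>);
// offsetof is only unconditionally supported for standard-layout classes.
static_assert(std::is_standard_layout_v<A>);

// Hypothetical helper: reads A::x without doing arithmetic on a pointer derived from &a.
int read_x_via_bytes(const A &a)
{
    constexpr std::size_t offset = offsetof(A, x);
    static_assert(offset + sizeof(int) <= sizeof(A));

    // Copy the whole object representation into a genuine array of unsigned char.
    unsigned char bytes[sizeof(A)];
    std::memcpy(bytes, &a, sizeof(A));

    // bytes + offset is ordinary arithmetic inside an unsigned char array.
    int result;
    std::memcpy(&result, bytes + offset, sizeof(int));
    return result;
}
```

With an object a whose x is 42, read_x_via_bytes(a) returns 42. The cost is an extra copy of the whole object, which optimizing compilers can often remove, though that is not guaranteed.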

Language Lawyer
  • Hmm, but what about [`[basic.types]/2`](http://eel.is/c++draft/basic.types#2)? Do you think it means that `memcpy` and `memmove` are magical and can't be implemented in standard C++? – HolyBlackCat Jun 12 '20 at 09:20
  • @HolyBlackCat _`memcpy` and `memmove` are magical and can't be implemented in standard C++?_ Yes. – Language Lawyer Jun 12 '20 at 09:27
  • You seem to be technically correct (the best kind of correct), however this is so ridiculous that I refuse to believe that it was intended (for reasons outlined by M.M and some other ones), and doubt that it can cause problems on any decent compiler. – HolyBlackCat Jun 12 '20 at 10:41
  • @LanguageLawyer The standard says, for any object, `The object representation of an object of type T is the sequence of N unsigned char objects taken up by the object of type T`. So it's surprising that the standard does not permit using `unsigned char*` for pointer arithmetic over any object type, which would let us implement such a `memcpy` in conformance with the standard. – xmh0511 Jul 02 '20 at 08:36
  • @HolyBlackCat You are right that this is not the intended behavior. It is just the behavior as it is currently defined. There is a [defect report](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1839r2.pdf) intended to reconcile these differences. It will allow access to the underlying representation bytes and allow the behavior of `memcpy` and similar functions to be replicated in standard c++. But it doesn't seem like it would allow access to `x` using `offsetof` as this question is asking. – François Andrieux Mar 25 '21 at 17:09
  • @FrançoisAndrieux _But it doesn't seem like it would allow access to x using offsetof as this question is asking_ Just need `launder`, no? Or you mean as the question is *literally* asking? – Language Lawyer Mar 25 '21 at 17:16
  • @LanguageLawyer It depends on the final wording of `reinterpret_cast`. The new allowances that the report's proposed wording introduces permit converting a pointer to an object into a pointer to the first element of its memory representation, and the opposite conversion. Whether or not it will be possible to convert pointers to other bytes of the representation into pointers to the member that occupies the same address is up to speculation. You would need this operation in order to get a pointer which could be passed to `launder` (though you might not have). My guess is that they won't, but who knows. – François Andrieux Mar 25 '21 at 17:26

This question, and the other one about launder, both seem to me to boil down to interpretation of the last sentence of C++17 [expr.static.cast]/13, which covers what happens for static_cast<T *> applied to an operand of pointer to unrelated type which is correctly aligned:

A prvalue of type “pointer to cv1 void” can be converted to a prvalue of type “pointer to cv2 T”,

[...]

Otherwise, the pointer value is unchanged by the conversion.

Some posters appear to take this to mean that the result of the cast cannot point to an object of type T, and consequently that reinterpret_cast with pointers or references can only be used on pointer-interconvertible types.

But I don't see that as justified, and (this is a reductio ad absurdum argument) that position would also imply:

  • The resolution to CWG1314 is overturned.
  • Inspecting any byte of a standard-layout object is not possible (since casting to unsigned char * or whatever character type supposedly cannot be used to access that byte).
  • The strict aliasing rule would be redundant since the only way to actually achieve such aliasing is to use such casts.
  • There would be no normative text to justify the note "[Note: Converting a prvalue of type “pointer to T1 ” to the type “pointer to T2 ” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value. —end note ]"
  • offsetof is useless (and the C++17 changes to it were therefore redundant)

It seems like a more sensible interpretation to me that this sentence means the result of the cast points to the same byte in memory as the operand. (As opposed to pointing to some other byte, as can happen for some pointer casts not covered by this sentence). Saying "the value is unchanged" does not mean "the type is unchanged", e.g. we describe conversion from int to long as preserving the value.
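The part of this that both readings agree on can be sketched in code: the cast leaves the address untouched, and converting back yields the original pointer. The sketch below deliberately does not dereference the char* or do arithmetic on it, since that is exactly the disputed part. The struct A is the one from the question, and the helper names are hypothetical.

```cpp
struct A
{
    char padding[7];
    int x;
};

// Hypothetical helper: the char* produced by the cast represents the same address as &a.
bool cast_preserves_address(A &a)
{
    char *as_char = reinterpret_cast<char *>(&a);
    return static_cast<void *>(as_char) == static_cast<void *>(&a);
}

// Hypothetical helper: converting back to A* gives the original pointer value.
bool round_trip_restores_pointer(A &a)
{
    char *as_char = reinterpret_cast<char *>(&a);
    A *back = reinterpret_cast<A *>(as_char);
    return back == &a;
}
```

Both helpers return true for any object of type A. Neither result settles whether as_char "points to" a char object, which is the question this answer is arguing about.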


Also, I guess this may be controversial to some but I am taking as axiomatic that if a pointer's value is the address of an object, then the pointer points to that object, unless the Standard specifically excludes the case.

This is consistent with the text of [basic.compound]/3 which says the converse, i.e. that if a pointer points to an object, then its value is the address of the object.

There doesn't seem to be any other explicit statement defining when a pointer can or cannot be said to point to an object, but basic.compound/3 says that all pointers must be one of four cases (points to an object, points past the end, null, invalid).

Examples of excluded cases include:

  • The use case of std::launder specifically addresses a situation where there was such language ruling out the use of the un-laundered pointer.
  • A past-the-end pointer does not point to an object. (basic.compound/3)
M.M
  • _It seems like a more sensible interpretation to me that this sentence means the result of the cast points to the same byte in memory as the operand_ This is a trivial implication from «the pointer value is unchanged». – Language Lawyer Jun 12 '20 at 17:13
  • _Some posters appear to take this to mean that the result of the cast cannot point to an object of type T, and consequently that reinterpret_cast with pointers or references can only be used on pointer-interconvertible types_ I don't see a connection between the first and the second part of the sentence. The fact that for non-interconvertible objects the value is unchanged doesn't forbid usage of `reinterpret_cast`. – Language Lawyer Jun 12 '20 at 17:15
  • @LanguageLawyer "This is a trivial implication" - I agree , but in language lawyer topics sometimes what seems trivial to one is contested by another – M.M Jun 12 '20 at 22:54
  • _that position would also imply: The resolution to CWG1314 is overturned_ You clearly know that it is overturned by CWG1701 – Language Lawyer Jul 01 '20 at 14:38
  • _There would be no normative text to justify the note_ [There shall be no Note then](https://github.com/cplusplus/draft/pull/4080) – Language Lawyer Jul 13 '20 at 01:19
  • The fix (not the removal) for the Note has been merged into the draft. Thanks for spotting the defect. – Language Lawyer Jul 16 '20 at 15:23
  • Three contradiction-free interpretations would be that the authors of the Standard expected that, in situations where the Standard simultaneously defines the behavior of a construct and characterizes it as UB, either (1) the definition has priority, (2) the "undefinition" has priority, or (3) implementations should, on a quality-of-implementation basis, give priority to the definition when practical, since the authors of the Standard didn't want to try to predict all of the cases where such treatment might or might not be practical. – supercat Aug 28 '23 at 18:29