
Suppose we have two types that have the same representation (the same member variables and base classes, in the same order). Is it valid (i.e. not UB) to reinterpret_cast between them? E.g. is it valid to reinterpret_cast from Mary to Ashley&? And what if the two types are polymorphic?

struct Mary {
    int  m1;
    char m2;
};

struct Ashley {
    int  a1;
    char a2;
};

int TryTwins ()
{
    Mary mary = {};

    Ashley& ashley = reinterpret_cast<Ashley&> (mary);
    ashley.a1 = 1;
    ashley.a2 = 2;

    return mary.m1 + mary.m2;
}

What if we cast an object to another type, knowing that the source type starts with the same member variables as the target type? E.g. is this valid (i.e. not UB)?

struct Locomotive {
    int    engine;
    char   pantograph;
};

struct Train {
    int    engine;
    char   pantograph;
    int*   wagon1;
    int**  wagon2;
    int*** wagon3;
};

int TryTrain ()
{
    Train train = {};

    Locomotive& loc = reinterpret_cast<Locomotive&> (train);
    loc.engine     = 1;
    loc.pantograph = 2;

    return train.engine + train.pantograph;
}

Note that all major compilers treat these as valid casts (live demo). The question is whether the C++ language allows this.

curiousguy
Dr. Gut
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/201570/discussion-on-question-by-dr-gut-can-you-reinterpret-cast-between-types-which-h). – Samuel Liew Oct 29 '19 at 22:12

1 Answer

[expr.reinterpret.cast]/11:

A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type. [...]

Mary and Ashley are object types, so pointers to them can be converted to each other. Now we use an lvalue of type Ashley to access the underlying Mary object.
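To separate the two steps, here is a minimal sketch in which the pointer conversion is performed but the Ashley pointer is never dereferenced. The function name RoundTripPreservesAddress is made up for illustration, and the sketch assumes Ashley's alignment requirement is no stricter than Mary's, which the static_assert checks.

```cpp
#include <cassert>

struct Mary   { int m1; char m2; };
struct Ashley { int a1; char a2; };

// Converting back yields the original pointer value only if the intermediate
// type's alignment requirement is no stricter than the original's.
static_assert(alignof(Ashley) <= alignof(Mary),
              "round trip needs Ashley's alignment to be no stricter");

// Hypothetical helper (not from the question): converts Mary* to Ashley*
// and back without ever accessing the object through the Ashley*.
bool RoundTripPreservesAddress()
{
    Mary mary = {};

    Ashley* as_ashley = reinterpret_cast<Ashley*>(&mary);   // conversion only
    Mary*   back      = reinterpret_cast<Mary*>(as_ashley); // convert back

    assert(back == &mary);
    back->m1 = 1;  // access goes through the object's real type, Mary

    return back == &mary && mary.m1 == 1;
}
```

The conversion by itself is not the issue here; what matters is which type of glvalue is eventually used to read or write the object.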

[basic.lval]/8:

If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:

  • the dynamic type of the object,

  • a cv-qualified version of the dynamic type of the object,

  • a type similar to the dynamic type of the object,

  • a type that is the signed or unsigned type corresponding to the dynamic type of the object,

  • a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,

  • an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),

  • a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,

  • a char, unsigned char, or std::byte type.

None of these covers the case in question ("similar" is about cv-qualification). Therefore, the behavior is undefined.
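If the goal is only to move the values from one type to the other, copying the bytes avoids the aliasing access altogether. The following is a sketch rather than a definitive recipe: to_ashley and TryTwinsByCopy are hypothetical names, and it assumes both types are trivially copyable, have the same size, and place their members at the same offsets (as Mary and Ashley do on common implementations).

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

struct Mary   { int m1; char m2; };
struct Ashley { int a1; char a2; };

static_assert(sizeof(Mary) == sizeof(Ashley), "sizes must match");
static_assert(std::is_trivially_copyable<Mary>::value &&
              std::is_trivially_copyable<Ashley>::value,
              "memcpy requires trivially copyable types");

// Hypothetical helper: builds a separate Ashley from the bytes of a Mary
// instead of viewing the Mary object through an Ashley lvalue.
Ashley to_ashley(const Mary& mary)
{
    Ashley ashley;
    std::memcpy(&ashley, &mary, sizeof ashley);
    return ashley;
}

int TryTwinsByCopy()
{
    Mary mary = {};

    Ashley ashley = to_ashley(mary);
    assert(ashley.a1 == 0 && ashley.a2 == 0);  // zero-initialized values carried over

    ashley.a1 = 1;
    ashley.a2 = 2;

    // Copy the modified bytes back into the original object.
    std::memcpy(&mary, &ashley, sizeof mary);

    return mary.m1 + mary.m2;
}
```

Every access here goes through an lvalue of the object's own type, at the cost of copies that optimizing compilers typically elide.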

L. F.
  • I agree, although it can be argued that `ashley.a1` is an lvalue of type `int` which is being used to access the object `mary.m1` which is an object of type `int`. Apparently it is a common interpretation that the expression `ashley.a1` accesses the stored value of all of the entire object of which `ashley.a1` is a member, although the standard does not clearly spell that out – M.M Oct 29 '19 at 02:45
  • What do you think about [the current state](http://eel.is/c++draft/basic.lval#11) of the [basic.lval]/8? – Language Lawyer Oct 29 '19 at 02:58
  • @M.M _Apparently it is a common interpretation that the expression `ashley.a1` accesses the stored value of all of the entire object_ You shoulda written "it is a common **mis**interpretation". – Language Lawyer Oct 29 '19 at 02:59
  • @LanguageLawyer Well, major compiler vendors take that stance, so can we really say that it is a misinterpretation? The standard is meant to codify practice that the community agrees on; it could equally be argued that the standard is vague and/or defective. – M.M Oct 29 '19 at 03:25
  • @M.M if only [expr.ref] were saying that in `E1.E2`, `E1` shall denote an object with a type compatible with the type of `E1`, then one couldn't abuse [basic.lval] to misinterpret compilers' behavior... Fortunately, [the resolution of CWG2051](http://wg21.link/p1359#2051) made it much harder to abuse strict aliasing rules. – Language Lawyer Oct 29 '19 at 07:07
  • @LanguageLawyer that resolution's suggested modification seems to suggest that OP's first sample should now be well-defined; although perhaps it is still unclear whether that was the CWG's intent – M.M Oct 29 '19 at 09:51
  • @M.M [basic.lval] never was the problem, because there never was an access through a class-typed glvalue in C++. I believe it is a defect that [expr.ref] misses the requirement on `E1`. It is the key to make OP example undefined. – Language Lawyer Oct 29 '19 at 09:53
  • What does OP mean? – Dr. Gut Oct 29 '19 at 18:08
  • @M.M: What if I access `mary.m1` (which is an `int`) through an `int&`, which is initialized to `ashley.a1`? [Live](https://godbolt.org/z/amdXBS). This way the expression `ashley.a1` is not used to access memory. Its address is used only. Does this change anything? – Dr. Gut Oct 29 '19 at 18:26
  • @M.M: The C community has never agreed on a set of rules, in part because it would be impossible to specify a single set of rules which wouldn't either make some kinds of tasks needlessly difficult, or make implementations that aren't intended for such tasks needlessly inefficient. – supercat Oct 29 '19 at 21:42
  • @LanguageLawyer: Given that resolution, given `struct foo { int x[10]; } a,b; void test(int index) { int *p = a.x; p[index]=1; b=a;}`. why wouldn't the last assignment invoke UB, given that it is using lvalues which are not derived from anything of type `int` to access the `int` value stored via `p[index]`? – supercat Oct 29 '19 at 21:53
  • @supercat I see [your comment](https://stackoverflow.com/questions/56878519/what-happened-to-the-aggregate-or-union-type-that-includes-one-of-the-aforement#comment100426959_56878519) under [the question](https://stackoverflow.com/questions/56878519/what-happened-to-the-aggregate-or-union-type-that-includes-one-of-the-aforement) where this was explained... – Language Lawyer Oct 30 '19 at 00:48
  • @Dr.Gut See the [N-Z section](https://meta.stackexchange.com/a/256420) of the Stack Exchange Glossary - Dictionary of Commonly-Used Terms. – L. F. Oct 30 '19 at 11:19
  • @LanguageLawyer: I'd forgotten the answer to that question, in part because as I indicated in a comment to that answer, it implied that types which have all of the same fields at the same offsets should behave in compatible fashion, but that's not how gcc and clang actually behave. – supercat Oct 30 '19 at 22:50
  • @LanguageLawyer "_because there never was an access through a class-typed glvalue in C++._" Somehow that text about aggregates was written in the original C++ std and kept during all these revisions... what gives? – curiousguy Dec 03 '19 at 23:20
  • @supercat "_why wouldn't the last assignment_" in C or C++? It's very important. – curiousguy Dec 03 '19 at 23:30
  • @curiousguy I guess it was copy-pasted from the C std and hasn't been reviewed until recently. There are tons of BS in the C++ std which require huge rewriting. – Language Lawyer Dec 04 '19 at 07:44
  • @curiousguy: I was talking about the C Standard there, but I think my point is that the C Standard made no effort whatsoever to avoid characterizing as UB constructs whose meaning should be obvious and non-controversial, and the C++ Standard patches a few holes but blindly inherits some aspects of the C Standard in ways that assume it's complete despite the fact that it was never intended to be. – supercat Dec 04 '19 at 15:06