
Suppose we have a class template Wrapper like this:

template <class T>
struct Wrapper { T wrapped; };

For what types is it safe to reinterpret_cast between a Type and a Wrapper<Type>? None? Standard-layout? All?

Suppose we created an object of one of these types (Type or Wrapper<Type>), and then read and wrote that object through the other type. Example (live on godbolt.org):

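#include <sstream>

// F1: create a std::stringstream, then access it as a Wrapper<std::stringstream>.
void F1() {
    std::stringstream ss;
    ss << "Hello";
    reinterpret_cast<Wrapper<std::stringstream>&>(ss).wrapped << " world";
}

// F2: create a Wrapper<std::stringstream>, then access it as a std::stringstream.
void F2() {
    Wrapper<std::stringstream> ss;
    ss.wrapped << "Hello";
    reinterpret_cast<std::stringstream&>(ss) << " world";
}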

Reading the comments of this answer, this area seems to be not quite unequivocal in the standard. I think all compilers would generate code that works as expected (i.e. a value of one type can be cast to the other), but the standard may not currently guarantee this. If it doesn't, the question arises: Could the standard guarantee well defined behaviour with these casts, or is it not possible/impractical to guarantee anything in such a case?

I ask because I am pretty sure that these casts will actually work.

Dr. Gut
  • "*Reading the comments of this answer this area seems to be not quite unequivocal in the standard.*" Reading those comments only tells me that there are some people who don't think C++ *ought* to be that way, not that C++ isn't actually that way. And many of the people in that comment thread are well known for their views in this matter, which are not congruent with how the C++ object model actually works. – Nicol Bolas Jul 07 '20 at 00:48
  • "*Could the standard guarantee well defined behaviour with these casts, or is it not possible/impractical to guarantee anything in such a case?*" This question makes no sense. The standard "could" guarantee anything. The C++ standard is not generally in the business of looking at what compilers consider "well defined behavior" and then saying "that's C++". The standard *defines* what is well defined, and if some other stuff happens to work for a particular implementation in a specific situation, that's none of the standard's business. – Nicol Bolas Jul 07 '20 at 00:50
  • @NicolBolas: However, if something not standardized works for all compilers, and has practical significance, it might be put in the standard, I think. – Dr. Gut Jul 07 '20 at 01:03
  • That's not generally why something goes into the standard. Especially when it pertains to fundamental concepts like the C++ object model. It ultimately has to be something that makes sense, and at least half of what you're talking about does not. – Nicol Bolas Jul 07 '20 at 01:12
  • @NicolBolas: The question I asked has practical application (I can assure you, that it makes sense), so I would appreciate if this behavior would be guaranteed by the standard. Or at least if it would be a reasonable assumption that every compiler compiles a working code from this forever in the future (i.e. the two types are interchangeable by `reinterpret_cast`). Your insights are valuable to me. Maybe you could try to answer this question. – Dr. Gut Jul 07 '20 at 01:40
  • "*I can assure you, that it makes sense*" No, it doesn't. If there's no `Wrapper` object there, then it makes no sense to try to access one. You can't just pretend there's an object there when there isn't one. To make this make sense, you would have to fundamentally change the very *idea* of what an "object" even means to C++. You'd basically be saying that an "object" is just how you're looking at a piece of memory at the moment, that it doesn't mean anything beyond that. – Nicol Bolas Jul 07 '20 at 01:44
  • @NicolBolas: The authors of the C Standard have expressly said that they did not wish to preclude the use of the language as a high-level assembler. Further, C++ was supposed to be a superset of C. Imposing a stronger "object" model on trivial types is needlessly semantically restrictive, and yet allows fewer useful optimizations than would adopting a "view" model where programmers have to ensure that a compiler has a chance to recognize places where storage that has been accessed as one type will temporarily be accessed using another. – supercat Jul 07 '20 at 21:04

1 Answer


Reinterpreting T (that's not a member of Wrapper<T>) as Wrapper<T> is never allowed (the F1 example).

On the other hand, I believe reinterpreting Wrapper<T> as T (the direction used in the F2 example) is allowed when Wrapper<T> is a standard-layout class:

[basic.compound]/4.3

Two objects a and b are pointer-interconvertible if:

— one is a standard-layout class object and the other is the first non-static data member of that object, or ...

And right below that:

If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast. ...

Note that while this rule is symmetrical, it requires both objects to actually exist. Creating a Wrapper<T> also creates a T at the same address, but creating a T on its own does not create a Wrapper<T>. So if you have a reference to a T that is the member of a Wrapper<T>, you can reinterpret it as Wrapper<T> (and the other way around). But if it refers to a T object that's not a member of a Wrapper<T>, then it would be UB.


Disclaimer: By "such-and-such reinterpreting is not allowed" I mean that accessing the result of the reinterpret_cast would cause UB. The cast itself shouldn't cause UB.
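A minimal sketch of that asymmetry, assuming C++17 (the name `roundtrip` is hypothetical and only for illustration). It uses `int` because `Wrapper<int>` is standard-layout. `std::stringstream` has virtual base classes, so `Wrapper<std::stringstream>` is not standard-layout and the quoted rule would not cover it:

```cpp
#include <cassert>
#include <sstream>
#include <type_traits>

template <class T>
struct Wrapper { T wrapped; };

// Wrapper<int> is standard-layout, so a Wrapper<int> object and its first
// non-static data member are pointer-interconvertible.
static_assert(std::is_standard_layout_v<Wrapper<int>>);

// std::stringstream has virtual bases, so wrapping it is not standard-layout.
static_assert(!std::is_standard_layout_v<Wrapper<std::stringstream>>);

// Hypothetical helper: goes Wrapper<int> -> int -> Wrapper<int>.
int roundtrip() {
    Wrapper<int> w{41};  // a Wrapper<int> and its int member both exist here

    int& member = reinterpret_cast<int&>(w);  // Wrapper<int> to its first member
    member += 1;

    // This int really is the member of a Wrapper<int>, so casting back is fine.
    Wrapper<int>& again = reinterpret_cast<Wrapper<int>&>(member);
    return again.wrapped;
}

// By contrast, this would be UB on access, because no Wrapper<int> exists
// around a standalone int:
//     int lone = 0;
//     reinterpret_cast<Wrapper<int>&>(lone).wrapped = 1;
```

Calling `roundtrip()` should return 42, since both casts refer to objects that actually exist.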

HolyBlackCat
  • Is that a permanent link, or will it be dangling in some years? I believe http://eel.is/c++draft is the current draft standard at all times, so it might go dangling in a while. Apart from that suppose we have `struct A : B1, B2, B3 {};`. We clearly cannot `reinterpret_cast` between an `A` and a `B2`. So if two objects are _pointer-interconvertible_, that does not mean, we can use them like I did in the question. – Dr. Gut Jul 06 '20 at 23:16
  • @Dr.Gut The link is not permanent, but the section names (like `[foo.bar]`) are supposed to be stable, so you can look it up in any version of the standard. – HolyBlackCat Jul 06 '20 at 23:25
  • Right below that part it says *"If two objects are pointer-interconvertible, then they have the same address, and it is possible to obtain a pointer to one from a pointer to the other via a reinterpret_cast."*. I'm not sure why it doesn't say that only the first base must have the same address as the object... – HolyBlackCat Jul 06 '20 at 23:26
  • note that [Denying the antecedent](https://en.wikipedia.org/wiki/Denying_the_antecedent) should not be applied to the second quote (e.g. it is valid to alias a standard layout object by character type, even though `char` and `T` are not inter-convertible) – M.M Jul 06 '20 at 23:48
  • @M.M Or at least it's [supposed](https://stackoverflow.com/questions/62329008/is-it-ub-to-access-a-member-by-casting-an-object-pointer-to-char-then-doing) to be that way. :( – HolyBlackCat Jul 06 '20 at 23:50
  • Thinking of our grandchildren I have changed `[basic.compound]/4.3` in the answer from [eel.is](http://eel.is/c++draft/basic.compound#4.3) to [timsong-cpp.github.io](https://timsong-cpp.github.io/cppwp/n4861/basic.compound#4.3). From the linked questions I think that it will be hard to answer my question. The specification is unclear in this area. And it has bugs as HolyBlackCat pointed out: only the first base (`B1`) can have the same address as the entire object (`A`). – Dr. Gut Jul 07 '20 at 00:09
  • @Dr.Gut Standard-layout classes can't have multiple bases, unless they're all empty (or one is non-empty and the derived class is empty) – M.M Jul 07 '20 at 00:21
  • There is also N4861:[`[basic.lval]/11`](https://timsong-cpp.github.io/cppwp/n4861/basic.lval#11), which was previously different, see N4659:[`[basic.lval]/8`](https://timsong-cpp.github.io/cppwp/n4659/basic.lval#8). If my usage is not well defined for all types, the question remains: could it be well defined for all types in the standard? – Dr. Gut Jul 07 '20 at 00:26
  • @NicolBolas Missed that part when skimming through the question... Edited. – HolyBlackCat Jul 07 '20 at 09:58
  • @Dr.Gut _There is also N4861:[basic.lval]/11, which was previously different, see N4659:[basic.lval]/8_ The meaning is the same, the removed rules could never be applied in C++. – Language Lawyer Jul 07 '20 at 10:36
  • @HolyBlackCat: Thanks for the edit. The quoted rule is symmetrical, so how can `reinterpret_cast` be valid in one direction and invalid in the other? I am missing something. – Dr. Gut Jul 07 '20 at 15:38
  • @Dr.Gut As Nicol Bolas said, both objects have to actually exist for this to work. If you do have a `Wrapper`, then you can `reinterpret_cast` it to (reference to) `T` and back. But if you only have `T` (not as a member of `Wrapper`), then you can't do it. – HolyBlackCat Jul 07 '20 at 15:41
  • @HolyBlackCat: So the reason is: if we create a `Wrapper` we created also a `T` at the same place. But if we create a `T`, there is no `Wrapper` created in C++'s object model. Can you include this in the answer? So that people do not have to open the comment section for understanding the asymmetry. – Dr. Gut Jul 07 '20 at 15:57
  • @HolyBlackCat: Thank you. – Dr. Gut Jul 07 '20 at 16:26
  • _But if it points to a T object that's not a member of Wrapper, then it would be UB_ Just reinterpreting won't be UB – Language Lawyer Jul 07 '20 at 17:52
  • @LanguageLawyer Yep, added a disclaimer. – HolyBlackCat Jul 07 '20 at 18:11
  • @Dr.Gut: If one uses `memcpy` to overwrite a region of allocated storage which had an object of type T1 in it, and the source pointer was the address of a T2 which was contained within, and completely filled, a T3, what type of object would exist at the destination? – supercat Jul 07 '20 at 21:26
  • @supercat: I suppose, that overwriting a `T1` with `memcpy` is illegal, whatever the source object may be. Even standard-layout classes can be non-trivially relocatable, see `short_string` in [P1144](http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1144r4.html#non-trivial-sample-string), [live demo](https://godbolt.org/z/od3Bed). Why are you asking? – Dr. Gut Jul 07 '20 at 22:38
  • @Dr.Gut: Sorry, I meant to specify that all types were trivial. If it's only going to be possible to access "objects" that "exist" within a region of storage, then there should be a clear specification as to what objects would exist when using `memcpy` on an object of trivial type, but so far as I can tell the Standard's abstraction model isn't clear about that. – supercat Jul 07 '20 at 22:48
  • @supercat: "*what type of object would exist at the destination?*" Whatever objects existed in that region of storage to begin with. There are certain pieces of C++ syntax that create objects in C++, and `memcpy` (pre-C++20 at least) is *not* one of them. Copying bytes does not cause the objects that exist around those bytes to magically appear. And yes, this is well-specified in C++; [intro.object]/1 spells out the circumstances under which an object comes into existence. – Nicol Bolas Jul 09 '20 at 02:38
  • @NicolBolas: In C++99, such cases were expected to be handled as they were in C, and compilers have continued to process such code mostly as they would in C. If there were an intention to deprecate the use of `memcpy` for such purposes, the Standard should have introduced alternatives which could be used as replacements in all cases where the former would have been used, but so far as I can tell it has never done so. – supercat Jul 09 '20 at 13:12
  • @supercat: "*In C++99, such cases were expected to be handled as they were in C*" Were they? Can you point to a statement in C++98 that would define what that behavior would be? Can you point to a statement from the makers of C++98 that claim that they intended to make such things meaningful? Or are you just saying "compilers do the 'right thing', so the standard should too?" Because the last one is not a valid argument. C++ never was what you *thought* it was, and it's best to accept that. – Nicol Bolas Jul 09 '20 at 13:23
  • @NicolBolas: If a language was in wide use before a Standard was written (as was the case with both C and C++), and the Standard doesn't specify all constructs that are in wide use, then one must either interpret the Standard as defining a language which includes the common constructs, or as defining a new language which is incompatible with the pre-existing one. Some people pretend "do the right thing" is complicated, but it really isn't. In most cases, it would mean interpret actions upon objects whose representation is fully specified as actions upon the storage occupied thereby, ... – supercat Jul 09 '20 at 14:00
  • ...except in *particular* situations where there would be a good reason for doing otherwise. – supercat Jul 09 '20 at 14:00
  • @supercat: "*Some people pretend "do the right thing" is complicated, but it really isn't.*" Except that the thing you described *is* complicated, because it defines a system where objects are not objects; they're just views over memory. Except for all the times when they *are* objects. There's no notion that any particular operation is right or wrong even at runtime... except for those cases where they are. Drawing that line, deciding what it is and where it is and how it manifests, would require a lot of complexity. Even the whole "implicit object creation" of C++20 is complex. – Nicol Bolas Jul 09 '20 at 14:04
  • @supercat: Just because a person can write a thing in a sentence or two to get the idea across doesn't mean that the thing isn't *complicated*. Especially if it needs to be formally specified rather than just understood ad-hoc. "*as defining a new language which is incompatible with the pre-existing one*" Nobody ever claimed that C++98 was compatible with any "language" that preceded it, because there were no languages that preceded it. There were just a bunch of people doing vaguely similar things. – Nicol Bolas Jul 09 '20 at 14:05
  • @NicolBolas: The Standard never *requires* that objects whose representations are fully defined behave as anything other than views over memory; it merely specifies situations where they are *allowed* to do so. Prior to the C89 and C++99, there were certainly things that called themselves "C compilers" and "C++ compilers", and lots of source files which those compilers would process in useful fashion. If the Standards weren't intended to allow programmers to do the things they were doing with "C compilers" and "C++ compilers", why did they use the names "C" and "C++"? – supercat Jul 09 '20 at 14:29
  • @NicolBolas: What is complicated about the notion "process objects whose representation is fully defined in a fashion consistent with a 'views over memory' model, in all cases where doing so would likely be at least as useful as any alternative?" Any complicated situations can be handled simply by treating the object as a view over memory. – supercat Jul 09 '20 at 14:37
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/217542/discussion-between-nicol-bolas-and-supercat). – Nicol Bolas Jul 09 '20 at 14:43
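For the trivial-type case raised in the `memcpy` comments above, one way to sidestep the question of which objects exist is to copy bytes into a `Wrapper<T>` that has already been created, rather than treating a lone `T` as a `Wrapper<T>`. This is only a sketch, assuming C++17 and a trivially copyable, default-constructible `T`; `wrap_by_copy` is a hypothetical helper name, not something from the thread:

```cpp
#include <cassert>
#include <cstring>
#include <type_traits>

template <class T>
struct Wrapper { T wrapped; };

// Hypothetical helper: builds a real Wrapper<T> and copies the bytes of
// `value` into its member, instead of pretending a Wrapper<T> lives at &value.
template <class T>
Wrapper<T> wrap_by_copy(const T& value) {
    static_assert(std::is_trivially_copyable_v<T>,
                  "byte-wise copying is only defined for trivially copyable types");
    Wrapper<T> result{};  // a Wrapper<T> and its T member now exist
    // Source and destination are both objects of type T, so this copy is defined.
    std::memcpy(&result.wrapped, &value, sizeof(T));
    return result;
}
```

For such types `Wrapper<T>{value}` does the same job more directly; the `memcpy` form is shown only because it mirrors the scenario discussed in the comments.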