0

Let's look at the following code:

int i = 10;
char c = reinterpret_cast<char&>(i);

[expr.reinterpret.cast]/11:

A glvalue expression of type T1 can be cast to the type “reference to T2” if an expression of type “pointer to T1” can be explicitly converted to the type “pointer to T2” using a reinterpret_cast. The result refers to the same object as the source glvalue, but with the specified type.

So the reinterpret_cast<char&>(i) lvalue with the specified char type refers to the int object i.

To initialize c, we need a value, so the lvalue-to-rvalue conversion is applied ([conv.lval]/3.4):

the value contained in the object indicated by the glvalue is the prvalue result.

The result of the lvalue-to-rvalue (L2R) conversion is the value contained in the object i. As long as the value of i is in the range representable by char ([expr]/4 says that the behavior is undefined otherwise), the variable c shall be initialized with the same value.

From the implementation's point of view, on a little-endian platform this is easily achieved by reading the byte at the address of the object i. However, on a big-endian platform the compiler would have to add an offset to fetch the least significant byte. Or it could read the whole int object into a register and mask off the least significant byte, which is an acceptable approach on either endianness.

If you think that a compiler could easily handle the code above so that it behaves as required by the C++17 standard, think of casting a pointer to int that points to i into a pointer to char. Such a cast does not change the pointer value, i.e. it still points to the int object i. This means that applying the indirection operator to that pointer, followed by the L2R conversion, shall behave as described above, i.e. fetch the value of the int object if it is representable by the char type.

In the following code

int i = 10;
f(reinterpret_cast<char*>(&i)); // void f(char*)

should the compiler adjust the address of i by some offset if it does not know what the function f will do with its argument? Nor does the compiler know what else will be passed to f: the code above and the function f are in different translation units.

For example, if f dereferences the pointer to read a value through it, it shall get the value of i, as described above. But f can also be called with a pointer to a real char object, so f can't adjust the given pointer. This means that the caller shall adjust the pointer. But what if f passes the pointer to memcpy to copy sizeof(int) bytes into a character array of that size and back into another int object, as permitted by [basic.types]/3? It is not easy to imagine how to adjust pointers here to match the behavior required by both [basic.types]/3 and [conv.lval]/3.4.
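
For reference, the memcpy round trip mentioned above can be sketched roughly as follows. This is only an illustration: `copy_bytes` is a hypothetical stand-in for the `f` of the question (with a different signature), and everything lives in one translation unit here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for f: it cannot tell whether src points into an int
// or into a real char array, so it uses the pointers without any adjustment.
void copy_bytes(char* dst, const char* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Copies the bytes of an int into a char array and back into another int,
// the round trip that [basic.types]/3 describes for trivially copyable types.
int round_trip(int value) {
    char buffer[sizeof(int)];
    int result = 0;
    copy_bytes(buffer, reinterpret_cast<const char*>(&value), sizeof(int));
    copy_bytes(reinterpret_cast<char*>(&result), buffer, sizeof(int));
    return result;
}

// The copied-back object holds the original value.
void check_round_trip() {
    assert(round_trip(10) == 10);
    assert(round_trip(-123456) == -123456);
}
```

The round trip only works if the same unadjusted pointer is used for every byte access, which is the tension with the literal reading of [conv.lval]/3.4 described above.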

So, what do existing implementations do, if there really are existing implementations that conform to the C++17 standard?

Language Lawyer
  • 3
    Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/186622/discussion-on-question-by-language-lawyer-is-c17-implementable-on-big-endian-p). –  Jan 13 '19 at 15:14
  • `reinterpret_cast<char*>(p)` provides access to the *object representation* of `*p`. This is not the same as the integral value of `*p`. Try and apply your reasoning to `double d = any_int(); reinterpret_cast<char*>(&d)` and you'll realize that your interpretation of these rules doesn't even work on a little-endian system. – Ben Voigt Jan 13 '19 at 19:49
  • 1
    @BenVoigt hi! We are discussing it in the chat. But anyway, `reinterpret_cast<T*>(p)` does not change the value of the operand unless there is an object of type `T` [pointer-interconvertible](https://timsong-cpp.github.io/cppwp/n4659/basic.compound#def:pointer-interconvertible) with the object to which the argument points. – Language Lawyer Jan 13 '19 at 19:55
  • @LanguageLawyer: `reinterpret_cast<T*>(p)` can do anything the hell it wants if `T` has alignment restrictions and `p` doesn't meet them. A common outcome is that it generates a pointer near `p` but with some of the address bits forced to zero. That's definitely "changing the value of the operand". Naturally, `char` has no alignment restrictions, so none of that is important here. – Ben Voigt Jan 13 '19 at 19:58
  • 1
    @BenVoigt the value of a pointer is never an address. See here for the list of possible pointer values https://timsong-cpp.github.io/cppwp/n4659/basic.compound#3 – Language Lawyer Jan 13 '19 at 20:08
  • @LanguageLawyer: Your own link says it's an address, right here: "A value of a pointer type that is a pointer to ... an object represents the address of the first byte in memory occupied by the object" – Ben Voigt Jan 13 '19 at 20:11
  • 1
    @BenVoigt a pointer to an object represents the address of the first byte occupied by the object. But the **value** of the pointer is "[pointer to](https://timsong-cpp.github.io/cppwp/n4659/basic.compound#def:pointer_to) the object" – Language Lawyer Jan 13 '19 at 20:14
  • 1
    @LanguageLawyer: Are we really going to look at this [at a Lewis Carroll level](https://en.wikipedia.org/wiki/Haddocks%27_Eyes)? But no, the pointer doesn't represent the address, the value of the pointer does. Right there in the phrase you linked and I quoted. "pointer to the object" is not the value of the pointer, it is the meaning of the value of the pointer. The value of the pointer is (in some unspecified representation) the address of the first byte of the object. – Ben Voigt Jan 13 '19 at 20:18
  • 1
    @BenVoigt _the pointer doesn't represent the address, the value of the pointer does_ Ok, I was not accurate enough. _"pointer to the object" is not the value of the pointer_ The Standard disagrees here. «Every value of pointer type is one of the following: a _pointer to_ an object or function (the pointer is said to _point to_ the object or function)». _the meaning of the value_ I don't think this is a meaningful phrase. The value is already the meaning, in some sense. The meaning of the value representation. – Language Lawyer Jan 13 '19 at 20:24
  • 1
    "one of the following" -> the things that follow are categories. "Dumbo is an elephant" is a statement of set membership, not a statement of identity. "The value of a pointer is ... a pointer to an object" is also a statement of set membership, not a statement of identity. The value *is* (identity) the address of the first byte of the object it points to. The value *is a member of the category* pointers to objects. – Ben Voigt Jan 13 '19 at 20:27
  • @BenVoigt I'm not a native speaker, but I understand that "pointer to an object" represents a family of values, one value per object. But anyway, an address is not a value of a pointer. See, for example, https://github.com/cplusplus/draft/pull/2319 – Language Lawyer Jan 13 '19 at 21:13
  • 1
    @LanguageLawyer: The value of a pointer consists of more than an address on systems with *strict pointer safety*. On systems with *relaxed pointer safety*, an address is sufficient to form a pointer (a relaxed pointer safety implementation could theoretically still retain information about the derivation of the pointer value, but that information doesn't affect validity, it would only be a debugging aid). In particular, making a roundtrip through `intptr_t` is permitted even on implementations with strict pointer safety, so any non-address portion of the value must fit in `intptr_t`. – Ben Voigt Jan 13 '19 at 22:27
  • @BenVoigt _strict/relaxed pointer safety_ is a term applicable only to pointers to dynamically allocated objects (and derived from them). In this question there are no pointers to dynamically allocated objects. Address never really was a pointer value. Neither in C (google "pointer provenance") nor in C++. – Language Lawyer Jan 14 '19 at 13:37
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186662/discussion-between-language-lawyer-and-ben-voigt). – Language Lawyer Jan 14 '19 at 15:06

3 Answers

8

Edit: Complete rewrite: You've convinced me that the standard is broken.

... The result refers to the same object as the source glvalue, but with the specified type.


the value contained in the object indicated by the glvalue is the prvalue result.

I agree that a literal interpretation may lead to the conclusions that you've drawn.

Given your interpretation, reinterpret_cast (and anything defined in terms of reinterpret_cast) becomes useless, and implementation is impossible not only on BE systems but on LE systems as well (consider reinterpretation between non-integral types and char). As such, I don't believe this is the intended meaning. This may be a candidate for a defect report.

The confusion may be due to insufficiently precise definitions of the expressions "value contained in", "object indicated" and "The result refers to the same object". Clarifying or rewording some or all of these may be in order.
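
To make the non-integral case concrete, here is a minimal sketch of what such a reinterpretation yields in practice. The function names are made up for illustration, and the remark about the byte value assumes an IEEE 754 binary64 `double`.

```cpp
#include <cassert>
#include <cstring>

// Reads d through an unsigned char glvalue. Mainstream compilers emit a load
// of the lowest-addressed byte of d here, not a value-preserving conversion.
unsigned char first_byte_via_cast(double d) {
    return reinterpret_cast<unsigned char&>(d);
}

// Reads the same byte from a copy of the object representation.
unsigned char first_byte_via_memcpy(double d) {
    unsigned char bytes[sizeof(double)];
    std::memcpy(bytes, &d, sizeof(double));
    return bytes[0];
}

// Both reads agree with each other on any endianness.
void check_double() {
    assert(first_byte_via_cast(10.0) == first_byte_via_memcpy(10.0));
}
```

Assuming IEEE 754, the first byte of `10.0` is `0x00` on a little-endian system and `0x40` on a big-endian one, so neither read produces 10, even though 10 is representable by `char`.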

Community
eerorika
  • 1
    I was trying to describe what a compiler shall do to match the behavior required by the standard. – Language Lawyer Jan 13 '19 at 13:53
  • @LanguageLawyer If you don't adjust the pointer, then you get the byte at the lowest memory address of the object. If you want the least significant byte, then reinterpretation is the wrong approach. – eerorika Jan 13 '19 at 14:17
  • @LanguageLawyer BE platforms would do the same as LE platforms. In both cases, the reinterpreted pointer points to the same address as the original. Endianness affects which byte is at that address. Standard doesn't require that byte to be same across platforms. – eerorika Jan 13 '19 at 14:25
  • 5
    @LanguageLawyer Getting 6 downvotes is frustrating, but getting passive-aggressive as a result of that won't help you getting an answer. :/ – HolyBlackCat Jan 13 '19 at 14:51
  • 1
    @LanguageLawyer I'd assume good intentions of the answerer. Regardless, replying in a friendly way and explaining what the answer lacks in your opinion increases the chances of them fixing the answer to suit your needs. – HolyBlackCat Jan 13 '19 at 18:45
  • @eerorika I suggest reading the comments (moved to the chat) under the question. OP is convinced that `reinterpret_cast<char&>(i)` should convert `i` to `char` while preserving the value of `i` if it's representable as `char` (more or less as if by `static_cast`). That's not right, but we were unable to find anything in the standard that actually proves that `reinterpret_cast<char&>(i)` returns the first byte in the representation of `i`. – HolyBlackCat Jan 13 '19 at 18:50
  • _There is no guarantee for an object to have the same value when its type is different_ Its (object's) type is not different. The type of the glvalue after `reinterpret_cast` is different here. And L2R conversion cares only about the object to which the glvalue refers when it fetches its value. Ofc, the type of the glvalue also does matter. It determines the type of the resulting prvalue. BTW, aliasing with `signed char` is UB. That's why I'm using `char` in the question. – Language Lawyer Jan 13 '19 at 19:34
  • @HolyBlackCat: Actually unions (just the existence of the rules for them, whether or not the object in question is actually a union member) do a great job of requiring that `reinterpret_cast<char*>(p)` points to the beginning of the object at `*p`. – Ben Voigt Jan 13 '19 at 20:05
  • @BenVoigt Now I'm curious. Can you point me to the relevant standard section? – HolyBlackCat Jan 13 '19 at 20:50
  • @HolyBlackCat: The [section on pointer-interconvertibility](https://timsong-cpp.github.io/cppwp/n4659/basic.compound#def:pointer-interconvertible) specifies that: 1) a union is pointer-interconvertible with each of its members and 2) pointer-interconvertibility is transitive. The first provides that if there is a union containing an `int` and a `char`, the `char` overlaps the first byte of the `int`. The second guarantees that direct `reinterpret_cast` works this way without going through a union along the way. – Ben Voigt Jan 13 '19 at 22:16
  • 2
    @BenVoigt in theory, it's the objects that are pointer-interconvertible, not the types. In practice, though, it can only make a difference on platforms with type-tagged pointer representations (and even then unlikely). – Kit. Jan 14 '19 at 11:00
  • @Kit. See my questions re: pointers notably [Are pointer variables just integers with some operators or are they “symbolic”?](https://stackoverflow.com/q/32045888/963864) and [Overwriting an object with an object of same type](https://stackoverflow.com/q/32043314/963864) show that pointers aren't just a state (unlike real trivial types), they have a symbolic value. – curiousguy Jan 14 '19 at 18:58
  • @curiousguy I don't know what you mean by "pointers have a symbolic value", but the address space itself has a nontrivial structure, the pointer arithmetic is not necessarily linear (x86 segment:offset model, for example) and in some architectures pointers may have trap representations (x86, if I remember correctly, has/had this "feature" in protected mode). Also, a compiler is allowed not to reserve address space for an object whose pointer is not taken. – Kit. Jan 14 '19 at 20:02
  • @Kit. That some bit patterns are illegal as pointer representation is trivially true and irrelevant. Trivial types can have such property. How the pointer address is encoded is only relevant if you try to read and interpret that bit pattern yourself. I have never suggested doing that. My many questions re: pointers don't address how a value is represented as a bit pattern but **whether a specific bit pattern for a pointer object implies a specific pointer value**. For trivial types, it does. For pointers, which are allegedly trivial, it doesn't. C++ is split. – curiousguy Jan 14 '19 at 20:31
  • @curiousguy I see no basis for your claim that "it doesn't". Check [intro.object]/[8](https://timsong-cpp.github.io/cppwp/n4659/intro.object#8) and the comment to it. – Kit. Jan 14 '19 at 21:06
  • @Kit. Do you claim that a specific numeric address corresponds at most one pointer value? – curiousguy Jan 14 '19 at 21:34
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186676/discussion-between-kit-and-curiousguy). – Kit. Jan 14 '19 at 21:44
5

[intro.object]/1 says that for non-polymorphic objects, "the interpretation of the values found therein is determined by the type of the expressions (Clause [expr]) used to access them."

(emphasis is of the standard itself, not mine)

As you have noticed, the type of that expression in your case is char, so the compiler does not need to interpret this value as the value of some object of type int.
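
As a rough illustration of this reading, the same storage can be accessed through expressions of two different types. The helper names below are hypothetical, and the described result of the byte access reflects what mainstream compilers do rather than a claim about what the standard's wording guarantees.

```cpp
#include <cassert>

// Access through an expression of type unsigned: yields the stored value.
unsigned read_as_unsigned(const unsigned& object) {
    return object;
}

// Access through an expression of type unsigned char: in practice this
// yields the lowest-addressed byte of the object representation.
unsigned char read_as_byte(const unsigned& object) {
    return reinterpret_cast<const unsigned char&>(object);
}

// True when the lowest-addressed byte is also the least significant one,
// as on little-endian platforms.
bool lowest_byte_is_least_significant() {
    const unsigned probe = 1;
    return read_as_byte(probe) == 1;
}

// The unsigned access always sees 10; the byte access sees 10 only when the
// lowest-addressed byte happens to be the least significant one.
void check_reads() {
    const unsigned object = 10;
    assert(read_as_unsigned(object) == 10u);
    assert(read_as_byte(object) == (lowest_byte_is_least_significant() ? 10 : 0));
}
```

On a typical big-endian platform with `sizeof(unsigned) > 1`, `read_as_byte` returns 0 for an object holding 10, with no pointer adjustment involved.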

Kit.
  • The sentence you've cited is not clear and IMO its semantics is void. What is "the interpretation of the values"? I would understand "the interpretation of the value representation", but not this. – Language Lawyer Jan 13 '19 at 15:50
  • That's, I suspect, because you are thinking of "value _of_ an object", while the standard clearly operates with "value _in_ an object". – Kit. Jan 13 '19 at 16:02
  • Do you think that https://timsong-cpp.github.io/cppwp/n4659/intro.execution#6 disallows placement new over local variables? Or explicit destructor call for them? – Language Lawyer Jan 13 '19 at 16:05
  • 2
    I think that [https://timsong-cpp.github.io/cppwp/n4659/basic.life#5](https://timsong-cpp.github.io/cppwp/n4659/basic.life#5) says that it applies to _any_ object. There could be an UB hidden somewhere in the standard if the original type has a non-trivial destructor _and_ you don't recreate the object of the original type before leaving the block, but I'm too lazy to search for it in this topic. – Kit. Jan 13 '19 at 16:31
  • The standard indeed requires you to re-create an object. But this does not matter now. You obviously can end the lifetime of an object with automatic storage duration, but [intro.execution]/6 says that every such object exists until the exit from the block. The moral? The Standard contains enough meaningless legacy and IMO what you've cited is one of such things. – Language Lawyer Jan 13 '19 at 16:35
  • 1
    I don't see what you call "meaningless legacy" here. It's not like _the situation_ was any different in any previous version of the standard. In fact, it was the same even _before_ the not yet standardized C++ got its placement new operator (and you would use assignment to `this` in the constructor for the similar functionality). The moral? You are reading the standard wrong. – Kit. Jan 13 '19 at 16:46
  • [intro.object]/1 is the most lunatic clause of the entire std IMNSHO. – curiousguy Jan 13 '19 at 16:51
  • Significant parts of the C++ standard were at first copy-pasted from the C standard. This is what I call legacy. A similar paragraph exists in C11 (and I'm sure it was in C89, just lazy to search for the standard) http://port70.net/~nsz/c/c11/n1570.html#6.2.4p6. The sentence you've cited looks like a legacy from the good ol' "portable assembler" days when there was no strict aliasing and every storage could be reinterpreted by an expression as storing a value of its type. – Language Lawyer Jan 13 '19 at 16:55
  • C++ _is_ a portable assembler with a very powerful (Turing-complete) macro language, and will stay this way or will die, because that's what its niche is. If you don't like it, use another language. – Kit. Jan 13 '19 at 17:02
  • 1
    C has not been a portable assembler since at least 1989. Neither is C++. If you don't like it, use another language. – Language Lawyer Jan 14 '19 at 15:27
  • @curiousguy you don't like that "An object is created ..." part? I think there is no better way to "define" objects. They are basic entities in the C++. One can't define them using some more basic entities, because there are no such entities. Think of sets in ZFC. It does not define what set is. It only postulates existence of some sets (empty and \mathbb{N}) and describes ways to build new sets from existing ones. – Language Lawyer Jan 14 '19 at 15:36
  • @LanguageLawyer What's not to dislike? Are you claiming that breaking any non trivial C program was intentional? If so, can you trace its history? Please cite D&E. The comparison with ZFC is funny. How many times ZFC was patched because something obvious was forgotten? Or maybe you believe that lack of support for unions in C++ for more than a decade was the goal of the C++ committee. – curiousguy Jan 14 '19 at 15:48
  • 1
    @curiousguy If C programs are not broken by C++, what is http://wg21.link/p0593 needed for? – Language Lawyer Jan 14 '19 at 15:55
  • 1
    C++ is not ZFC. A C++ program is supposed to be able to interact with the rest of the world. Besides, C++ is not a logically constructed language, but an ugly hack, and its standards fully reflect that. – Kit. Jan 14 '19 at 16:02
  • @Kit. you can't have both «object is a region of storage» and separation between object lifetime and storage duration at the same time. So the C++ standard committee dropped the first statement, because it has never been true. – Language Lawyer Jan 14 '19 at 16:06
  • @LanguageLawyer C++ is a portable assembler. That's how the industry uses it, me including. If one tries to make a compiler assuming the opposite, it won't be widely used. An object and a region of storage are different notions, because _some_ objects have non-vacuous constructors and _some_ regions of storage introduce their content (the objects with vacuous constructors) into the program in the ways not specified by the language. The language is not supposed to know what `mmap()` does. Or are you going to allege that all C++ programs using `mmap()` are ill-formed? The industry won't agree. – Kit. Jan 14 '19 at 16:23
  • 1
    _C++ is a portable assembler. That's how the industry uses it_ Do you understand that the former does not follow from the latter? _An object and a region of storage are different notions_ really? [C++14](https://timsong-cpp.github.io/cppwp/n4140/intro.object#def:object) and earlier standards say it is... _Or are you going to allege that all C++ programs using mmap() are ill-formed? The industry won't agree_ Man. Just read the description of the `language-lawyer` tag. There doesn't have to be agreement between the spec and practice. Your appeal to practice is ridiculous. – Language Lawyer Jan 14 '19 at 16:31
  • 1
    @Kit. _Or are you going to allege that all C++ programs using mmap() are ill-formed?_ "ill-formed" is not the same as "contains UB". Well-formed program can haz UB. "ill-formed" is roughly "not compilable" (except for "ill-formed; no diagnostic required" case). – Language Lawyer Jan 14 '19 at 16:47
  • @LanguageLawyer Are you saying that the `language-lawyer` tag [doesn't actually belong to Stack Overflow](https://stackoverflow.com/help/on-topic)? – Kit. Jan 14 '19 at 16:50
  • @Kit. no I'm saying that "There don't have to be agreement between the spec and practice". And the tag description kinda agrees that there could be gaps. – Language Lawyer Jan 14 '19 at 16:53
  • @LanguageLawyer so, you are _not_ saying that the practice has no say at all in how to resolve the specification's ambiguities, contradictions and other deficiencies? – Kit. Jan 14 '19 at 17:01
  • 1
    @Kit. I'm saying that stating that something is not UB because it is widely used in practice (when the spec clearly says it is UB) is not an argument. – Language Lawyer Jan 14 '19 at 17:11
  • Let us [continue this discussion in chat](https://chat.stackoverflow.com/rooms/186668/discussion-between-kit-and-language-lawyer). – Kit. Jan 14 '19 at 17:16
  • https://github.com/cplusplus/draft/commit/34d0392e2f32ea19aebab4919a525a3a9679594f#diff-496151aa14703466fa29b067d1498c86d5bd37c1c9a3f169569974a83cce021bL3030-L3032 – Language Lawyer Apr 17 '21 at 11:16
-1

However, on a big-endian platform the compiler would have to add an offset to fetch the least significant byte. Or it could read the whole int object into a register and mask off the least significant byte, which is an acceptable approach on either endianness.

That would only be true if you passed the value of i around. However, with reinterpret_cast<char&>, what you pass around is the address, because a reference is just a different syntax for a pointer. Therefore, on a big-endian platform c will get the value of the most significant byte of i, which is 0 as long as sizeof(int) > 1.

What you wrote is indistinguishable from:

int i = 10;
char c = *reinterpret_cast<char*>(&i);
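
The equivalence of the two spellings can be checked with a small sketch like the one below; the function names are invented for illustration. [expr.reinterpret.cast]/11 states that the reference cast has the same effect as dereferencing the corresponding pointer cast with the built-in `&` and `*` operators.

```cpp
#include <cassert>

// The reference form used in the question.
char read_via_reference_cast(int i) {
    return reinterpret_cast<char&>(i);
}

// The pointer form shown in this answer.
char read_via_pointer_cast(int i) {
    return *reinterpret_cast<char*>(&i);
}

// Both forms read the same byte, whatever that byte is on a given platform.
void check_equivalence() {
    assert(read_via_reference_cast(10) == read_via_pointer_cast(10));
    assert(read_via_reference_cast(-1) == read_via_pointer_cast(-1));
}
```

The check deliberately compares the two forms with each other instead of with a fixed value, because the byte they read depends on endianness.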
KevinZ
  • A reference is an alias, like another "name" for an addressable lvalue. It isn't an address. – curiousguy Jan 18 '19 at 02:46
  • @curiousguy lvalues are addresses with an alternate syntax. If they actually represent the value at the address, then you wouldn't be able to change that value via assignment. – KevinZ Jan 18 '19 at 04:44
  • 1
    They clearly aren't. A bitfield member is a lvalue and has no distinct address. And in C you couldn't take the address of a `register` variable, although that restriction was removed a long time ago in C++. – curiousguy Jan 18 '19 at 05:56
  • 1
    The bitfield member is really a compiler-generated overload of the `=` operator. `register` is a syntax sugar/salt that no modern compiler even bothers to respect. Think of it this way: when you pass an lvalue around from variable to variable, what actually goes on in the background, if the compiler cannot optimize it away, is the passing of the address, not of the value at the address. – KevinZ Jan 18 '19 at 20:56