7

Is this valid C++?

int main() {
    int *p;
    p = reinterpret_cast<int*>(42);
}

Assuming I never dereference p.

Looking up the C++ standard, we have

C++17 §6.9.2/3 [basic.compound]

3 Every value of pointer type is one of the following:

  • a pointer to an object or function (the pointer is said to point to the object or function), or
  • a pointer past the end of an object ([expr.add]), or
  • the null pointer value ([conv.ptr]) for that type, or
  • an invalid pointer value.

A value of a pointer type that is a pointer to or past the end of an object represents the address of the first byte in memory ([intro.memory]) occupied by the object or the first byte in memory after the end of the storage occupied by the object, respectively. [ Note: A pointer past the end of an object ([expr.add]) is not considered to point to an unrelated object of the object's type that might be located at that address. A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see [basic.stc]. — end note ] For purposes of pointer arithmetic ([expr.add]) and comparison ([expr.rel], [expr.eq]), a pointer past the end of the last element of an array x of n elements is considered to be equivalent to a pointer to a hypothetical array element n of x and an object of type T that is not an array element is considered to belong to an array with one element of type T.

p = reinterpret_cast<int*>(42); does not fit into the list of possible values. And:

C++17 §8.2.10/5 [expr.reinterpret.cast]

A value of integral type or enumeration type can be explicitly converted to a pointer. A pointer converted to an integer of sufficient size (if any such exists on the implementation) and back to the same pointer type will have its original value; mappings between pointers and integers are otherwise implementation-defined. [ Note: Except as described in 6.7.4.3, the result of such a conversion will not be a safely-derived pointer value. — end note ]
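
The round-trip guarantee in this paragraph can be illustrated with a minimal sketch. It assumes the implementation provides `std::uintptr_t` (an optional type), and `round_trip` and `check_round_trip` are hypothetical helper names chosen for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: converts a pointer to an integer and back again.
// Assumes std::uintptr_t exists on this implementation (it is an optional type).
inline int* round_trip(int* original) {
    const std::uintptr_t as_integer = reinterpret_cast<std::uintptr_t>(original);
    return reinterpret_cast<int*>(as_integer);
}

// Exercises the guarantee: the restored pointer has the original value,
// so it still points to the same object and may be dereferenced.
inline void check_round_trip() {
    int object = 7;
    int* restored = round_trip(&object);
    assert(restored == &object);
    assert(*restored == 7);
}
```

The guarantee only covers pointer to integer to pointer. Starting from an arbitrary integer such as 42 falls under the "otherwise implementation-defined" part of the paragraph.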

C++ standard does not seem to say more about the integer to pointer conversion. Looking up the C17 standard:

C17 §6.3.2.3/5 (emphasis mine)

An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation.68)

and

C17 §6.2.6.1/5

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined.50) Such a representation is called a trap representation.

To me, it seems like any value that does not fit into the list in [basic.compound] is a trap representation, thus p = reinterpret_cast<int*>(42); is UB. Am I correct? Is there something else making p = reinterpret_cast<int*>(42); undefined?

curiousguy
Aykhan Hagverdili
  • The first quote is about the pointer **value**, not how you obtain it, so I don't think it's relevant here. `reinterpret_cast<int*>(42)` is (likely) an invalid pointer value, which fits the 4th bullet of your first quote. Also, *"A value of integral type or enumeration type can be explicitly converted to a pointer."* - How does this not answer your question? – Holt Feb 10 '20 at 09:13
  • @Holt *"A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration"* It is not a wild card for all possible values. – Aykhan Hagverdili Feb 10 '20 at 09:15
  • @Holt it may be a trap representation. See the last two quotes in the question. – Aykhan Hagverdili Feb 10 '20 at 09:17
  • The fact, that an integral value may be explicitly converted to a pointer does not mean that the pointer will ever point to a valid object. Though the assignment may be correct, any operation on that pointer (dereferencing, pointer arithmetic, ...) is UB then. But you may convert the pointer back to an integral value later. – Stephan Lechner Feb 10 '20 at 09:17
  • @StephanLechner the assignment evaluates the invalid pointer `p`. This seems to be UB to me. – Aykhan Hagverdili Feb 10 '20 at 09:18
  • @Ayxan Depends how you read that quote. This is a one way implication. It does not say that **all** invalid pointer values come from this. – Holt Feb 10 '20 at 09:19
  • What do you mean to say? As far as I can see, it's `an invalid pointer value`. The value `42` is not a valid address, hence the dereferencing is unsafe. – theWiseBro Feb 10 '20 at 09:19
  • @theWiseBro according to which part of the standard does it fit into that category? – Aykhan Hagverdili Feb 10 '20 at 09:21
  • @Ayxan _"the assignment evaluates the invalid pointer p. This seems to be UB to me."_ This is actually not true. When you write `int * p;` followed by `p = new int(42);` for example, this is perfectly valid. Dereferencing an uninitialized or invalid pointer is Undefined Behaviour indeed, but you didn't in your example. – Fareanor Feb 10 '20 at 09:22
  • 2
    I'd read C++17 §6.9.2/3 .. "invalid pointer value" as one of the four possible and allowed forms a pointer value may take on. So an "invalid pointer value" (e.g. pointing to an invalid object) is still defined behaviour. UB comes from those parts defining the meaning of operations on pointers (e.g. arithmetic, dereferencing). There is still one defined behaviour left on "invalid pointer values", which is "any pointer value may be converted to integral type", regardless of whether the pointer value is an "invalid pointer value" or not. – Stephan Lechner Feb 10 '20 at 09:23
  • 1
    @Ayxan If pointers could only become invalid when the object they point to reach the end of their storage duration, `int *p` would be UB, so I don't think the way you take that quote is right. – Holt Feb 10 '20 at 09:24
  • @StephanLechner what if it is a trap value? Where does the standard say *"if integer to pointer conversion does not evaluate to an actual address, it is an invalid value"*? – Aykhan Hagverdili Feb 10 '20 at 09:28
  • @Holt `int *p` does not evaluate `p`, unlike my example, which does. – Aykhan Hagverdili Feb 10 '20 at 09:30
  • @Ayxan Yes, but that's not the issue here. The first quote is not about evaluating pointer. `int *p` creates a pointer value. If this pointer value cannot be an invalid pointer value, per your last comment, what is it? – Holt Feb 10 '20 at 09:32
  • The example shown is valid c++. On some platforms this is how you access hardware resources. – darune Feb 10 '20 at 09:32
  • @Holt trap representation? – Aykhan Hagverdili Feb 10 '20 at 09:32
  • @darune According to [Wikipedia](https://en.wikipedia.org/wiki/Null_pointer) *"BIOS code written in C for 16-bit real-mode x86 devices may write the IDT at physical address 0 of the machine by dereferencing a null pointer for writing."*. That does not make dereferencing null pointer well defined. – Aykhan Hagverdili Feb 10 '20 at 09:34
  • 1
    BTW, `42` is probably misaligned value for `int*`. – Jarod42 Feb 10 '20 at 10:00
  • 1
    @Holt: it is UB: http://eel.is/c++draft/basic.indet#2 – geza Feb 10 '20 at 11:31
  • @geza Thanks, did not know that. But I don't think it invalidates the point that you can generate invalid pointer by value by other mean than the one mentioned above. – Holt Feb 10 '20 at 11:52
  • I'm confused. You're asking about C++ code, but then start citing from the C standard. The C++ standard does not mention "trap" in relation to pointer values. – 1201ProgramAlarm Feb 28 '20 at 23:47
  • @1201ProgramAlarm perhaps [a comment](https://stackoverflow.com/questions/59117564/is-one-past-end-pointer-ok-for-non-array-object-types#comment104465503_59117594) on a (deleted) answer explains what I mean more accurately. – Aykhan Hagverdili Feb 28 '20 at 23:59
  • Besides all the reinterpret cast talk, whenever you use 64bits, you cannot use int since this `int` is max 32bits, I think. I for one, use long long. – dejoma Mar 06 '20 at 13:07
  • @dejoma "int is max 32bits" Where does the standard mention this constraint? – Aykhan Hagverdili Mar 06 '20 at 15:07
  • @Ayxan I am just saying that if you're not using a C++ type that has size 64bit it might be wrong.. Hence my use of long long – dejoma Mar 09 '20 at 10:05

3 Answers

5

This is not UB but implementation-defined, and you already cited why (§8.2.10/5 [expr.reinterpret.cast]). A pointer that has an invalid pointer value does not necessarily have a trap representation. It can have one, and the compiler must document this. All you have here is a pointer that is not safely derived.

Note that we generate pointers with an invalid pointer value all the time: when an object is freed by delete, all the pointers that pointed to this object hold an invalid pointer value.

Using the resulting pointer is implementation-defined as well (not UB), per the lvalue-to-rvalue conversion rules ([conv.lval]):

[...] if the object to which the glvalue refers contains an invalid pointer value ([basic.stc.dynamic.deallocation], [basic.stc.dynamic.safety]), the behavior is implementation-defined.
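
As a minimal sketch of the delete case mentioned above, the helper below frees an object and overwrites the pointer right away, so the invalid pointer value is never read. The names `destroy_and_reset` and `check_destroy_and_reset` are hypothetical, chosen only for this illustration.

```cpp
#include <cassert>

// Hypothetical helper: frees the object and immediately overwrites the pointer.
// After `delete`, every pointer to the object holds an invalid pointer value.
// Assigning a new value to `pointer` does not read the old one.
inline void destroy_and_reset(int*& pointer) {
    delete pointer;     // `pointer` is read here while it is still valid
    pointer = nullptr;  // the invalid value is replaced without being evaluated
}

// Shows that the caller ends up with a null pointer rather than an invalid one.
inline void check_destroy_and_reset() {
    int* pointer = new int(42);
    destroy_and_reset(pointer);
    assert(pointer == nullptr);
}
```

Copies of the pointer made before the call still hold an invalid pointer value afterwards; reading them is the implementation-defined case quoted above.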

geza
  • Freed or deleted memory is explicitly in the "invalid value" category as per the standard: *"A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration;"*. Where does the standard say "a pointer that doesn't point anywhere has an invalid value"? It doesn't say that. I interpret the wording I cited to mean that any value that falls outside of the list of possible values is by definition a trap value. – Aykhan Hagverdili Feb 10 '20 at 10:42
  • A pointer can only have those 4 different values. If it is not the first 3, then it has invalid pointer value (in case of well-formed program). The "invalid pointer value" category contains trap values as well. For example, it can happen on certain platforms, that after `delete`, you cannot access the pointer itself, because it will generate an exception. There is a footnote for this: http://eel.is/c++draft/basic.stc#footnote-32. – geza Feb 10 '20 at 10:49
  • "If it is not the first 3, then it has invalid pointer value (in case of well-formed program)." Can you cite wordings from the standard saying that? – Aykhan Hagverdili Feb 10 '20 at 10:54
  • *"Any other use of an invalid pointer value has implementation-defined behavior."* We are effectively evaluating and copying `(int*)42` in my example, so this is at least implementation defined? – Aykhan Hagverdili Feb 10 '20 at 10:56
  • @Ayxan: You already cited it: "Every value of pointer type is one of the following". There are no exceptions (or a 5th kind of type) mentioned anywhere. – geza Feb 10 '20 at 10:56
  • it could be a trap representation. – Aykhan Hagverdili Feb 10 '20 at 11:02
  • 1
    @Ayxan: which is not a separate kind of its own. If it were, the standard would have put it in that list. It is simple set theory: the set of "invalid pointer value" contains the set "pointers with trap representation" (that footnote supports this argument as well). – geza Feb 10 '20 at 11:12
  • a trap value by definition is a kind of bit layout that doesn't represent any sane object of that type. It would be incorrect to include trap value in the list of possible values a pointer object can hold. – Aykhan Hagverdili Feb 10 '20 at 11:52
  • 1
    @Ayxan Because you get UB the moment you assign the pointer variable. The pointer variable cannot hold the trap representation that may come out of the cast. – cmaster - reinstate monica Feb 10 '20 at 11:56
2

The example shown is valid c++. On some platforms this is how you access "hardware resources" (and if it's not valid you have found a bug/mistake in standard text).

See also this answer for a better explanation.
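
A minimal sketch of the hardware-access pattern mentioned above, assuming `std::uintptr_t` is available. The helpers `register_at` and `write_register` are hypothetical names, and a real register address would come from the platform's documentation; the check below substitutes the address of an ordinary variable so the sketch can run anywhere.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: treats an integer address as a 32-bit register.
// The integer-to-pointer mapping is implementation-defined.
// `volatile` keeps the compiler from eliding or reordering the accesses.
inline volatile std::uint32_t* register_at(std::uintptr_t address) {
    return reinterpret_cast<std::uint32_t*>(address);
}

// Hypothetical helper: writes one value to the register at `address`.
inline void write_register(std::uintptr_t address, std::uint32_t value) {
    *register_at(address) = value;
}

// Stand-in for real hardware: the "register" is an ordinary variable,
// so the address round-trips back to a pointer to a real object.
inline void check_write_register() {
    std::uint32_t fake_register = 0;
    write_register(reinterpret_cast<std::uintptr_t>(&fake_register), 0xABCDu);
    assert(fake_register == 0xABCDu);
}
```

On an actual device the address would be a fixed constant, and whether the resulting pointer may be dereferenced is decided by the implementation and the platform, not by the standard.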


Update: The first sentence of reinterpret_cast as you quote yourself:

A value of integral type or enumeration type can be explicitly converted to a pointer.

I recommend you stop reading and rest at this point. The rest is just a lot of details, including possible implementation-defined behavior. That doesn't make it UB or invalid.

curiousguy
darune
  • According to [Wikipedia](https://en.wikipedia.org/wiki/Null_pointer) *"BIOS code written in C for 16-bit real-mode x86 devices may write the IDT at physical address 0 of the machine by dereferencing a null pointer for writing."*. That does not make dereferencing null pointer well defined. – Aykhan Hagverdili Feb 10 '20 at 09:35
  • @Ayxan I consider that a special case. AFAIK there are workarounds for reading address 0 validly even in c++ (maybe platform dependent though). – darune Feb 10 '20 at 09:38
  • Since this question is tagged with "language-lawyer" it would be appropriate to quote the relevant parts of the standard. – Aykhan Hagverdili Feb 10 '20 at 09:43
  • 2
    The access to hardware resources works because it's implementation defined behavior, only. – cmaster - reinstate monica Feb 10 '20 at 09:43
  • @cmaster-reinstatemonica and thus is valid c++. you cannot have invalid c++ that is implementation defined (unless using non-conforming compiler ofc.). Think about what you suggest. – darune Feb 10 '20 at 10:02
  • @Ayxan true, but it's somewhat uninteresting - so I thought I would save everyones time... I mean, I can dream up "language-lawyer" questions from now til sunday if I wanted to... I understand the concern if it is real, but I do not think a "language lawyer" response is required for an answer (I admit ofc. that would be a better answer, yes) – darune Feb 10 '20 at 10:07
  • 2
    The sentence "if it's not valid you have found a bug/mistake in standard text" is wrong because C++ is defined by the C++ standard (which has defects, of course, but not in this case) rather than "some platforms". Providing a wrong answer to a serious [language-lawyer] question is not an effective way to "save everyones time". – L. F. Feb 10 '20 at 10:34
  • @L.F. interesting. So you say there cannot be 'bugs' in the C++ language text ? why am I saying that ? because this has been valid for the past 20+ years and if that somehow doesn't hold anymore it has to be 'defect' (I would say mistake) as you call it introduced in new writing. – darune Feb 10 '20 at 13:13
  • @darune There are many bugs (defects) in the C++ standard, but only text against the intent of the standard is considered defective. The C++ standard is created to foster portable code that can run among all implementations, while permitting many practices that abound in legacy code bases but rely on implementation-specific details as non-standard (implementation-defined or undefined). `reinterpret_cast<int*>(42)` has always been (and been intended to be) implementation-defined since the start of the standardization of C++ and this is not simply "a bug/mistake in standard text". – L. F. Feb 10 '20 at 13:35
  • @L.F. exactly my point - undefined behavior and unspecified/implementation-defined are miles apart. The OP is asking if it's illegal/UB – darune Feb 10 '20 at 13:50
  • 1
    The question uses four question marks (one in the title, three in the question body). Three of them refer to UB; the remaining one says "valid", so I'm pretty sure that "valid" refers to UB-free as well (which is the common terminology anyway) ... – L. F. Feb 10 '20 at 13:55
  • @cmaster-reinstatemonica I think "Assigning this invalid pointer to a pointer variable is undefined behavior" is from the C17 quote, right? Can you show the equivalent from C++17? – L. F. Feb 10 '20 at 14:07
  • @L.F. Right, there seems to be a difference between the two standards. I'm deleting my last comment. – cmaster - reinstate monica Feb 10 '20 at 14:13
0

Trap Representations

What: As covered by [C17 §6.2.6.1/5], a trap representation is a non-value. It is a bit pattern that fills the space allocated for an object of a given type, but this pattern does not correspond to a value of that type. It is a special pattern that can be recognized for the purpose of triggering behavior defined by the implementation. That is, the behavior is not covered by the standard, which means it falls under the banner of "undefined behavior". The standard sets out the possibilities for when a trap could be (not must be) triggered, but it makes no attempt to limit what a trap might do. For more information, see A: trap representation.

The undefined behavior associated with a trap representation is interesting in that an implementation has to check for it. The more common cases of undefined behavior were left undefined so that implementations do not need to check for them. The need to check for trap representations is a good reason to want few trap representations in an efficient implementation.

Who: The decision of which bit patterns (if any) constitute trap representations falls to the implementation. The standards do not force the existence of trap representations; when trap representations are mentioned, the wording is permissive, as in "might be", as opposed to demanding, as in "shall be". Trap representations are allowed, not required. In fact, N2091 came to the conclusion that trap representations are largely unused in practice, leading up to a proposal to remove them from the C standard. (It also proposes a backup plan if removal proves infeasible: explicitly call out that implementations must document which representations are trap representations, as there is no other way to know for sure whether or not a given bit pattern is a trap representation.)

Why: Theoretically, a trap representation could be used as a debugging aid. For example, an implementation could declare that 0xDDDD is a trap representation for pointer types, then choose to initialize all otherwise uninitialized pointers to this bit pattern. Reading this bit pattern could trigger a trap that alerts the programmer to the use of an uninitialized pointer. (Without the trap, a crash might not occur until later, complicating the debugging process. Sometimes early detection is the key.) In any event, a trap representation requires a trap of some sort to serve a purpose. An implementation would not define a trap representation without also defining its trap.

My point is that trap representations must be specified. They are deliberately removed from the set of values of a given type. They are not simply "everything else".

Pointer Values

C++17 §6.9.2/3 [basic.compound]

This section defines what an invalid pointer value is. It states "Every value of pointer type is one of the following" before listing four possibilities. That means that if you have a pointer value, then it is one of the four possibilities. The first three are fully specified (pointer to object or function, pointer past the end, and null pointer). The last possibility (invalid pointer value) is not fully specified elsewhere, so it becomes the catch-all "everything else" entry in the list (it is a "wild card", to borrow terminology from the comments). Hence this section defines "invalid pointer value" to mean a pointer value that does not point to something, does not point to the end of something, and is not null. If you have a pointer value that does not fit one of those three categories, then it is invalid.
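
The three fully specified categories can be sketched in a few lines. `PointerExamples`, `make_examples`, and `check_examples` are hypothetical names used only for this illustration.

```cpp
#include <cassert>

// Hypothetical aggregate holding one pointer value from each specified category.
struct PointerExamples {
    int* to_object;  // pointer to an object
    int* past_end;   // pointer past the end of that object
    int* null;       // the null pointer value
};

// For pointer arithmetic, a non-array object counts as an array of one element,
// so `&object + 1` is a valid past-the-end pointer.
inline PointerExamples make_examples(int& object) {
    return PointerExamples{&object, &object + 1, nullptr};
}

inline void check_examples() {
    int object = 5;
    const PointerExamples examples = make_examples(object);
    assert(*examples.to_object == 5);                     // only this one may be dereferenced
    assert(examples.past_end - examples.to_object == 1);  // arithmetic within the "array" is defined
    assert(examples.null == nullptr);
}
```

Any pointer value that is none of these three falls into the fourth category, the invalid pointer value.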

In particular, if we agree that reinterpret_cast<int*>(42) does not point to something, does not point to the end of something, and is not null, then we must conclude that it is an invalid pointer value. (Admittedly, one could assume that the result of the cast is a trap representation for pointers in some implementation. In that case, yes, it does not fit into the list of possible pointer values because it would not be a pointer value, hence it's a trap representation. However, that is circular logic. Furthermore, based upon N2091, few implementations define any trap representations for pointers, so the assumption is likely groundless.)

[ Note: [...] A pointer value becomes invalid when the storage it denotes reaches the end of its storage duration; see [basic.stc]. — end note ]

I should first acknowledge that this is a note. It explains and clarifies without adding new substance. One should expect no definitions in a note.

This note gives an example of an invalid pointer value. It clarifies that a pointer can (perhaps surprisingly) change from "points to an object" to "invalid pointer value" without changing its value. Looking at this from a formal logic perspective, this note is an implication: "if [something] then [invalid pointer]". Viewing this as a definition of "invalid pointer" is a fallacy; it is merely an example of one of the ways one can get an invalid pointer.

Casting

C++17 §8.2.10/5 [expr.reinterpret.cast]

A value of integral type or enumeration type can be explicitly converted to a pointer.

This explicitly permits reinterpret_cast<int*>(42). Therefore, the behavior is defined.
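
A minimal sketch of the cast from the question, with the pointer converted back to an integer instead of being dereferenced. It assumes `std::uintptr_t` is available and that the implementation uses a flat address space in which the integer-to-pointer mapping preserves the bits; that assumption is the implementation's choice, not a guarantee of the standard. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: performs the cast from the question and converts the result back.
// The pointer is never dereferenced and never used in pointer arithmetic.
inline std::uintptr_t cast_without_dereference() {
    int* p = reinterpret_cast<int*>(42);          // permitted; the mapping is implementation-defined
    return reinterpret_cast<std::uintptr_t>(p);   // any pointer can be converted to a large enough integer
}

// ASSUMPTION: a flat address space where the integer survives the round trip.
// The standard only says the mapping is implementation-defined.
inline void check_cast_on_this_platform() {
    assert(cast_without_dereference() == 42u);
}
```

The check documents what a typical implementation does; on an implementation with a different mapping it could legitimately fail without the cast itself being undefined.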

To be thorough, one should make sure there is nothing in the standard that makes 42 "erroneous data" to the degree that undefined behavior results from the cast. The rest of [§8.2.10/5] does not do this, and:

C++ standard does not seem to say more about the integer to pointer conversion.

Is this valid C++?

Yes.

JaMiT
  • "`int * p = new int; p += 2;` produces a non-null pointer that does not point to an object and does not point past the end of an object." No, this snippet has UB (it is UB to add `2` to a pointer, which points to a single object). – geza Mar 08 '20 at 01:31
  • @geza I guess I let my understanding of what actually happens cloud what the standard says. :( The example could still be made valid, but it would take too much build-up to be worth it. Example removed until I get more inspired. – JaMiT Mar 08 '20 at 04:29