
TL;DR

Given the following code:

int* ptr;
*ptr = 0;

does *ptr require an lvalue-to-rvalue conversion of ptr before applying indirection?

The standard covers the topic of lvalue-to-rvalue conversion in many places but does not seem to specify enough information to determine whether the * operator requires such a conversion.

Details

The lvalue-to-rvalue conversion is covered in N3485 in section 4.1 Lvalue-to-rvalue conversion paragraph 1 and says (emphasis mine going forward):

A glvalue (3.10) of a non-function, non-array type T can be converted to a prvalue.53 If T is an incomplete type, a program that necessitates this conversion is ill-formed. If the object to which the glvalue refers is not an object of type T and is not an object of a type derived from T, or if the object is uninitialized, a program that necessitates this conversion has undefined behavior.[...]

So does *ptr = 0; necessitate this conversion?

If we go to section 4 paragraph 1 it says:

[...]A standard conversion sequence will be applied to an expression if necessary to convert it to a required destination type.

So when is it necessary? If we look at section 5 Expressions the lvalue-to-rvalue conversion is mentioned in paragraph 9 which says:

Whenever a glvalue expression appears as an operand of an operator that expects a prvalue for that operand, the lvalue-to-rvalue (4.1), array-to-pointer (4.2), or function-to-pointer (4.3) standard conversions are applied to convert the expression to a prvalue. [...]

and paragraph 11 which says:

In some contexts, an expression only appears for its side effects. Such an expression is called a discarded-value expression.[...] The lvalue-to-rvalue conversion (4.1) is applied if and only if the expression is an lvalue of volatile-qualified type and it is one of the following [...]

Neither paragraph seems to apply to this code sample, and 5.3.1 Unary operators paragraph 1 says:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: indirection through a pointer to an incomplete type (other than cv void) is valid. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a prvalue, see 4.1. —end note ]

It does not seem to require the value of the pointer, and I don't see any requirement for a conversion of the pointer here. Am I missing something?
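To make the value categories involved concrete, here is a minimal sketch. It assumes only that `decltype((e))` yields `T&` for an lvalue expression and `T` for a prvalue; the function name `value_categories_as_expected` is made up for illustration. Since the operand of `decltype` is unevaluated, no pointer value is read. Note that this only shows the categories of the results; it does not settle whether the standard mandates a conversion of the operand of `*`.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: checks, at compile time, the value categories of the
// expressions discussed in the question.
inline bool value_categories_as_expected() {
    int* ptr = nullptr;  // initialized only so the sketch holds no indeterminate value

    // The name of a variable is an lvalue.
    static_assert(std::is_same<decltype((ptr)), int*&>::value,
                  "ptr is an lvalue of type int*");

    // Indirection yields an lvalue referring to the pointed-to object.
    // decltype does not evaluate its operand, so nothing is dereferenced here.
    static_assert(std::is_same<decltype((*ptr)), int&>::value,
                  "*ptr is an lvalue of type int");

    // Unary + on a pointer yields a prvalue, for contrast.
    static_assert(std::is_same<decltype((+ptr)), int*>::value,
                  "+ptr is a prvalue of type int*");

    (void)ptr;  // silence unused-variable diagnostics
    return true;
}
```

Calling `value_categories_as_expected()` simply returns `true`; the interesting checks happen at compile time.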

Why do we care?

I have seen an answer and comments in other questions that claim the use of an uninitialized pointer is undefined behavior due to the need for an lvalue-to-rvalue conversion of ptr before applying indirection. For example, "Where exactly does C++ standard say dereferencing an uninitialized pointer is undefined behavior?" makes this argument, and I cannot reconcile it with what is laid out in any of the recent draft versions of the standard. Since I have seen this several times, I wanted to get clarification.

The actual proof of undefined behavior is not as important since, as I noted in the linked question above, we have other ways to get to undefined behavior.

Shafik Yaghmour
  • *"it does not seem to"* **explicitly** *"require the value of the pointer"*. One could argue that the value is required *implicitly* (by "common sense"). – dyp Jan 10 '14 at 20:07
  • To be clear, your goal is to figure out exactly what part of the standard makes `int *p; *p=0;` undefined behavior? And failing that, spot a bug in the standard? – Yakk - Adam Nevraumont Jan 10 '14 at 20:21
  • @Yakk let me rephrase, the answer impacts whether you can use this argument to show using an uninitialized pointer is UB, but it is not the only argument, we can show it is UB [b/c we must assume it is singular](http://stackoverflow.com/questions/4285895/where-exactly-does-c-standard-say-dereferencing-an-uninitialized-pointer-is-un/20614158#20614158). I see three outcomes `1)` there is a defect, the standard should explicitly say there is an l-to-r conversion `2)` no conversion is mandated and this is not a proof of UB `3)` the l-to-r conversion is implied and it does prove UB. – Shafik Yaghmour Jan 10 '14 at 22:06
  • @dyp but if it is implied does that mean any read of a variable requires an l-to-r conversion? If that is the case then why all the specific language in section `5`? – Shafik Yaghmour Jan 10 '14 at 22:08
  • @ShafikYaghmour I'm not sure if every read requires an l-to-r conversion. Jerry Coffin's interpretation gives UB w/o l-to-r, by violating the requirements. Similar questions: http://stackoverflow.com/q/14991219/420683 http://stackoverflow.com/q/14935722/420683 – dyp Jan 10 '14 at 22:14
  • @ShafikYaghmour I find the discussion in the comments to the question of the first link quite interesting. – dyp Jan 10 '14 at 22:24
  • @dyp I just read them and I realized I had not seen either before, they are definitely relevant and insightful but apparently not definitive, maybe I will feel different after reading them a few times more. – Shafik Yaghmour Jan 11 '14 at 03:21
  • @dyp I converted my update, which was based on the links you provided, to an answer since it seems it is indeed the answer. Since you seem to have a good understanding of the topic, let me know if you feel I misinterpreted anything. Of course, if you have a different conclusion from those threads then please add it as an answer. – Shafik Yaghmour Jan 19 '14 at 15:18

2 Answers


I think you're approaching this from a rather oblique angle, so to speak. According to §5.3.1/1:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.”

Although this doesn't talk about the lvalue-to-rvalue conversion, it requires that the expression be a pointer to an object or function. An uninitialized pointer won't (except, perhaps, by accident) be any such thing, so the attempt at dereferencing gives undefined behavior.
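As a rough illustration of that reading, the sketch below shows the well-defined counterpart of the question's snippet, where the pointer actually points to an object before indirection is applied. The function name `store_zero_through_valid_pointer` is made up for this example; the undefined form is left in a comment rather than executed.

```cpp
#include <cassert>

// Hypothetical helper: same shape as the question's code, except that ptr
// points to a live int object when * is applied.
inline int store_zero_through_valid_pointer() {
    int target = 42;
    int* ptr = &target;  // ptr now points to an object of type int
    *ptr = 0;            // indirection yields an lvalue referring to target
    return target;       // target was overwritten through the pointer
}

// The form from the question, not executed because its behavior is undefined:
//   int* ptr;   // indeterminate value, points to no object except by accident
//   *ptr = 0;
```

Here `store_zero_through_valid_pointer()` returns `0`, since the assignment goes through a pointer that satisfies the requirement quoted above.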

Jerry Coffin
  • Well the *shall* applies to the *type*, not to the value. – dyp Jan 10 '14 at 20:27
  • @dyp: Not so. It specifies "the expression", not "the type of the expression", therefore both the value *and* the type of the expression matter. The rest of the sentence: "referring to the object or function to which the expression points." clearly depends upon this--it becomes meaningless if the expression does not point to an object or function. – Jerry Coffin Jan 10 '14 at 20:28
  • I meant, it says it *"shall be a pointer to an object type"*, not *shall be a pointer to an object*. Then there's the part that says *"the result is an lvalue referring to the object [...] to which the expression points"*, where *implicitly*, the value is required, and *implicitly*, it's required that the pointer points to something. – dyp Jan 10 '14 at 20:30
  • I think we agree, but I'd like it better if it were a bit more explicit ;) – dyp Jan 10 '14 at 20:31
  • @dyp: I wouldn't object to its being more explicit, but I don't see any reasonable way to interpret it as not requiring that the expression refer to an object or function. – Jerry Coffin Jan 10 '14 at 20:36
  • In C99, there's a special rule that allows forming a null-pointer from a null-pointer by compacting `&*nullptr` to `nullptr` (6.5.3.2/3). Another pathological case is `int& i = *nullptr`. Both of them produce UB in C++ according to this interpretation. – dyp Jan 10 '14 at 20:41
  • @dyp: Yup. I don't think the former is something to which anybody would object, but I don't think the current C and C++ standards really agree on it. I believe `int &i = *nullptr;` gives UB, and I can't really imagine anybody wanting it to do otherwise. – Jerry Coffin Jan 10 '14 at 20:48
  • There might be another issue with objects before the beginning of their lifetime, but after their storage has been obtained. Do those count as objects? If I have a pointer to such an object, may I produce a reference to it? – dyp Jan 10 '14 at 20:49
  • @dyp: It doesn't count as an object until a constructor has successfully completed. – Jerry Coffin Jan 10 '14 at 20:59
  • @ShafikYaghmour: Yeah, the arguments are similar, but I think this one is a little more direct. – Jerry Coffin Jan 11 '14 at 03:28
  • @ShafikYaghmour you request *"more insightful way"* in your bounty. There can't be *"more insightful way"* than the one put by Jerry. It's as clear and trivial as that, nothing more to talk about here. PS: why are you messing this with *rvalue-to-lvalue* conversion, when the result is a clear *lvalue*? – Tomas Jan 18 '14 at 09:19
  • @Tomas that is what is being [argued here](http://stackoverflow.com/questions/4285895/where-exactly-does-c-standard-say-dereferencing-an-uninitialized-pointer-is-un/20614158#comment31591358_4286034). I did not see it which is what eventually got me to ask the question. I have seen that same claim made in other questions in comments but I unfortunately can not find the other cases. – Shafik Yaghmour Jan 18 '14 at 10:06
  • FYI, I re-worded the question to remove emphasis from the UB aspect of the question since that really was not the point. Although I attempted to point that out several times comments indicate that most did not get that. Considering how you choose to answer the question you may feel that is a significant change. – Shafik Yaghmour Mar 10 '14 at 17:34

I have converted the update section in my question to an answer since at this point it seems to be the answer, albeit an unsatisfactory one: my question is unanswerable.

dyp pointed me to two relevant threads that cover very similar ground:

The consensus seems to be that the standard is ill-specified and therefore cannot provide the answer I am looking for. Joseph Mansfield posted a defect report on this lack of specification; it looks like it is still open, and it is not clear when it may be clarified.

There are a few common sense arguments to be made as to the intent of the standard. One can argue that, logically, an operand is a prvalue if the operation requires using the value of that operand. Another argument is that the C99 draft standard says an lvalue-to-rvalue conversion is done by default and the exceptions are noted. The relevant section from the draft C99 standard is 6.3.2.1 Lvalues, arrays, and function designators paragraph 2, which says:

Except when it is the operand of the sizeof operator, the unary & operator, the ++ operator, the -- operator, or the left operand of the . operator or an assignment operator, an lvalue that does not have array type is converted to the value stored in the designated object (and is no longer an lvalue). […]

This basically says that, with some exceptions, an operand is converted to the value stored. Since indirection is not one of the exceptions, if this is clarified to be the case in C++ as well, the answer to my question would indeed be yes.
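For comparison, here is a small C++ sketch of one context where the operand's value is clearly not needed: `sizeof`, whose operand is unevaluated in C++. The function name `size_through_uninitialized_pointer` is invented for illustration. Because `*ptr` is never evaluated, the indeterminate value of `ptr` is never used, which is the kind of exception the C99 wording enumerates.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: applies sizeof to *ptr where ptr is uninitialized.
// The operand of sizeof is unevaluated, so ptr's value is never read and
// no indirection actually takes place.
inline std::size_t size_through_uninitialized_pointer() {
    int* ptr;            // deliberately left uninitialized
    return sizeof *ptr;  // only the type of *ptr (int) matters here
}
```

The result is `sizeof(int)`, determined entirely from the type of the expression.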

As I attempted to clarify, the proof of undefined behavior was less important than clarifying whether an lvalue-to-rvalue conversion is mandated. If we want to prove undefined behavior, we have alternate approaches. Jerry’s approach is a common sense one, in that indirection requires that the expression be a pointer to an object or function, and an indeterminate value will only by accident point to a valid object. In general, in C++11 and earlier, the draft C++ standard does not give an explicit statement to say using an indeterminate value is undefined, unlike the C99 draft standard. The exception is iterators and, by extension, pointers: there we do have the concept of a singular value, and we are told in section 24.2.1 that:

[…][ Example: After the declaration of an uninitialized pointer x (as with int* x;), x must always be assumed to have a singular value of a pointer. —end example ] […] Dereferenceable values are always non-singular.

and:

An invalid iterator is an iterator that may be singular.268

and footnote 268 says:

This definition applies to pointers, since pointers are iterators. The effect of dereferencing an iterator that has been invalidated is undefined.
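The claim that pointers are iterators can be checked with a short sketch. It assumes only the standard `std::iterator_traits` specialization for pointers; the function name `sum_with_pointer_iterators` is made up for this example. Every pointer used below is non-singular (it points into, or one past the end of, a live array), so the dereferences are valid.

```cpp
#include <cassert>
#include <iterator>
#include <type_traits>

// The library treats raw pointers as random access iterators.
static_assert(std::is_same<std::iterator_traits<int*>::iterator_category,
                           std::random_access_iterator_tag>::value,
              "int* is a random access iterator");

// Hypothetical helper: walks an array using plain pointers as iterators.
inline int sum_with_pointer_iterators() {
    int values[3] = {1, 2, 3};
    int total = 0;
    // std::begin/std::end return int* for a built-in array.
    for (int* it = std::begin(values); it != std::end(values); ++it) {
        total += *it;  // it is dereferenceable, hence non-singular
    }
    return total;
}
```

`sum_with_pointer_iterators()` returns `6`. By contrast, `int* x;` gives a pointer that must be assumed singular, so dereferencing it falls under the footnote quoted above.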

In C++1y the language changes, and we do have an explicit statement making the use of an indeterminate value undefined, with some narrow exceptions.

Shafik Yaghmour
  • Thanks for the update! *"In general the draft C++ standard does not give an explicit statement to say using an indeterminate value is undefined, unlike the C99 draft standard"* I think [CWG 1494](http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_defects.html#1494) addresses that. sftrabbit has recently changed his SO name to Joseph Mansfield. Other than that, I think I don't have much to add. +1 – dyp Jan 19 '14 at 15:32
  • @dyp did you mean [616](http://www.open-std.org/JTC1/SC22/WG21/docs/cwg_defects.html#616)? – Shafik Yaghmour Jan 19 '14 at 17:57