23

Consider:

int* ptr = (int*)0xDEADBEEF;
cout << (void*)&*ptr;

How illegal is the *, given that it's used in conjunction with an immediate & and given that there are no overloaded op&/op* in play?


(This has particular ramifications for addressing a past-the-end array element &myArray[n], an expression which is explicitly equivalent to &*(myArray+n). This Q&A addresses the wider case but I don't feel that it ever really satisfied the above question.)
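For readers who want something concrete for the past-the-end case, here is a minimal sketch of the uncontroversial parts only. The helper names `past_the_end` and `round_trips` are made up for illustration, and the sketch deliberately never applies `&*` to an invalid pointer, since whether that is defined is the open question here.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: returns the one-past-the-end pointer of a built-in
// array using pointer arithmetic only, so unary * is never applied to it.
template <typename T, std::size_t N>
T* past_the_end(T (&arr)[N]) {
    return arr + N;  // may be formed and compared, but must not be dereferenced
}

// For a pointer to a live object, &*p and p denote the same address.
inline bool round_trips(int* p) {
    assert(p != nullptr);  // this sketch is only meant for valid pointers
    return &*p == p;
}
```

With `int myArray[4]`, `past_the_end(myArray)` is the same pointer as `&myArray[0] + 4`, without ever spelling `&myArray[4]`.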

Lightness Races in Orbit
  • 5.2.1/1 and footnote 62 in conjunction do seem to imply that the sub-expression `*ptr` should be taken on its own merits, before the `&` is applied, but that's no direct indication. – Lightness Races in Orbit Sep 08 '11 at 10:39
  • 1
    A good compiler should reduce `&*ptr` to `ptr`, so in practice it is *ok*. In theory, though, it can be *undefined behavior*. – iammilind Sep 08 '11 at 10:40
  • 1
    @iammilind: A good compiler OR a non-conforming compiler? If it's *undefined behavior*, then how can it be good? – Nawaz Sep 08 '11 at 10:41
  • (By "illegal", of course I mean "undefined" and friends.) – Lightness Races in Orbit Sep 08 '11 at 10:41
  • @iammilind: If it's definitely UB, then only a **bad** compiler pretends that `&*ptr` is equivalent to `ptr` for all `ptr`! – Lightness Races in Orbit Sep 08 '11 at 10:42
  • 3
    I'm pretty sure we've had this argument before, and there seems to be some difference of opinion what the current standard implies even among those who've contributed to DRs and the like on the subject. The committee declined to introduce any form of non-existent lvalues to explicitly allow such tricks (so conservatively: it isn't allowed since lvalue expressions refer to an object, or some such wording, whereas `*ptr` is not an object), but no lvalue-to-rvalue conversion takes place on the non-existent referand (so optimistically: it'll work). – Steve Jessop Sep 08 '11 at 10:42
  • 1
    @Steve: Ah, I now remember the discussion on non-existent `lvalue`s; IIRC it was a conversation regarding `T* o = 0; o->static_member();`. That comment would probably make a good answer, incidentally... – Lightness Races in Orbit Sep 08 '11 at 10:44
  • @iammilind: Not complaining is completely irrelevant. UB is very very very rarely diagnosed. – Lightness Races in Orbit Sep 08 '11 at 10:44
  • @iammilind: It doesn't matter which compiler you consider good. If it's not following the language specification, then it's not good, at least in that respect. – Nawaz Sep 08 '11 at 10:45
  • 6
    I remember a comment by Stephan T Lavavej in one of his standard library tutorial videos from Channel 9 that saying `&myArray[n]` for an array of `n` elements is illegal, and you should instead be doing `&myArray[0] + n`. In other words, it is OK to walk a pointer beyond the end of an array, but it is illegal to dereference the pointer once it is beyond the bounds of the array. – Praetorian Sep 08 '11 at 10:46
  • @Praetorian: Strictly speaking, walking the pointer beyond the single past-the-end slot would be undefined too! – Lightness Races in Orbit Sep 08 '11 at 10:46
  • @Tomalak Geret'kal Why is that? What's special about the one slot immediately past the end of an array? – Praetorian Sep 08 '11 at 10:48
  • @Tomalak: Why would that be illegal? The pointer is just a value, you can do with it whatever you like... just not dereference it, non? – Kerrek SB Sep 08 '11 at 10:49
  • 2
    @Praetorian (and Kerrek): it's undefined behavior to even form the pointer value beyond the "one-past-the-end" address. For rationale, consider a hypothetical implementation where incrementing that address results in a trap representation of the pointer type, or overflows resulting in a hardware exception. Addresses aren't numbers in the C++ standard, even if they are in all known implementations: to the standard unallocated address space is a yawning void of chaos. – Steve Jessop Sep 08 '11 at 10:50
  • 1
    @iammilind: Undefined behaviour does not require the code to "not work". Having the code "work" is valid undefined behaviour. – R. Martinho Fernandes Sep 08 '11 at 10:52
  • 1
    @Martinho Fernandes: I didn't understand why you directed your comment at me. I did **not** say that *Undefined behaviour requires the code to "not work"*. – Nawaz Sep 08 '11 at 10:53
  • @Steve Jessop: But that would mean every single loop that uses `for( auto it = container.begin(); it != container.end(); ++it ) { /* ... */ }` is technically undefined behavior because `container.end()` points one past the end of the array. I still don't see why that one slot is special, it is outside the bounds of the array. – Praetorian Sep 08 '11 at 10:55
  • If you don't mind me asking, why did you need to do this? – Seth Carnegie Sep 08 '11 at 10:55
  • 3
    For reference, in C (C99 at least), the standard says that `&*E` is equivalent to `E`; this is explicitly allowed. So the only problem that remains (in C) is whether creating an invalid pointer in the first place is undefined. I've never found the equivalent language in the C++ standards. – Oliver Charlesworth Sep 08 '11 at 10:56
  • @Seth: It came out of a discussion based around `int x[3] = {0,1,2}; std::copy(&x[0], &x[3], std::ostream_iterator<int>(std::cout, "\n"));` when we realised that the second operand to `std::copy` there may need re-forming for the code to, strictly speaking, be safe. – Lightness Races in Orbit Sep 08 '11 at 10:56
  • @Kerrek: What Steve said. You can't "have" an invalid pointer, strictly speaking. One-past-the-end pointers are an exception, which of course is not part of the formulation of my question. – Lightness Races in Orbit Sep 08 '11 at 10:57
  • @Tom ah ok, and isn't that correct? Should it not be `x, x + 3`? Seems like even if `3` was a valid index, `&[]` is a waste of time. – Seth Carnegie Sep 08 '11 at 10:58
  • @Praetorian: One-past-the-end is special because the standard explicitly says so. – Oliver Charlesworth Sep 08 '11 at 10:58
  • 1
    @Praetorian: one-past-the end is OK provided you don't dereference, two-past the end is UB. One-past-the-end is special in the case of an array because it is explicitly stated to be special by 5.7/5 (which defines pointer addition). `container.end()` is nothing really to do with pointers except by analogy: the behavior of end iterators is defined somewhere in the library chapters. For vector it's designed so that the implementation can validly use a pointer as `vector::iterator`, for other containers the end value of an iterator might require special-case code in the implementation. – Steve Jessop Sep 08 '11 at 11:00
  • @Oli & Steve: Thanks, I didn't know "one past the end" was singled out in the standard. – Praetorian Sep 08 '11 at 11:02
  • @Seth: Yea, just `x+3` will do I suppose. But for symmetry with `&x[0]` in some cases `&x[0]+3` might be preferable. – Lightness Races in Orbit Sep 08 '11 at 11:04
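
For reference, the re-formed `std::copy` call discussed in the comments above might look like the sketch below. The wrapper `dump_array` is a hypothetical name, the output goes to a string stream rather than `std::cout` so the result can be inspected, and the `<int>` template argument is an assumption based on the `int x[3]` declaration in the comment.

```cpp
#include <algorithm>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: writes each element followed by a newline and
// returns the accumulated text.
inline std::string dump_array() {
    int x[3] = {0, 1, 2};
    std::ostringstream out;

    // x decays to a pointer to the first element; x + 3 is the
    // one-past-the-end pointer, formed without any dereference.
    std::copy(x, x + 3, std::ostream_iterator<int>(out, "\n"));

    // std::begin/std::end (C++11) yield the same pair of pointers.
    std::copy(std::begin(x), std::end(x), std::ostream_iterator<int>(out, "\n"));

    return out.str();
}
```

Both calls copy the same three elements; neither needs `&x[3]`.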

3 Answers

19

According to the specification, dereferencing an invalid pointer itself produces undefined behaviour. It doesn't matter what you do after dereferencing it.

Nawaz
  • 1
    Note that in C (C99 at least), the standard says that `&*E` is equivalent to `E`; this is explicitly allowed. (But I've never found the equivalent in the C++ standards.) – Oliver Charlesworth Sep 08 '11 at 11:01
  • 3
    @Oli: it's interesting that C99 says that in a footnote, preceded by the word "thus". It doesn't think it's adding a special case to permit `&*` (and it mentions "even if a null pointer" but not "even off-the-end pointers"). It thinks that what is stated in the normative section *logically implies* that `&*` is a no-op even for null pointers. Which is interesting, I suspect that whoever wrote that footnote would draw the same implication from the C++ standard, that `&*` is a no-op provided there's no operator overloading. – Steve Jessop Sep 08 '11 at 11:09
  • @Steve: Yes, it's difficult to see how that footnote is implied by the main body. If it's to be believed, though, it would be strange if C++ subsequently made well-defined behaviour undefined, for backwards compatibility reasons. – Oliver Charlesworth Sep 08 '11 at 11:12
  • 1
    @Oli: yep, the whole thing's a mess in my opinion. My response is that even if it's legal, I want nothing to do with it. So I won't write it and if I ever write a C or C++ implementation I'd ensure that it works and *maybe* stick in an astonishingly high-level warning for it. It's explicit in C++ that you can't form a reference from a null pointer - it seems to me reasonable to suppose that there's no intention that you can form an lvalue by dereferencing one either. There's all sorts added in C99 that C++ makes no effort to be compatible with, since C++98 isn't "subsequent" to C99. – Steve Jessop Sep 08 '11 at 11:20
  • So for instance the reference thing means that `char *p = 0; char &r = *p; &r;` is definitely UB in C++. I'm not certain whether it necessarily follows that `&static_cast<char&>(*p);` is UB, but it looks accidental to me if `&*p` is valid and `&static_cast<char&>(*p);` isn't. So when writing C++ I'll assume they're all invalid, just in case some implementer somewhere took the conservative view and wrote an optimization that breaks when it encounters such an lvalue -- aggressive memory pre-fetching or whatever. – Steve Jessop Sep 08 '11 at 11:25
  • Are we likely to find any standard references for this? I usually prefer a more detailed answer (and, really, I'm looking for a _proof_ here, rather than an assertion which is something I could have come up with myself!), but given the lack of one thus far I might just accept this one. – Lightness Races in Orbit Sep 18 '11 at 15:25
  • Yes, Tomalak...let me quote it. – Nawaz Sep 18 '11 at 15:26
  • @Tomalak: Here is a non-normative one, §1.9/4 (C++03): *Certain other operations are described in this International Standard as undefined (**for example, the effect of dereferencing** the null pointer). [Note: this International Standard imposes no requirements on the behavior of programs that contain undefined behavior.]*. Although it uses the null pointer as an example, it extends to any invalid pointer, in my opinion. Are you satisfied with this? – Nawaz Sep 18 '11 at 15:29
  • @Nawaz: Not really :( **\[** No offence is intended to you, BTW -- the whole point of this question is that the standard doesn't _appear_ to make this explicit in normative text, at least AFAICT. And at least not directly. **\]** In fact, [they even changed that paragraph in C++11](http://tinyurl.com/6b64njm) to instead read "(for example, the effect of attempting to modify a `const` object)", as if to acknowledge that pointer dereferences are more complex. And, besides, this question is not about null pointers but non-null _invalid_ pointers. – Lightness Races in Orbit Sep 18 '11 at 15:30
  • @Tomalak: So are you trying to say the non-normative notes would tell something which is against the Standard? I think non-normative means the same conclusion can be arrived at from the normative text. So non-normative text is basically the Standard's own interpretation of the normative text in simpler terms. – Nawaz Sep 18 '11 at 15:31
  • @Nawaz: I'm saying that (a) that statement doesn't address the issue at hand; (b) non-normative text in parentheses is neither a proof nor an explanation. – Lightness Races in Orbit Sep 18 '11 at 15:32
  • 2
    @SteveJessop: Actually, C makes `&*` a special case (in the paragraph describing `&`). C++ doesn't make this distinction, however, I can't find explicitly anywhere in the standard that dereferencing the null pointer (with no following lvalue-to-rvalue conversion) is undefined behavior (although it is mentioned in several notes). – jpalecek Oct 28 '11 at 12:24
17

Assuming the variable `ptr` does not contain a pointer to a valid object, the undefined behavior occurs if the program necessitates the lvalue-to-rvalue conversion of the expression `*ptr`, as specified in [conv.lval] (ISO/IEC 14882:2011, page 82, 4.1 [#1]).

During the evaluation of `&*ptr` the program does not necessitate the lvalue-to-rvalue conversion of the subexpression `*ptr`, according to [expr.unary.op] (ISO/IEC 14882:2011, page 109, 5.3.1 [#3]).

Hence, it is legal.

chill
  • Do you have a citation for your second paragraph? – Lightness Races in Orbit Oct 28 '11 at 14:13
  • 2
    Yes, 5.3.1 Unary operators [#3] "The result of the unary & operator is a pointer to its operand. __The operand shall be an lvalue or a qualified-id.__" – chill Oct 28 '11 at 16:06
  • That might clinch it! Edit into your answer, then I'll verify and perhaps accept. (BTW please use `@` notification syntax; I stumbled upon your comment reply only by chance.) – Lightness Races in Orbit Oct 30 '11 at 01:55
  • @TomalakGeret'kal, here, I added another reference to the standard and clarified the wording a bit, in the first line it should be "pointer to a valid object" instead of "a valid pointer to object". – chill Oct 30 '11 at 09:34
  • 5.3.1 says that the operand shall be an lvalue; but does that guarantee that that operand's value is not converted to an rvalue later? – Lightness Races in Orbit Oct 30 '11 at 17:11
  • [Check this answer](https://stackoverflow.com/a/47229011/4832499), it is a direct contradiction. Unless the _`&*E` is equivalent to `E`_ clause is in effect for C++ too, this answer is wrong. – Passer By Nov 10 '17 at 18:33
2

It is legal. Why wouldn't it be? You're just assigning a value to a pointer and then accessing it. Assigning the value by hand will, of course, have to be specified as undefined behavior, but that's the most a general specification can say. Then you use it in some embedded software controller, and it gives you the correct memory-mapped value for some device...
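
To illustrate the embedded scenario the answer above alludes to, here is a rough sketch. `as_register` and `read_register` are hypothetical names, a 32-bit register width is assumed, and the code only makes sense on a platform that documents a readable object at the given address. The integer-to-pointer mapping itself is implementation-defined, so this shows a platform-specific idiom and says nothing about what the standard guarantees in general.

```cpp
#include <cstdint>

// Hypothetical accessor: `address` is assumed to be a location the platform
// documents as a readable 32-bit register.
inline volatile std::uint32_t* as_register(std::uintptr_t address) {
    // Integer-to-pointer conversion; the mapping is implementation-defined.
    return reinterpret_cast<volatile std::uint32_t*>(address);
}

inline std::uint32_t read_register(std::uintptr_t address) {
    // volatile forces a real read; only meaningful if something lives there.
    return *as_register(address);
}
```

On a hosted system this can be exercised by passing the address of an ordinary `volatile std::uint32_t` variable converted to `std::uintptr_t`, since a pointer converted to a sufficiently large integer and back compares equal to the original.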

Diego Sevilla