88

I have seen it asserted several times now that the following code is not allowed by the C++ Standard:

int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];

Is &array[5] legal C++ code in this context?

I would like an answer with a reference to the Standard if possible.

It would also be interesting to know if it meets the C standard. And if it isn't standard C++, why was the decision made to treat it differently from array + 5 or &array[4] + 1?

M.M
Zan Lynx
  • 7
    @Brian: Nope, it would only require bounds checking if the runtime was required to *catch* the error. To avoid that, the standard can simply say "not allowed". It's undefined behavior at its finest. You're not allowed to do it, and the runtime and compiler aren't required to *tell* you if you do it. – jalf Jun 12 '09 at 18:16
  • Ok, just to clarify a bit, because the title misled me: A pointer one past the end of an array is not out-of-bounds. Out of bounds pointers are not allowed in general, but the standard is a lot more lenient with one-past-the-end pointers. You might want to edit the title if you're specifically asking about one-past-the-end pointers. If you want to know about out of bounds pointers *in general*, you should edit your example. ;) – jalf Jun 12 '09 at 18:45
  • He's not asking about pointers one past in general. He's asking about using the & operator to get the pointer. – Matthew Flaschen Jun 12 '09 at 18:48
  • 1
    @Matthew: But the answer to that depends on where that pointer points to. You're allowed to take the address in the one-past-the-end case, but not in an out-of-bounds case. – jalf Jun 12 '09 at 18:52
  • Section 5.3.1.1 Unary operator '*': 'the result is an lvalue referring to the object or function'. Section 5.2.1 Subscripting: The expression E1[E2] is identical (by definition) to *((E1)+(E2)). By my reading of the standard here, there is no de-referencing of the resulting pointer. (see full explanation below) – Martin York Jun 13 '09 at 19:18
  • Side note: you can just say int* array_begin = array; – rlbond Jun 14 '09 at 22:52
  • "One past the end" is how ranges are specified, including iterators. So it would be very strange if it wasn't legal. – Nikos C. Nov 07 '17 at 23:52
  • FYI: In GCC `==` comparison on "one-past" pointer may give wrong result (bug [61502](https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61502), [demo](https://godbolt.org/z/ssETn9MG7)). – pmor Oct 27 '22 at 14:18

14 Answers

47

Yes, it's legal. From the C99 draft standard:

§6.5.2.1, paragraph 2:

A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero).

§6.5.3.2, paragraph 3 (emphasis mine):

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted, except that the constraints on the operators still apply and the result is not an lvalue. Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand.

§6.5.6, paragraph 8:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

Note that the standard explicitly allows pointers to point one element past the end of the array, provided that they are not dereferenced. By 6.5.2.1 and 6.5.3.2, the expression &array[5] is equivalent to &*(array + 5), which is equivalent to (array+5), which points one past the end of the array. This does not result in a dereference (by 6.5.3.2), so it is legal.
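As a rough illustration of the pointer arithmetic rules quoted from 6.5.6, here is a minimal C++ sketch. It only uses array + 5, which nobody in this thread disputes. The function name distance_to_end is made up for this example and does not come from either standard.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not from the question): forms the one-past-the-end
// pointer, steps back from it, and returns its distance from the start.
std::ptrdiff_t distance_to_end() {
    int array[5] = {10, 20, 30, 40, 50};

    int *end = array + 5;   // (P)+N landing one past the last element: allowed
    int *last = end - 1;    // (Q)-1 points to the last element again

    assert(last == &array[4]);
    assert(*last == 50);    // only `last` is dereferenced, never `end`

    return end - array;     // difference of two pointers into the same array
}
```

Calling distance_to_end() should give 5, the number of elements. The one-past-the-end pointer is only compared and subtracted, never read through.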

Adam Rosenfield
  • Interesting, so it's legal and explicitly well defined in C which *may* be different from C++ (see other discussions!). – CB Bailey Jun 12 '09 at 19:32
  • 3
    He explicitly asked about C++. This is the kind of subtle difference that cannot be relied on when porting between the two. – Matthew Flaschen Jun 12 '09 at 21:34
  • 3
    He asked about both: "It would also be interesting to know if it meets the C standard." – CB Bailey Jun 12 '09 at 21:42
  • 2
    @Matthew Flaschen: The C++ standard incorporates the C standard by reference. Annex C.2 contains a list of changes (incompatibilities between ISO C and ISO C++), and none of the changes relate to these clauses. Hence, &array[5] is legal in C and C++. – Adam Rosenfield Jun 12 '09 at 22:00
  • 14
    The C standard is a normative reference in the C++ standard. That means that provisions in the C standard that are referenced by the C++ standard are part of the C++ standard. It does not mean that everything in the C standard applies. In particular Annex C is informative, not normative, so just because a difference isn't highlighted in this section doesn't mean that the C 'version' applies to C++. – CB Bailey Jun 12 '09 at 22:20
  • I wonder whether it's worth submitting an issue report, asking them to support the &*-is-noop semantics. If the operand is an incomplete class type that has operator& overloaded, using operator& is already undefined, so they don't even have to change anything, I think. They just have to introduce this no-op rule, as a syntactical transformation. I think this will greatly reduce the current problems. – Johannes Schaub - litb Jun 13 '09 at 13:01
  • @Charles Bailey: There's a difference between the C standard (which is probably C89, or C90?) and C99, which was standardised after the first C++ standard (i.e. C++98). IMHO, the C++ committee has tried to incorporate C99 fixes and additions where possible, but sometimes it just seems that C99 has solved problems in ways that make compatibility difficult at best. Either way, what you say does not apply to C99, only to the earlier standard. – Richard Corden Jun 16 '09 at 16:05
  • C89 was the C standard published by ANSI in 1989; C90 was the C standard published by ISO in 1990. They are essentially identical; I don't know if they are 100% identical. In any case, though, you're right -- the current C++ standard, C++03, refers to C90, not to C99. I don't know if the next C++ standard, C++0x, will refer to C90 or C99. – Adam Rosenfield Jun 16 '09 at 17:36
  • 1
    @Adam: Thanks for pointing that out. I have never quite been sure of C's history. Re C++0x referring to C99: from my little knowledge of the changes in C99, I'm pretty sure that C++ will continue to refer to C89/90 and will cherry pick the "desirable" changes from C99 on a case by case basis. This question/answer is a good example of this. I'd say that C++ will continue to use the no "lvalue-to-rvalue" therefore no undefined behaviour, rather than integrating the "&* == no-op" wording. – Richard Corden Jun 17 '09 at 07:49
  • @litb: It's already undefined behaviour to use unary-& on an incomplete class type that later declares a member "operator&". 5.2.1/4 says: "The address of an object of incomplete type can be taken, but if the complete type of that object is a class type that declares operator&() as a member function, then the behavior is undefined (and no diagnostic is required)." – Richard Corden Jun 17 '09 at 07:51
  • Adam, please reconsider this. I believe this is Undefined Behavior because, according to the passage you quote, the out-of-bounds pointer has *already* been dereferenced (by definition). See my answer below. – John Dibling Nov 07 '12 at 21:03
  • "This does not result in a dereference". YES it does. `&*(array + 5)` it's the `*` operator. You cannot count on the compiler to optimize `&*` out. `&*(array + 5)` is certainly not equivalent to `(array+5)` – PoweredByRice Apr 21 '17 at 02:05
  • 2
    @PoweredByRice: It's legal in C99, please read the quoted passage of the standard above where it explicitly says that neither the `&` nor the `*` operators are evaluated. C++ is different. C++11, which was still being drafted at the time this answer was originally written, does not have a similar clause, from what I can find. – Adam Rosenfield Apr 21 '17 at 20:41
43

Your example is legal, but only because you're not actually using an out of bounds pointer.

Let's deal with out of bounds pointers first (because that's how I originally interpreted your question, before I noticed that the example uses a one-past-the-end pointer instead):

In general, you're not even allowed to create an out-of-bounds pointer. A pointer must point to an element within the array, or one past the end. Nowhere else.

The pointer is not even allowed to exist, which means you're obviously not allowed to dereference it either.

Here's what the standard has to say on the subject:

5.7:5:

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

(emphasis mine)

Of course, this is for operator+. So just to be sure, here's what the standard says about array subscripting:

5.2.1:1:

The expression E1[E2] is identical (by definition) to *((E1)+(E2))

Of course, there's an obvious caveat: your example doesn't actually show an out-of-bounds pointer. It uses a "one past the end" pointer, which is different. The pointer is allowed to exist (as the above says), but the standard, as far as I can see, says nothing about dereferencing it. The closest I can find is 3.9.2:3:

[Note: for instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. —end note ]

Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.

Thanks to ilproxyil for correcting the last bit here, answering the last part of your question:

  • array + 5 doesn't actually dereference anything, it simply creates a pointer to one past the end of array.
  • &array[4] + 1 dereferences array+4 (which is perfectly safe), takes the address of that lvalue, and adds one to that address, which results in a one-past-the-end pointer (but that pointer never gets dereferenced).
  • &array[5] dereferences array+5 (which as far as I can see is legal, and results in "an unrelated object of the array’s element type", as the above said), and then takes the address of that element, which also seems legal enough.

So they don't do quite the same thing, although in this case, the end result is the same.
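To make the comparison concrete, here is a small sketch of the spellings whose validity is not in question. The function name end_spellings_agree is invented for this illustration. &array[5] is deliberately left out, since its status in C++ is exactly what this thread is debating. std::end (C++11, from <iterator>) is included as a third uncontested way to get the same pointer.

```cpp
#include <cassert>
#include <iterator>

// Hypothetical helper: checks that the uncontested spellings of the
// one-past-the-end pointer all produce the same address.
bool end_spellings_agree() {
    int array[5] = {};

    int *by_addition = array + 5;            // pure pointer arithmetic, no unary *
    int *by_last_plus_one = &array[4] + 1;   // indirection on a real element, then +1
    int *by_std_end = std::end(array);       // library spelling for built-in arrays

    assert(by_addition - array == 5);

    return by_addition == by_last_plus_one && by_addition == by_std_end;
}
```

All three pointers are only compared, never dereferenced, so nothing here relies on what lives at that address.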

John Kugelman
jalf
  • 5
    &array[5] is pointing to one past. However, this is not a legal way to get that address. – Matthew Flaschen Jun 12 '09 at 18:27
  • 1
    agreed, I would say &array[5] is UB. (even though it may work as expected in practice). – Evan Teran Jun 12 '09 at 18:30
  • @Evan: Yeah, I realized that too, and edited my post. Note that the question title asks about out-of-bounds though. The answer should describe both cases now. – jalf Jun 12 '09 at 18:42
  • Oh, I should probably mention that this is based on draft n1905 (from 2005). I don't have access to the "real" standard here, and this was the first one Google turned up. – jalf Jun 12 '09 at 18:49
  • 5
    the last sentence is incorrect. "array + 5" and "&array[4] + 1" do NOT dereference one past the end, while array[5] DOES. (i also assume you meant &array[5], but the comment still stands). The first two simply point one past the end. – user83255 Jun 12 '09 at 18:52
  • @ilproxyil: You're right. Fixed it. Hopefully, that's all. SO is starting to throw CAPTCHA's at me now for repeatedly editing this post... ;) – jalf Jun 12 '09 at 18:59
  • @Martin: Don't you start, I'm tired of editing this thing. ;) Anyway, the standard says that array[5] is equivalent by definition to *(array + 5), so surely it *does* dereference the address. Or am I missing something (again)? ;) – jalf Jun 12 '09 at 19:03
  • @ilproxy: array[5] does not de-reference the address. You can consider it as an expression that is a 'reference to value'. It only de-references the address if it is used to retrieve the value or write to the value. Here we are taking the address. This is explicitly allowed by the standard – Martin York Jun 12 '09 at 19:04
  • @jalf: Yes you are missing something. It is a reference to an rvalue. – Martin York Jun 12 '09 at 19:04
  • Which is? Where do you get the rvalue from? – jalf Jun 12 '09 at 19:12
  • array[5] is the same thing as *(array + 5). Note the *: that's the dereference operator. "array + 5" is perfectly legal, since it's one past the end of the array. array[5], by itself, is not legal, since it accesses memory past the end of the array. &array[5] is probably legal, since &i does not touch the actual value of i, but it would take somebody more skilled with standard-fu than me to prove it. – David Thornley Jun 12 '09 at 21:59
  • @Martin, array[5] surely dereferences (look at 3.8/5 and 8.3.2/4, for example) It just does not read the stored value located there. – Johannes Schaub - litb Jun 12 '09 at 22:20
  • 2
    @jalf that note - i think it merely wants to say that "a + sizeof a" is equally valid to "&b" if b is directly allocated after the array "a", and that the resulting addresses equally "point to" the same object. Not more. Remember that all notes are informative (non-normative): If it would state such fundamental important facts like that there are objects after an array object that are located at the past-the-end, then such rule would need to be made normative – Johannes Schaub - litb Jun 12 '09 at 22:21
  • I'm accepting this answer even though I also like several others. This one references the standards. I'd also accept Adam Rosenfield's if I could. – Zan Lynx Jun 12 '09 at 23:52
  • 2
    As far as ANSI C (C89/C90) is concerned, this is the correct answer. If you follow the standard to the letter, &array[5] is technically invalid, whereas array+5 is valid, even though pretty much every compiler will produce the same code for both expressions. C99 updates the standard to explicitly allow &array[5]. See my answer for full details. – Adam Rosenfield Jun 13 '09 at 01:56
  • 2
    @jalf, also the whole text that note is in starts with "If an object of type T is located at an address A..." <- That says "The following text assumes there is an object at address A." So your quote doesn't (and can't, under this condition) say that there is always an object at address A. – Johannes Schaub - litb Jun 13 '09 at 13:06
  • True, on both points. I guess I should have read the full text before that note. ;) But yeah, I'm not sure either. Even if dereferencing it is well-defined, then obviously the state of the object you access is not. So you might be able to take the address of it, but nothing else really. – jalf Jun 13 '09 at 14:09
  • Section 5.3.1.1 Unary operator '*': 'the result is an lvalue referring to the object or function'. Section 5.2.1 Subscripting: The expression E1[E2] is identical (by definition) to *((E1)+(E2)). By my reading of the standard here, there is no de-referencing of the resulting pointer. – Martin York Jun 13 '09 at 19:09
  • @litb: so are we saying that for a T* which points at one past the end of an array, that there is an object pointed to by the pointer - even if it's only a byte (which is an object) and not actually a T - and therefore unary * is well defined, returning an lvalue of type T but which may not actually be a complete T? Sounds like a plausible interpretation. – CB Bailey Jun 13 '09 at 20:04
  • @Charles, yes that's what i think is going on. It would be all fine, as long as you don't try to read a value (lvalue->rvalue). If you would try, you would fall into 3.10/15 and 4.1/1. Thus, this would be well defined always: unsigned char c[1]; unsigned char c1 = c[1]; But this not always, because you don't know what might be located there besides that byte: float s[1]; float s1 = s[1]; But contrary, this is always fine, i think: s[1]; (no read happening). – Johannes Schaub - litb Jun 13 '09 at 23:17
  • 1
    The selection you are citing in order to justify your answer is from a paragraph that is clearly explaining the TYPE of the pointer (array+size) and is NOT claiming that there is a valid object at that location that is legal to dereference. – Edward Strange Dec 20 '10 at 22:36
  • @jalf: I know it's been a long time, but could you please reconsider this answer? I believe that this actually is Undefined Behavior -- see my new answer below. – John Dibling Nov 07 '12 at 21:01
  • 1
    The C++17 standard now says the exact opposite of the last note you quoted: `[Note: A pointer past the end of an object is not considered to point to an unrelated object of the object’s type that might be located at that address]` . Although sadly they still seem to have stopped short of clearly specifying whether `&array[5]` is well-defined or not – M.M Jan 13 '20 at 04:30
  • @M.M: In cases like this, it irks me that the authors of the Standard endlessly debate which of two useful meanings a syntax should have, rather than providing a syntax to unambiguously specify each meaning and recommending that programmers use the unambiguous forms in cases where it would matter, e.g. specifying that the syntax `*(array+n)` should be usable to access any element of an enclosing array, but `array[n]` would only work for an inner array. Not sure how best to syntactically distinguish an operation that would compute an address `array+n` that could form a just-past pointer... – supercat Jun 19 '21 at 21:56
  • ...but not go beyond, versus a form that wouldn't support "just past" or a form that would allow arbitrary accessing, but I'm not sure how often that would be useful, or whether programs that need to form a "just past" pointer should simply use `array+index`. – supercat Jun 19 '21 at 21:57
17

It is legal.

According to the gcc documentation for C++, &array[5] is legal. In both C++ and in C you may safely address the element one past the end of an array - you will get a valid pointer. So &array[5] as an expression is legal.

However, it is still undefined behavior to attempt to dereference pointers to unallocated memory, even if the pointer points to a valid address. So attempting to dereference the pointer generated by that expression is still undefined behavior (i.e. illegal) even though the pointer itself is valid.

In practice, I imagine it would usually not cause a crash, though.

Edit: By the way, this is generally how the end() iterator for STL containers is implemented (as a pointer to one-past-the-end), so that's a pretty good testament to the practice being legal.
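Here is a minimal sketch of that half-open [first, last) convention, assuming nothing beyond the standard library. The names sum_range, sum_array and sum_vector are made up for the example. The one-past-the-end pointer only marks where to stop and is never read through.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper: sums the half-open range [first, last).
// `last` is used purely as a stopping point and is never dereferenced.
int sum_range(const int *first, const int *last) {
    assert(first <= last);
    return std::accumulate(first, last, 0);
}

// One-past-the-end pointer of a built-in array.
int sum_array() {
    int array[5] = {1, 2, 3, 4, 5};
    return sum_range(array, array + 5);
}

// Same idea for a std::vector's contiguous storage.
int sum_vector() {
    std::vector<int> v = {1, 2, 3, 4, 5};
    return sum_range(v.data(), v.data() + v.size());
}
```

Both calls pass an end pointer formed by addition, which sidesteps the &array[5] question entirely.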

Edit: Oh, now I see you're not really asking if holding a pointer to that address is legal, but if that exact way of obtaining the pointer is legal. I'll defer to the other answerers on that.

Tyler McHenry
  • 4
    I'd say you're correct, if and only if the C++ spec does not say that &* must be treated as a no-op. I'd imagine it probably does not say that. – Tyler McHenry Jun 12 '09 at 18:33
  • 8
    The page you reference (correctly) says that it is legal to *point* one past the end. &array[5] technically first dereferences (array + 5), then references it again. So it technically is like this: (&*(array + 5)). Fortunately, compilers are smart enough to know that &* can be factored to nothing. However, they don't *have* to do that, therefore, I'd say it is UB. – Evan Teran Jun 12 '09 at 18:34
  • 4
    @Evan: There's more to this. Check out the last line of core issue 232: http://std.dkuug.dk/JTC1/SC22/WG21/docs/cwg_active.html#232. The last example there just looks wrong - but they clearly explain that the distinction is on the "lvalue-to-rvalue" conversion, which in this case doesn't take place. – Richard Corden Jun 12 '09 at 18:44
  • @Richard: interesting, seems there is some debate on the subject. I'd even agree that it **should** be allowed :-P. – Evan Teran Jun 12 '09 at 18:46
  • @Evan Teran: No it does not de-reference the member unless you try and read/write to the area. Think of it as a reference to the member it will not be de-referenced unless you try and obtain the value or change the value. Taking the address does not cause a read or write and thus does not de-reference the value. – Martin York Jun 12 '09 at 19:13
  • 2
    It is the same kind of undefined behavior as the "reference-to-NULL" thing people kept discussing, where seemingly everyone voted up the answer saying "it is undefined behavior" – Johannes Schaub - litb Jun 12 '09 at 19:21
  • @Richard, note also that they agree so far that the difference should be an lvalue to rvalue conversion. But they find that this is not well reflected in the Standard. The same issue report can be found here which has the other points they noted included (including the concept of an "empty lvalue"): http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232 – Johannes Schaub - litb Jun 12 '09 at 20:07
  • @litb: Agreed. So the conclusion is that the standard performs badly here, but when you take into account the behaviour of C99 (&* == no-op), and the general comments from the committee it seems clear that the behaviour here is supposed to be well defined. Probably the only remaining question to ask is: are there any compilers that, given '&*p', first attempt to read the value in '*p'? – Richard Corden Jun 16 '09 at 08:15
  • @RichardCorden: there are other possible problems if it is UB. Are there any compilers that see `int array[5]; &array[5];`, apply a compile-time bounds check to the sub-expression `array[5]` and refuse to compile it? If it does have UB they are entitled to do this, although given the legality in C and the fact that people rely on it, it would probably not be the most popular error that compiler implemented ;-) – Steve Jessop Feb 18 '14 at 10:29
  • @SteveJessop: The C++ committee is made up of a lot of compiler vendors (eg. Microsoft, Clang, GCC, EDG and more), which is why I feel the note against 232 is important. Also 5.3.1/1 of a recent draft has a note on incomplete types. The result is that the following is legal: `void foo (class A * pA) { A & a (*pA); }`. The key is that the '*pA' in other contexts would be illegal, but as there isn't a conversion to a 'prvalue' it's OK here. I really do believe the intent is that it's the same for &array[N]. – Richard Corden Feb 23 '14 at 16:56
  • Your first link is dead. – Spikatrix Sep 12 '15 at 09:25
  • The GCC documentation only documents the behaviour of gcc. – M.M Feb 13 '16 at 00:09
10

I don't believe that it is illegal, but I do believe that the behaviour of &array[5] is undefined.

  • 5.2.1 [expr.sub] E1[E2] is identical (by definition) to *((E1)+(E2))

  • 5.3.1 [expr.unary.op] unary * operator ... the result is an lvalue referring to the object or function to which the expression points.

At this point you have undefined behaviour because the expression ((E1)+(E2)) didn't actually point to an object, and the standard only says what the result should be when it does.

  • 1.3.12 [defns.undefined] Undefined behaviour may also be expected when this International Standard omits the description of any explicit definition of behaviour.

As noted elsewhere, array + 5 and &array[0] + 5 are valid and well defined ways of obtaining a pointer one beyond the end of array.

CB Bailey
  • The key point is: "the result of '*' is an lvalue". From what I can tell, it only becomes UB iff you have an lvalue to rvalue conversion on that result. – Richard Corden Jun 12 '09 at 18:41
  • 1
    I would contend that as the result of '*' is only defined in terms of the object to which its operand expression points, it is undefined - by omission - what the result is if the expression didn't have a value which actually referred to an object. It's far from clear, though. – CB Bailey Jun 12 '09 at 18:52
10

I believe that this is legal, and it depends on whether an 'lvalue to rvalue' conversion takes place. The last line of Core issue 232 has the following:

We agreed that the approach in the standard seems okay: p = 0; *p; is not inherently an error. An lvalue-to-rvalue conversion would give it undefined behavior

Although this is a slightly different example, what it does show is that the '*' does not result in an lvalue to rvalue conversion, and so, given that the expression is the immediate operand of '&', which expects an lvalue, the behaviour is defined.

Richard Corden
  • +1 for the interesting link. I'm still not sure that I agree that p=0;*p; is well defined as I'm not convinced that '*' is well defined for an expression whose value is not a pointer to an actual object. – CB Bailey Jun 12 '09 at 19:25
  • A statement that's an expression is legal, and means to evaluate that expression. *p is an expression that invokes undefined behavior, so anything the implementation does is according to the standard (including emailing your boss, or downloading baseball statistics). – David Thornley Jun 12 '09 at 22:01
  • 1
    Note that the status of that issue is still "drafting" and it hasn't made it into the standard (yet), at least those draft versions of C++11 and C++14 I could find. – musiphil Jul 02 '15 at 23:32
  • The "empty lvalues" proposal was never adopted into any published standard – M.M Jan 13 '20 at 04:36
  • The notes in the issue explicitly refer to the `&*a[n]` problem: "Similarly, dereferencing a pointer to the end of an array should be allowed as long as the value is not used". Unfortunately, this issue has been languishing since 2003 and a resolution is not yet in the standard. – Pablo Halpern Aug 11 '20 at 21:18
7

In addition to the above answers, I'll point out that operator& can be overloaded for classes. So even if it was valid for PODs, it probably isn't a good idea to do for an object you know isn't valid (much like overloading operator&() in the first place).
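A small sketch of why an overloaded operator& muddies the picture, using a deliberately odd type invented for this example (Odd and addressof_bypasses_overload are hypothetical names). std::addressof from <memory> (C++11) is the standard way to get the real address regardless of any overload.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary & does not return the object's address.
struct Odd {
    int value = 0;
    Odd *operator&() { return nullptr; }   // deliberately misleading overload
};

// Hypothetical helper: shows that &items[0] goes through the overload,
// while std::addressof reports the true address of the element.
bool addressof_bypasses_overload() {
    Odd items[2];

    Odd *via_overload = &items[0];                   // calls Odd::operator&
    Odd *via_addressof = std::addressof(items[0]);   // ignores the overload

    assert(via_overload == nullptr);

    return via_addressof == items;   // `items` decays to a pointer to its first element
}
```

With such a type, &items[N] is a function call on whatever lvalue the subscript produces, so the "&* cancels out" reasoning clearly cannot apply to it. Plain pointer arithmetic such as items + N avoids the issue.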

Todd Gardner
  • 4
    +1 on bringing operator& into discussion, even if experts recommend never overriding it as some STL containers depend on it returning a pointer into the element. It is one of those things that got into the standard before they knew better. – David Rodríguez - dribeas Jun 12 '09 at 19:28
3

This is legal:

int array[5];
int *array_begin = &array[0];
int *array_end = &array[5];

Section 5.2.1 Subscripting The expression E1[E2] is identical (by definition) to *((E1)+(E2))

So by this we can say that array_end is equivalent to:

int *array_end = &(*((array) + 5)); // or &(*(array + 5))

Section 5.3.1.1 Unary operator '*': The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]

The important part of the above:

'the result is an lvalue referring to the object or function'.

The unary operator '*' is returning an lvalue referring to the int (no de-reference). The unary operator '&' then gets the address of the lvalue.

As long as there is no de-referencing of an out of bounds pointer then the operation is fully covered by the standard and all behavior is defined. So by my reading the above is completely legal.

The fact that a lot of the STL algorithms depend on the behavior being well defined is a sort of hint that the standards committee has already thought of this, and I am sure there is something that covers this explicitly.

The comment section below presents two arguments:

(please read: but it is long and both of us end up trollish)

Argument 1

this is illegal because of section 5.7 paragraph 5

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

The section is relevant, but it does not show undefined behavior. All the pointers we are talking about point either to elements within the array or one past the end (which is well defined by the above paragraph).
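
The `(P)+1` and `(Q)-1` wording of that paragraph can be sketched as follows. This is only an illustration under my reading of 5.7/5; the helper name `last_via_round_trip` is hypothetical.

```cpp
#include <cassert>

// Walks from the last element to one past the end and back again,
// mirroring the (P)+1 and (Q)-1 wording quoted from 5.7/5.
int* last_via_round_trip(int (&array)[5]) {
    int* p = &array[4];        // P points to the last element
    int* q = p + 1;            // (P)+1 points one past the last element
    assert(q == array + 5);    // same pointer value as array + 5
    int* back = q - 1;         // (Q)-1 points to the last element again
    assert(back == p);
    return back;               // q itself is never read or written through
}
```

Every pointer formed here is either to an element of the array or one past its end, which is the case the paragraph explicitly covers.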

Argument 2:

The second argument presented below is: * is the de-reference operator.
This is a common term for the '*' operator, but the term is deliberately avoided in the standard, as 'de-reference' is not well defined in terms of the language or of what it means for the underlying hardware.

Accessing the memory one beyond the end of the array is definitely undefined behavior. However, I am not convinced the unary * operator accesses the memory (reads from or writes to it) in this context, at least not in a way the standard defines. In this context (as defined by the standard, see 5.3.1.1) the unary * operator returns an lvalue referring to the object. In my understanding of the language this is not an access to the underlying memory. The result of this expression is then immediately used by the unary & operator, which returns the address of the object the lvalue refers to.
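
To make the distinction concrete, here is a small sketch that uses only a pointer to a live object, so it is uncontroversially valid. It illustrates what I mean by "forming an lvalue without reading the value"; it does not prove anything about the one-past-the-end case. The function name `address_of_indirection` is invented for this example.

```cpp
#include <cassert>

// Applies unary * and then unary & to a pointer to a live object.
int* address_of_indirection(int* p) {
    assert(p != nullptr);  // this sketch assumes p points to a live int
    int& alias = *p;       // binds a reference to the lvalue; the int's value is not read
    return &alias;         // address of the object the lvalue refers to
}
```

For a pointer to an existing `int`, the result compares equal to the original pointer and the stored value is untouched.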

Many other references to Wikipedia and non-canonical sources are presented, all of which I find irrelevant. C++ is defined by the standard.

Conclusion:

I am willing to concede that there are many parts of the standard that I may not have considered and that may prove my above arguments wrong. None are provided below. If you show me a standard reference that shows this is UB, I will

  1. Leave the answer.
  2. Put in all caps this is stupid and I am wrong for all to read.

This is not an argument:

Not everything in the entire world is defined by the C++ standard. Open your mind.

user207421
Martin York
  • 1
    Why do you assert that there is no dereference? `*` is the dereference operator. – Lightness Races in Orbit Aug 06 '13 at 11:23
  • @LightnessRacesinOrbit: `the result is an lvalue **referring** to the object or function`. Unless you actually read from the lvalue there is no de-referencing and therefore no undefined behavior. If you read from the lvalue (of one past the end) then you have undefined behavior, but if all you do is take its address then you are fine. The standards committee has asserted that taking the address of one past the end of a memory block does not lead to undefined behavior (as long as you don't look at the value). – Martin York Aug 06 '13 at 11:38
  • @LightnessRacesinOrbit: &array[5] => &*(array + 5) => (array + 5). No de-referencing here. There is only a de-reference if you actually read the value from the resulting reference. – Martin York Aug 06 '13 at 11:43
  • 2
    According to whom? According to which passage? The `*` performs a dereference. It is the dereference operator. This is what it does. Arguably the fact that you then obtain a new pointer to the resulting value (using `&`) is irrelevant. You can't just present a sequence of evaluation, present the final expression semantics and pretend that the intermediate steps didn't happen (or that the language's rules did not apply to each). – Lightness Races in Orbit Aug 06 '13 at 11:58
  • @LightnessRacesinOrbit: I quote the standard above: `Section 5.3.1.1 Unary operator '*': ` The important part is `the result is an lvalue **referring** to the object or function. ` The result of the `*` operator is an alias. – Martin York Aug 06 '13 at 11:59
  • 3
    I quote from the same passage: `the result is an lvalue referring to the object or function to which the expression points.` It is clear that if no such object exists, there is no behaviour defined for this operator. Your subsequent statement `is returning a lvalue referring to the int (no de-refeference)` is what makes no sense to me. Why do you think that this is not a dereference? – Lightness Races in Orbit Aug 06 '13 at 12:00
  • @LightnessRacesinOrbit: Note '*' is not the dereference operator. It is the `unary operator *`. Calling it the de-reference operator does not change what it is actually doing. It returns a reference to what is being pointed at. Unless you read the value via the reference there is no de-reference. It is the act of reading (or writing) the value that will cause a de-reference. – Martin York Aug 06 '13 at 12:02
  • 2
    `It returns a reference to what is being pointed at.` What is this, if not a dereference? The passage says that `*` performs indirection, and indirection from pointer to pointee is called dereferencing. Your argument essentially asserts that pointers and references are the same thing or, at least, implicitly linked, which is simply not true. `int x = 0; int* ptr = &x; int& y = *x;` Here I dereference `x`. I don't need to use `y` for that to be true. – Lightness Races in Orbit Aug 06 '13 at 12:04
  • @LightnessRacesinOrbit: I disagree that it is a de-reference. It is an alternative name (just like a reference variable). This `int& y = *x` is not a de-reference. You are returning a reference. Not a de-reference. If you had done `int z = *x` then that is a de-reference. The result of the `*x` is a reference; the `operator =` then causes a read of the reference, which is a de-reference. – Martin York Aug 06 '13 at 12:05
  • @LightnessRacesinOrbit: I am happy to agree to disagree. Obviously few people agree with my opinion (hence only one vote). – Martin York Aug 06 '13 at 12:10
  • @LightnessRacesinOrbit: `int& y = *x` is covered by `8.5.3 References` paragraph 5: "then the reference is bound directly to the initializer expression lvalue in the first case, and the reference is bound to the lvalue result of the conversion in the second case". This is not a de-reference, it is a binding of a reference. Which strengthens my argument that * is not a de-reference but returns a reference. I.e. it does not return the value, it returns a reference to the value. To get the value you must de-reference the reference. – Martin York Aug 06 '13 at 12:21
  • `You a returning a reference. Not a de-reference.` That just makes no sense. You can't "return a de-reference". Do you think a "de-reference" is a _thing_? It's not... a dereference _operation_ takes a pointer and gives you the thing it points to. Whether you then use that object directly or initialise a reference from it is completely irrelevant. There is a reference binding, yes: the reference is bound to the result of your dereference operation. And there is no such thing as "de-referencing a reference". I don't see anyone else claiming the same mystical bizarrity to which you subscribe. – Lightness Races in Orbit Aug 06 '13 at 12:50
  • 1
    `Which strengths my argument that * is not a de-reference but returns a reference. Ie it does not return the value it returns a reference to the value.` No, `*` yields an lvalue that is the original object, not a reference. Read the passage that _you_ cited in bold in your answer. – Lightness Races in Orbit Aug 06 '13 at 12:51
  • 1
    For the record, the standard has never _explicitly_ defined "dereference" (instead relying on a wider understanding of the term, as in _to follow a path of indirection from pointer to pointee_), though it is non-normatively mentioned in the definition for unary `*`. In the most recent C++14 draft, all utterances of "dereference" are removed, because there was this standard bug in that no text was sufficiently clear for me to prove to you how horribly wrong you are. – Lightness Races in Orbit Aug 06 '13 at 13:01
  • @LightnessRacesinOrbit: Let's not get personal. Sorry if you don't like arguing the point (I tend to like the discussion). I will stop if you want to (I am not that vested in this answer). It is hard to have this discussion in comments and I am not expressing myself well (as you seem to think I think a de-reference is a thing). I think if we got in a room together we could talk this through and you could probably convince me of your arguments. But currently I am not convinced. PS. I can't find the place in the standard that says accessing beyond the end is undefined behavior. Do you know? – Martin York Aug 06 '13 at 13:04
  • @LightnessRacesinOrbit: Do you know where it is? – Martin York Aug 06 '13 at 13:05
  • It's within the definition for binary `+` (which is all-encompassing, since subscripting is defined in terms of binary `+`): `[C++11: 5.7/5]: If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.` – Lightness Races in Orbit Aug 06 '13 at 13:06
  • I was searching for "undefined behavior" I wish they would be consistent to make searching easier. Thanks. PS. I don't want to be an argumentative git (troll). I truly want to either work through the difference (I don't mind being proven wrong and have been many times (each time I learn something new)). I like my views being challenged. But I don't want to start a war over it. As I said I am not that vested in my argument (as it only has one vote). – Martin York Aug 06 '13 at 13:11
  • If you're looking for instances of UB in the language, grepping the standard for "undefined behaviour" is a fool's errand in the first place, even if you were to use every permutation of the phrase. There are a gazillion things that invoke undefined behaviour in C++ merely by being unmentioned in the standard. – Lightness Races in Orbit Aug 06 '13 at 13:12
  • @Lightness Races in Orbit: So I have found (searching for undefined behavior). I thought it would be easy to spot. The reference you gave me is not the correct one. This is about doing pointer maths, not about reading the element one past the end. – Martin York Aug 06 '13 at 13:16
  • Access to array elements is defined in terms of pointer maths. You already quoted 5.2.1 in your answer which states this. – Lightness Races in Orbit Aug 06 '13 at 13:19
  • @LightnessRacesinOrbit: Yes I agree. But this does not indicate that accessing one past the end is illegal. This is about maths on the pointers and overflow, not about access. Maths of one past the end does not invoke overflow according to this: `If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.`. So I don't think this is what we are looking for. – Martin York Aug 06 '13 at 13:35
  • @LightnessRacesinOrbit: So summary we get `arr[5] => *(arr + 5)`. We both agree that (arr+5) is legal (I hope). The sticking point is the `*`. You argue this is a `de-reference operator`. I argue this is `unary * operator` that returns a lvalue referring to the object (which is why I don't like the term de-reference operator it implies an action that does not exist). So the argument is really about does returning an lvalue referring to an object cause a de-reference correct? or are we further apart than that? – Martin York Aug 06 '13 at 13:43
  • Yes, that's the argument we've been having. I find it disingenuous to suggest that applying unary `*` to a pointer could be called "not a dereference", but because the standard never bothered to explicitly define "dereference" (instead relying on common sense that apparently is not as prevalent in the C++ community as one might hope) I can't _prove_ this completely normal and widely understood term to you. That's also why they removed it from C++14 entirely. The point remains that dereferencing past-the-end is invalid, and I maintain that that is what you're doing with `arr[5]`. – Lightness Races in Orbit Aug 06 '13 at 13:49
  • @LightnessRacesinOrbit: So your argument is that `*` returns a de-referenced object (I hope I am not putting words in your mouth) which would make it illegal (I agree if your interpretation is correct then it is an illegal operation). I on the other hand believe that `*` returns a lvalue referencing to an object (as said by all versions of the standard going back to n1804). I interpret this as a reference (this may be shaky). To me having a reference to an object is not the same as de-referencing it (and thus if my interpretation is correct the action is legal). – Martin York Aug 06 '13 at 14:00
  • 1
    `I on the other hand believe that * returns a lvalue referencing to an object` Yes, correct. It got that by dereferencing its pointer operand. | `I interpret this as a reference` No, reference types are a completely different language feature. | `To me having a reference to an object is not the same as de-referencing it` That is correct. – Lightness Races in Orbit Aug 06 '13 at 14:02
  • `I find it disingenuous to suggest that applying unary * to a pointer could be called "not a dereference"`. It has never been called de-referencing. It may be common slang. But the unary operator has always said that it returns an `lvalue referring to the object`. I find it pointless arguing about common slang as it is not in the standard. – Martin York Aug 06 '13 at 14:02
  • `It has never been called de-referencing` Yes, it has been called de-referencing since time began, and is still called that now. Using `&` you perform indirection from an object to a pointer-to-that-object; using `*` you do the opposite, performing indirection from a pointer to its pointee, or _dereferencing_ that pointer. – Lightness Races in Orbit Aug 06 '13 at 14:03
  • OK. So our argument is around the phrase `lvalue referring to an object`. OK I agree that using the term `reference` is not correct as it is overloaded and has meaning in a language context as well as general computer science context. So let's not go down that road. – Martin York Aug 06 '13 at 14:05
  • 1
    Yes, I think you've seen `referring` and assumed that means "references" are involved, which is not true. By the way, see C99 footnote 83: "**Among the invalid values for dereferencing a pointer by the unary `*` operator** are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime." I'm not just making this stuff up! – Lightness Races in Orbit Aug 06 '13 at 14:06
  • `* you do the opposite, performing indirection from a pointer to its pointee, or dereferencing that pointer.` You are interpreting without reference to the standard. – Martin York Aug 06 '13 at 14:06
  • The standard assumes a baseline understanding of computer terminology. It also does not define what "arithmetic" or "subtraction" means, or what a logarithm is. Or what "container" means in English. You are expected to know basic terms first. Wikipedia is not normative in proving a language behaviour but to demonstrate again that this is a commonly understood term, see http://en.wikipedia.org/wiki/Dereference_operator. I also invite you to read [this comprehensive answer](http://stackoverflow.com/a/4955297/560648) on the old question "What does 'dereferencing' a pointer mean?" – Lightness Races in Orbit Aug 06 '13 at 14:07
  • @LightnessRacesinOrbit: Let's not get off topic with wikipedia or arguing about `*` being a de-reference operator. Those are just side issues. The real disagreement we have is on the term `lvalue referring to an object`. So let me go read a bit more. – Martin York Aug 06 '13 at 14:08
  • Sorry. What is the disagreement. We were having two threads at the same time I may have got confused. – Martin York Aug 06 '13 at 14:09
  • Okay you may read. The phrase means that the result of dereferencing a pointer is an _lvalue_ that aliases the original object. This is the same as in C. Conventionally I would have expected to get a reference instead, but this is _not_ the case here. – Lightness Races in Orbit Aug 06 '13 at 14:09
  • No, there is only one thread. And I know you are confused. :) – Lightness Races in Orbit Aug 06 '13 at 14:10
  • I would agree: `lvalue that aliases the original object`. You don't agree that this does not require a de-reference. – Martin York Aug 06 '13 at 14:10
  • @LightnessRacesinOrbit: I am still trying to be polite. If we want to be trolls and sarcastic that is easy to do. Can we agree that the argument is over: `lvalue referring to an object` or please state the argument in your terms. – Martin York Aug 06 '13 at 14:12
  • @LightnessRacesinOrbit: If your argument is that `lvalue referring to an object` is `lvalue that aliases the original object` I agree. I would also point out that this is my whole point. It is an alias to the object not the object and thus no-dereferencing. If you don't agree then I will go read. But now I have to go to work. I will pick up afterwards. – Martin York Aug 06 '13 at 14:18
  • "and thus no-dereferencing" is a non-sequitur. It _doesn't matter_ that you get an alias -- how else could it work? You _do_ get "the original object". You simply get a fresh lvalue for it. This has nothing to do with the fact that you just dereferenced a pointer to get there. Look. **Step 1:** You have a pointer. **Step 2:** You dereference the pointer. **Step 3:** You now have an lvalue that refers (NB. similarity in words notwithstanding, this is _not_ a C++ "reference") to the pointee object. Simples! – Lightness Races in Orbit Aug 06 '13 at 14:28
  • @LightnessRacesinOrbit: Show me the standard for step 2. You are just making stuff up because you think `*` is a de-reference operator. That's your fundamental mistake. It's the `unary * operator` that returns an `lvalue referring to an object`. All quotes from the standard. There is nothing here that says de-reference. I think I have proved my point with your own words. It is just an alias and the subsequent `& operator` takes the address of the alias and returns it as the result of the expression. If you don't agree then I think we are fundamentally going to have to agree to disagree. – Martin York Aug 06 '13 at 15:44
  • You will have to refer to my previous comments, because I'm not going to repeat them all. It's not mentioned in the standard, I've explained this, and I've explained why that doesn't matter. Taking a pointer and retrieving its pointee — in _any_ form — fundamentally involves a _dereference_. I provided several citations. End of story! Please read my comments. – Lightness Races in Orbit Aug 06 '13 at 17:02
  • 2
    @LokiAstari: I have a question, what do you think "dereferencing" means if not "calling the unary `*` operator that returns an lvalue referring to the object to which the expression points"? (Note that the subsequent sentence of the standard does refer to this process as dereferencing in the C++11 spec) – Mooing Duck Aug 06 '13 at 17:25
  • @LightnessRacesinOrbit: You provide no standard citations. End of story. – Martin York Aug 09 '13 at 12:55
  • @MooingDuck: The "Undefined Behavior" that you are alluding to (attached to the ill-defined term de-referencing) is reading/writing to the actual memory. That is not required in &arr[5], which is why the standard uses the term `lvalue referring to`. This is why the `operator *` is not called the `de-reference operator`; it is called the `unary * operator`. – Martin York Aug 09 '13 at 13:35
  • @Loki: I already explained why standard citations are not required. The C++ standard does not define the entirety of maths and technology, nor does it have to. I'm sorry to see that you still refuse to engage in this discussion in a professional manner. – Lightness Races in Orbit Aug 09 '13 at 14:03
  • @LightnessRacesinOrbit: Seriously. It seems I am the only one quoting an authoritative source (the standard). Your argument is based on your opinion that the '*' is called the de-reference operator (even that does not explain why you think it is illegal) and thus has some magical properties that cause undefined behavior. Please quote the standard that shows undefined behavior. – Martin York Aug 09 '13 at 22:20
  • @LokiAstari: I already explained that the standard doesn't _list_ undefined behaviour. Rather, everything unstated is undefined. That's what undefined means. And I've never said those things you ascribe to me; when did I claim that `operator*` has "some magical properties that cause undefined behaviour"? I quoted the standard passage that causes out-of-bounds array access to be undefined (whether through dereferencing, subscripting, or _whatever_). Read my comments: I shan't respond to this thread any further until you have. You are the most frustrating person I've ever dealt with here. – Lightness Races in Orbit Aug 09 '13 at 23:44
  • The standard is a complex web of subtly linked rules, and you can't rationalise about it in the simple way you seem to be trying to. – Lightness Races in Orbit Aug 09 '13 at 23:46
  • @LightnessRacesinOrbit: The standard is complex (yes I agree). But the question is well defined and very simple. It easy to answer because you can look up every operator (all two of them). There is not undefined behavior. Your inability to show me a single reference that shows it is undefined basically shows you are simply trolling (I am willing to accept I could be wrong but there is no follow through on a reference). I have been polite and well behaved throughout this conversation (unlike you). So I think we can easily see who is being professional here. – Martin York Aug 10 '13 at 01:10
  • 2
    @LokiAstari: [I showed you references days and days ago](http://stackoverflow.com/questions/988158/take-the-address-of-a-one-past-the-end-array-element-via-subscript-legal-by-the/991310?noredirect=1#comment26462886_991310); you simply refuse to acknowledge that they exist, for some reason. How you can justify this behaviour is beyond me, but you must be the one trolling. – Lightness Races in Orbit Aug 10 '13 at 01:11
  • @LightnessRacesinOrbit: Quote it now then. Just one. I will delete this answer if it shows UB. Yes, you showed one reference that does not even apply to this situation. I am glad you provided the link again to just show your skill http://stackoverflow.com/questions/988158/take-the-address-of-a-one-past-the-end-array-element-via-subscript-legal-by-th/991310?noredirect=1#comment26462886_991310 If you look it up that reference is about pointer arithmetic. And in this case shows the code above is valid. – Martin York Aug 10 '13 at 01:12
  • 1
    @LokiAstari: I _just_ linked you to the one I already gave you. Why didn't you follow that link, and **read**? Then you need to read all my other comments to find out why it's relevant. This is not a topic that you can prove with a ten-word soundbite. To summarise: your entire answer on why `&*array[N]` is okay hinges on the fantasy that `*` does not perform a "de-refeference", which is a nonsense. We've _fully_ covered why that is so, now, and I have provided all required supporting evidence. – Lightness Races in Orbit Aug 10 '13 at 01:13
  • @LokiAstari: _[replying to your edit]_ Yes, I know the passage is about pointer arithmetic. As I already explained several comments up, that is wholly relevant since array subscripting is defined in terms of pointer arithmetic. We're going around in circles, as long as you're not listening to anything I say. – Lightness Races in Orbit Aug 10 '13 at 01:17
  • @LightnessRacesinOrbit: I see and am so glad. Proving yourself wrong. if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined – Martin York Aug 10 '13 at 01:18
  • @LightnessRacesinOrbit: By that definition there is no UB. You are quoting the wrong part of the standard as this has nothing to do with de-referencing here. – Martin York Aug 10 '13 at 01:20
  • 1
    @LokiAstari: What on earth are you talking about? This entire question is about `&array[N]`. This is equivalent to `&(*(array+N))`. How is dereferencing not relevant? – Lightness Races in Orbit Aug 10 '13 at 01:20
  • @LightnessRacesinOrbit: What are you talking about as you obviously did not read it. – Martin York Aug 10 '13 at 01:21
  • @Loki: I can only assume at this point that you are just deliberately trolling me, as no-one could possibly be this obtuse. – Lightness Races in Orbit Aug 10 '13 at 01:21
  • @LightnessRacesinOrbit: You obviously either do not understand the quote or are basing your opinion on a misunderstanding of what the name of the operator is without actually understanding what it does. Please for the love of god read the standard. – Martin York Aug 10 '13 at 01:22
  • @LokiAstari: I read the standard almost every day. Why don't you take a moment to _read the comments in this thread_ where the standard has been analysed in detail. You keep repeating "your reference proves my point" over and over, when it does no such thing. It's impossible to discuss anything with you. I'm done here now. I wish you luck in coming to understand what "dereference" means. – Lightness Races in Orbit Aug 10 '13 at 01:22
  • @LightnessRacesinOrbit: The only person here that is trolling is you. As the only standard reference you provided proves my point. – Martin York Aug 10 '13 at 01:23
  • You keep saying that it's about de-referencing, but the only standard reference you bring out is about pointer arithmetic. Which has nothing to do with the point you are trying to make. And there is no UB in terms of pointer arithmetic. – Martin York Aug 10 '13 at 01:24
  • 1
    @Loki: Just so that you're aware, since you clearly have no clue, we were talking about your bizarre assertion that `*` does not perform dereferencing. (Read the first comment.) If you think that pointer arithmetic has nothing to do with dereferencing array positions, then you're out of your mind and I can't do anything to help you. – Lightness Races in Orbit Aug 10 '13 at 01:26
  • @LightnessRacesinOrbit: Show me a standard reference that says '*' performs de-referencing. That is the only point you have to make and I will believe you. But that is not the standard you are quoting. You have to keep your arguments coherent; currently you are rambling between two different points. Pointer arithmetic is valid (as proved by your reference to the standard). But no standard quote about de-referencing. – Martin York Aug 10 '13 at 01:28
  • I have requested that all these comments be moved to [chat](http://chat.stackoverflow.com/rooms/35176/discussion-between-lightness-races-in-orbit-and-loki-astari), mainly so that I am not repeatedly tempted by the SO notifications system to continue to be drawn into this utter insanity. – Lightness Races in Orbit Aug 10 '13 at 01:28
  • _There is no standard quote about dereferencing_, and _it does not matter_. I have explained this in great detail in my comments above. **Read. Them.** Please. Do it now, _before_ replying again. Everybody else in the _entire_ world knows that `*` performs dereferencing. It's just you. – Lightness Races in Orbit Aug 10 '13 at 01:29
  • No point. **Show me a reference** in the standard that '*' actually is a de-reference. Otherwise there is nothing to talk about. – Martin York Aug 10 '13 at 01:30
  • @Loki: No point? No point in reading the answer to your question? That demonstrates categorically that you have no interest in a proper discussion, and have just been trolling me all along. – Lightness Races in Orbit Aug 10 '13 at 01:30
  • 1
    @LightnessRacesinOrbit: Yes. Your continuous refusal to show a relevant quote from the standard. Your constant trolling just shows you have nothing to talk about and are doing this solely to stoke controversy. **Show me a quote!!!!! from the standard** (I have shown you section 5.7 (5) is not relevant to this argument). – Martin York Aug 10 '13 at 01:32
  • @Loki: I told you, again, again, again, again, again, that there is none. I told you why this does not matter. I told you why the term is clear regardless. I gave another example that "arithmetic" is not defined in the standard either, yet you accept that without question. Why not this? There is 100% universal acceptance on what dereferencing means in C and C++, everybody but you. I proved that with several links. You ignored _all of it_. – Lightness Races in Orbit Aug 10 '13 at 01:33
  • @LightnessRacesinOrbit: So this argument boils down to you told me your reason (without quoting a relevant point in the standard). While I keep quoting you the standard. Makes you totally correct. Yes – Martin York Aug 10 '13 at 01:35
  • @LightnessRacesinOrbit: Good Night. – Martin York Aug 10 '13 at 01:35
  • @Loki: Not everything in the entire world is defined by the C++ standard. Open your mind. Good night. – Lightness Races in Orbit Aug 10 '13 at 01:35
  • @LightnessRacesinOrbit: We are arguing about C++ not philosophy. The standard defines the language. If your argument is "It's obvious" you have no argument. **Show me a standard quote** – Martin York Aug 10 '13 at 01:36
  • 1
    @LokiAstari: You misspelled my name so I missed the ping. I made no mention of undefined behavior whatsoever. What I asked was `what do you think "dereferencing" means if not "calling the unary * operator that returns an lvalue referring to the object to which the expression points"?`. That was meant as an actual question (which has been asked by several people now) which you have not answered. – Mooing Duck Aug 10 '13 at 02:37
  • @MooingDuck: What I think does not matter. What does the standard say "dereferencing" means? **The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type and the result is an lvalue referring to the object to which the expression points. If the type of the expression is “pointer to T,” the type of the result is “T.” [this lvalue must not be converted to a prvalue, see 4.1. —end note ]** – Martin York Aug 10 '13 at 06:46
  • @MooingDuck: I believe UB (in this context) is accessing memory (read/write) beyond the end of the array. I see nothing that indicates memory accesses (read/write) in the above statement. I don't particularly want to argue further unless you have a clause **from the standard** that we can discuss (about either (a) why this is UB or (b) a definition of "dereferencing"). – Martin York Aug 10 '13 at 06:59
  • @MooingDuck: PS. I apologize for misspelling your name. – Martin York Aug 10 '13 at 07:08
  • @LokiAstari: I think it isn't UB, I agree with you 100%, other than the definition of "dereference". My point is (A) The standard uses _but does not define_ the word dereference. From this we must assume some definition exists that is _not_ quoted from the standard. (B) A tentative definition has been put forth as matching the colloquial definition by Lightness. (C) No one has suggested an alternative definition. (D) You must agree there is _a_ definition, but refuse to agree on _any_ suggestion??? Hopefully you can see why Lightness is upset here :P – Mooing Duck Aug 10 '13 at 15:45
  • 2
    From the quote Loki posted "[ Note: a pointer to an incomplete type (other than cv void) can be **dereferenced**. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to an rvalue, see 4.1. — end note ]" (emphasis mine) – milleniumbug Aug 10 '13 at 18:06
  • @LokiAstari: The term "dereference" is used 77 times in the C++ standard. As for Section 5.3.1.1 that you refer, that section contains: "A pointer to an incomplete type (other than cv void) can be dereferenced". This is still in the paragraph detailing Unary operator `*`. – Mooing Duck Aug 11 '13 at 02:44
  • If you must have a definition of "de-reference" why not the one provided by the standard for `unary * operator`? It defines the 'unary * operator' (which is apparently de-referencing) as "returning an lvalue referring (i.e. an alias) to an object". So by extension a de-reference expression is an expression that "returns an lvalue referring (i.e. an alias) to an object". – Martin York Aug 11 '13 at 04:37
  • I think I'm going to search around a bit and see if there is any standard language around `operator*` and `volatile`. `volatile` states that reads from memory are visible, so if using `operator*` on a pointer to `volatile` is considered visible (regardless of what you do with the result), then it seems that this answer is wrong on a consistency basis. – David Stone Dec 07 '13 at 16:54
  • The latest working draft has replaced utterances of "dereference" with "performs indirection". So not only are the two _obviously_ intended to be synonymous, but this is now going to be "fixed" in the standard such that people like Loki can finally understand. :) – Lightness Races in Orbit Jan 01 '14 at 22:13
  • @LightnessRacesinOrbit: Your point being. Is there anything here that changes your or my arguments? – Martin York Jan 01 '14 at 23:22
  • It backs mine up. The committee is demonstrating that the term "dereference" means what I said it means. They have helpfully disambiguated it for you. I just thought you might be interested. No need to get defensive. – Lightness Races in Orbit Jan 01 '14 at 23:26
  • @LightnessRacesinOrbit: You are going to have to be more specific. Which sentence in the standard changes so that your argument prevails? Has this changed: `unary * operator returns a lvalue referring to the object`? If not, then nothing relevant to my argument changed. – Martin York Jan 01 '14 at 23:31
  • You could read the comments again and see that I never disputed what unary `operator*` does. Yes, it returns an lvalue referring to the object. It does that by dereferencing, and that has been proven. I'm not going to get into this again with you, though. I just thought you would find the link interesting. – Lightness Races in Orbit Jan 01 '14 at 23:38
  • A clarification is needed here: `*p` is *not* a reference; **no expression has reference type** and no operator returns a reference. `*p` is an lvalue of non-reference type; a function declared with a reference return type gives you an lvalue of non-reference type when called. – curiousguy Jun 09 '16 at 03:15
2

Even if it is legal, why depart from convention? array + 5 is shorter anyway, and in my opinion, more readable.

Edit: If you want it to be symmetric, you can write

int* array_begin = array; 
int* array_end = array + 5;
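
A minimal sketch of how the `array + 5` form reads when the pointer pair is handed straight to a standard algorithm. The function name `sum_of_array` and the array contents are illustrative assumptions, not part of the answer.

```cpp
#include <cassert>
#include <numeric>

// Hypothetical helper: sums a five-element array using the
// begin/end pointer pair formed by plain pointer arithmetic.
int sum_of_array() {
    int array[5] = {1, 2, 3, 4, 5};
    int* array_begin = array;      // array decays to a pointer to element 0
    int* array_end = array + 5;    // one past the last element; never dereferenced
    assert(array_end - array_begin == 5);
    return std::accumulate(array_begin, array_end, 0);
}
```

The end pointer is only compared against and never read through, which is the use the pointer-arithmetic form is meant for.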
rlbond
  • 65,341
  • 56
  • 178
  • 228
  • I think that the style I use in the question looks more symmetrical: the array declaration and the begin/end pointers, or sometimes I pass those directly to an STL function. That is why I use it instead of the shorter version. – Zan Lynx Jun 12 '09 at 23:45
  • To be symmetrical I think it'd need to be array_begin = array + 0; array_end = array + 5; How's that for a long delayed comment response? – Zan Lynx Mar 16 '10 at 18:44
  • It might be a world record :) – rlbond Mar 16 '10 at 20:56
2

Working draft (n2798):

"The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue or a qualified-id. In the first case, if the type of the expression is “T,” the type of the result is “pointer to T.”" (p. 103)

array[5] is not a qualified-id as best I can tell (the list is on p. 87); the closest would seem to be identifier, but while array is an identifier, array[5] is not. It is not an lvalue because "An lvalue refers to an object or function." (p. 76). array[5] is obviously not a function, and is not guaranteed to refer to a valid object (because array + 5 is after the last allocated array element).

Obviously, it may work in certain cases, but it's not valid C++ or safe.

Note: It is legal to add to get one past the array (p. 113):

"if the expression P [a pointer] points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow"

But it is not legal to do so using &.
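
The quoted rule can be sketched as follows. This is only an illustration of the `(P)+1` and `(Q)-1` wording; the function name `past_the_end_round_trip` and the element values are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: forms the one-past-the-end pointer by addition only
// and steps back to the last element, without dereferencing the end pointer.
bool past_the_end_round_trip() {
    int array[5] = {10, 20, 30, 40, 50};
    int* last = &array[4];               // points to the last element
    int* end = last + 1;                 // (P)+1: one past the last element
    int* back = end - 1;                 // (Q)-1: the last element again
    std::ptrdiff_t length = end - array; // distance from the first element
    assert(length == 5);
    return back == last && *back == 50;  // only 'back' is dereferenced
}
```

Nothing here applies `*` or `[]` to the past-the-end position, so the example stays within what the quoted paragraph describes.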

Matthew Flaschen
  • 278,309
  • 50
  • 514
  • 539
  • 1
    I upvoted you, because you are correct. There is no object guaranteed to be located at the past-the-end location. The person that downvoted you probably misunderstood you (you sound like you say any array-index-op refers to no object at all). I think here is an interesting thing: It *is* an lvalue, but it also does *not* refer to an object. And so here is a contradiction to what the standard says. And so, this yields undefined behavior :) This is also related to this one: http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232 – Johannes Schaub - litb Jun 12 '09 at 19:16
  • @litb: According to 3.9.2:3, there is "an unrelated object of the array's element type" at the past-the-end location. Doesn't that mean that the result of array[5] *is* an lvalue? – jalf Jun 12 '09 at 19:25
  • 2
    @jalf, the note says "that might be located at that address". It's not guaranteed that there is one located :) – Johannes Schaub - litb Jun 12 '09 at 19:28
  • litb, thanks. However, I still think it is not an lvalue. Because there is no object guaranteed to be at array[5], array[5] can not legally /refer/ to an object. Thus (see def. quoted in my answer), it is not an lvalue, and &array[5] is illegal. I also see that Bill Gibbons says on the Active Issues page, "dereferencing a pointer to the end of an array should be allowed as long as the value is not used", but they do not claim it /is/ allowed (and I'm not sure I agree it /should/ be). – Matthew Flaschen Jun 12 '09 at 21:44
  • 1
    The standard says that the result of op* must be an lvalue, but it only says what that lvalue is if the operand is a pointer which actually points to an object. That would imply (bizarrely) that if one past the end didn't happen to point at a suitable object, that the implementation would have to find a suitable lvalue from somewhere else and use that. That really would mess up &array[sizeof array]! – CB Bailey Jun 12 '09 at 21:49
  • 3
    "However, I still think it is not an lvalue. Because there is no object guaranteed to be at array[5], array[5] can not legally /refer/ to an object." <- That is exactly why i think it is undefined behavior: It relies on some behavior not explicitly specified by the standard, and thus falls within 1.3.12[defns.undefined] – Johannes Schaub - litb Jun 12 '09 at 21:58
  • 1
    litb, fair enough. Let's say it's /not definitely/ an lvalue, and thus /definitely not/ 100% safe. – Matthew Flaschen Jun 12 '09 at 22:48
1

It should be undefined behaviour, for the following reasons:

  1. Trying to access out-of-bounds elements results in undefined behaviour. Hence the standard does not forbid an implementation throwing an exception in that case (i.e. an implementation checking bounds before an element is accessed). If & (array[size]) were defined to be begin (array) + size, an implementation throwing an exception in case of out-of-bound access would not conform to the standard anymore.

  2. It's impossible to make this yield end (array) if array is not an array but rather an arbitrary collection type.
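
To illustrate the second point: a collection such as `std::list` has no `operator[]`, so `&c[size]` cannot even be written for it, while a begin/end pair works for both. The sketch below uses `std::begin` and `std::end`; the helper names `count_elements`, `count_in_array` and `count_in_list` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// Hypothetical generic helper: counts the elements between begin and end.
// It works for built-in arrays and for containers without operator[].
template <typename Range>
std::ptrdiff_t count_elements(const Range& r) {
    return std::distance(std::begin(r), std::end(r));
}

std::ptrdiff_t count_in_array() {
    int array[5] = {};
    return count_elements(array);   // Range is deduced as int[5]
}

std::ptrdiff_t count_in_list() {
    std::list<int> values(3, 0);    // three elements, all zero
    return count_elements(values);
}
```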

JohnB
  • 13,315
  • 4
  • 38
  • 65
1

Preamble

Quite a few of the answers here are fairly old, and quote relatively old versions of the C++ standard (or drafts thereof). Others are based on the C standard; C99 was revised specifically to make this legal, with defined behavior, but that doesn't mean a matching change was made in C++. It looks like the text in the C++ standard has changed somewhat over time, so it may be unclear how meaningful some of the older citations are for C++ as currently defined.

Since the wording has changed over time, I'm going to cite a couple of specific drafts of the C++ standard. If later drafts revise the wording again (which wouldn't surprise me) the issue would have to be analyzed again with respect to the revised wording.

N4835

A postfix expression followed by an expression in square brackets is a postfix expression. One of the expressions shall be a glvalue of type “array of T” or a prvalue of type “pointer to T” and the other shall be a prvalue of unscoped enumeration or integral type. The result is of type “T”. The type “T” shall be a completely-defined object type. The expression E1[E2] is identical (by definition) to *((E1)+(E2)), except that in the case of an array operand, the result is an lvalue if that operand is an lvalue and an xvalue otherwise. The expression E1 is sequenced before the expression E2.

So, array[5] is equivalent to *(array + 5).
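
As a small illustration of that equivalence, here is a sketch using an in-range index so that every expression is unquestionably valid. The function name `subscript_matches_indirection` and the element values are assumptions made for the example.

```cpp
#include <cassert>

// Hypothetical check that E1[E2] behaves like *((E1)+(E2)) for an
// index that is inside the array.
bool subscript_matches_indirection() {
    int array[5] = {0, 1, 2, 3, 4};
    bool same_value = array[2] == *(array + 2);   // same element value
    bool same_address = &array[2] == array + 2;   // same element address
    bool commuted = 2[array] == array[2];         // *(2 + array) is the same element
    assert(same_value);
    return same_value && same_address && commuted;
}
```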

We then attempt to take the address of that expression using the & operator. This is defined as follows (§[expr.unary.op]/3):

The result of the unary & operator is a pointer to its operand.

  • If the operand is a qualified-id naming a non-static or variant member m of some class C with type T, the result has type “pointer to member of class C of type T” and is a prvalue designating C::m.
  • Otherwise, if the operand is an lvalue of type T, the resulting expression is a prvalue of type “pointer to T” whose result is a pointer to the designated object (6.7.1) or function. [Note: In particular, taking the address of a variable of type “cv T” yields a pointer of type “pointer to cv T”. —end note] For purposes of pointer arithmetic (7.6.6) and comparison (7.6.9, 7.6.10), an object that is not an array element whose address is taken in this way is considered to belong to an array with one element of type T.
  • Otherwise, the program is ill-formed.

The first of these three possibilities applies to class members, so it's irrelevant here.

The second applies to an lvalue. So the question is whether *(array + 5) is an lvalue or not. According to §[basic.lval]/1.1:

  • A glvalue is an expression whose evaluation determines the identity of an object, bit-field, or function.
    [...]
  • An xvalue is a glvalue that denotes an object whose resources can be reused (usually because it is near the end of its lifetime).
    [...]
  • An lvalue is a glvalue that is not an xvalue.

While we can form an address one past the end of an array, that address does not determine the identity of an object, bit-field or function. The relevant option would be "object", but there is no object there whose identity it can determine [1]. As such, when array has been defined with N elements, *(array + N) is not an lvalue.

That leaves only the third option: the program is ill-formed.

N4944

N4944 has identical wording for §[expr.sub]/1 as N4835, so I won't quote it again here.

In N4944 the wording with respect to the & operator has changed slightly. It starts with (§[expr.unary.op]/3):

The operand of the unary & operator shall be an lvalue of some type T.

N4944 retains the same definition of an lvalue though:

  • A glvalue is an expression whose evaluation determines the identity of an object, bit-field, or function.
    [...]
  • An xvalue is a glvalue that denotes an object whose resources can be reused (usually because it is near the end of its lifetime).
    [...]
  • An lvalue is a glvalue that is not an xvalue.

As such, again, the result of dereferencing a pointer to one past the end of an array is not an lvalue, so code that attempts to apply the & operator to it is ill-formed.

Conclusion

In recent versions of the C++ standard, code like:

int array[5];
int *foo = &array[5];

...is ill-formed.
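
For comparison, a sketch of two ways to obtain the same past-the-end pointer without subscripting past the end: plain pointer arithmetic and `std::end`. The function name `end_pointers_agree` is hypothetical, and the sketch makes no claim about how `std::end` is implemented.

```cpp
#include <cassert>
#include <iterator>

// Hypothetical check: both forms yield the same one-past-the-end pointer.
bool end_pointers_agree() {
    int array[5] = {};
    int* by_arithmetic = array + 5;     // pointer arithmetic only
    int* by_library = std::end(array);  // standard library helper for arrays
    assert(by_arithmetic - array == 5);
    return by_arithmetic == by_library;
}
```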


1. Well, it could happen that there's some object at that address, but if so it's an accidental coincidence. Nothing in the standard requires there to be an object at that address.
Jerry Coffin
  • 476,176
  • 80
  • 629
  • 1,111
0

C++ standard, 5.19, paragraph 4:

An address constant expression is a pointer to an lvalue....The pointer shall be created explicitly, using the unary & operator...or using an expression of array (4.2)...type. The subscripting operator []...can be used in the creation of an address constant expression, but the value of an object shall not be accessed by the use of these operators. If the subscripting operator is used, one of its operands shall be an integral constant expression.

Looks to me like &array[5] is legal C++, being an address constant expression.

user3840170
  • 26,597
  • 4
  • 30
  • 62
David Thornley
  • 56,304
  • 9
  • 91
  • 158
  • 2
    I'm not sure that the original question is necessarily talking about an array with static storage. Even if it is I wonder if &array[5] isn't a address constant expression precisely because it doesn't point to an lvalue designating an object? – CB Bailey Jun 12 '09 at 22:28
  • I don't think it matters whether the array is static or stack-allocated. – David Thornley Jun 13 '09 at 15:40
  • It does if you're referencing 5.19. The part that you elided with ... says "... designating an object of static storage duration, a string literal or a function. ...". This means that if your expression involves a stack-allocated array, you can't use 5.19 to reason about the validity of those expressions. – CB Bailey Jun 13 '09 at 19:52
  • Your quote is saying that if `&array[5]` is legal, and referred to a static storage duration array, then that would be an address constant. (Compare with `&array[99]` for example, no text in this paragraph distinguishes between those two cases). – M.M Feb 13 '16 at 00:06
-1

If your example is NOT a general case but a specific one, then it is allowed. You can legally, AFAIK, move one past the allocated block of memory. It does not work in the general case, though, i.e. where you are trying to access elements more than one past the end of an array.

Just searched C-Faq : link text

Aditya Sehgal
  • 2,867
  • 3
  • 27
  • 37
  • the top answer says "its legal" and I also say the same thing. Why the down vote then :). Is something wrong with my answer? – Aditya Sehgal Jun 12 '09 at 18:40
-2

It is perfectly legal.

The vector<> template class from the STL does exactly this when you call myVec.end(): it gets you a pointer (here as an iterator) which points one element past the end of the array.
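
A minimal sketch of obtaining a past-the-end position for a `std::vector` without subscripting past the end. It does not show how any library implements `end()`; it only uses `data()`, `size()`, `begin()` and `end()`. The function name `vector_end_matches_arithmetic` and the contents of `myVec` are assumptions made for the example.

```cpp
#include <cassert>
#include <vector>

// Hypothetical check: the past-the-end position is reached by pointer
// arithmetic on data(), and end() lies size() steps after begin().
bool vector_end_matches_arithmetic() {
    std::vector<int> myVec = {1, 2, 3};
    int* first = myVec.data();
    int* past_end = myVec.data() + myVec.size();   // arithmetic only, no dereference
    bool iterator_distance_ok = (myVec.end() - myVec.begin()) == 3;
    assert(iterator_distance_ok);
    return iterator_distance_ok && (past_end - first) == 3;
}
```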

codymanix
  • 28,510
  • 21
  • 92
  • 151
  • But it does so via pointer arithmetic, not by forming a reference to past-the-end and then applying the address-of operator to that lvalue. – Ben Voigt Oct 10 '14 at 01:54