
This has been asked before in various forms, but since the language specification appears to be quite dynamic in this regard (or at least was dynamic when some SO discussions of this matter took place), it might make sense to revisit the matter in light of more recent developments, if there are any.

So, the question is, again, whether the combination of & and subscript is a valid way to obtain a pointer to the imaginary past-the-end element of an array:

int a[42] = {};
&a[42];
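
For context, the usual reason to want this pointer is to mark the end of a half-open range. The sketch below (the helper name `sum_range` is made up for illustration) forms that pointer as `a + 42`, which the additive-operator rules explicitly permit, so it does not depend on the answer to the `&a[42]` question:

```cpp
#include <cassert>

// Hypothetical helper: sums the half-open range [first, last).
// `last` is only compared against, never dereferenced.
int sum_range(const int* first, const int* last) {
    assert(first <= last);  // both must point into (or one past) the same array
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}
```

Called as `sum_range(a, a + 42)` after filling `a` with 0 through 41, it should return 861; the end pointer takes part only in the `p != last` comparison.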

It was considered undefined in C++98. But what about modern C++? We have seen DR#232, which is nevertheless still in the "drafting" state for some reason and is definitely not in the standard text (as of C++14). Is the matter still up in the air, or has it been resolved by some alternative means?

What is interesting is that DR#315 seems to openly permit calling static member functions through a null pointer p (!) on the basis that "*p is not an error when p is null unless the lvalue is converted to an rvalue". It feels like the resolution of DR#315 was tentatively based on the supposedly slam-dunk future resolution of DR#232, which failed to materialize. In that light, is DR#315 really NAD?

Also, since C++11 the library specification defines dereferenceable iterators simply as iterators for which the expression *it is valid, which in the case of std::vector would (or might) largely delegate the matter to the above issue for raw arrays, and apparently open the door to dereferenceable std::vector::end() iterators. This potentially makes the following code valid:

std::vector<int> v(42);
&v[42];
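
For `std::vector` there is a similar workaround: `v.data() + v.size()` is a past-the-end pointer computed without evaluating `v[42]` or dereferencing `v.end()`. A minimal sketch, assuming a hypothetical helper `address_at` that accepts indices up to and including `size()`:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the address of element i, or the
// past-the-end address when i == v.size(), without evaluating v[i]
// (which would be out of range for i == v.size()).
const int* address_at(const std::vector<int>& v, std::size_t i) {
    assert(i <= v.size());  // one past the end is the limit
    return v.data() + i;    // plain pointer arithmetic, no indirection
}
```

For `std::vector<int> v(42)`, `address_at(v, 42)` should equal `&v[41] + 1`. For an empty vector it returns `v.data()` (possibly null), since adding 0 to a null pointer is well-defined in C++.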

Is it really valid? Some older answers on SO categorically state that dereferencing standard end() iterators is always undefined. But it does not appear to be so clear-cut in post-C++11 versions of the language. The standard says that the library implementation "never assumes" end iterators to be dereferenceable, which means that they are no longer unconditionally non-dereferenceable.

P.S. I have already seen the discussion "Lvalues which do not designate objects in C++14", but it seems to focus specifically on the validity of reference initialization, which I don't want to bring into this question.

AnT stands with Russia

2 Answers


To the best of my understanding, you are dereferencing the past-the-end pointer in the &v[42] (or &a[42]) expression, and that is undefined.

Based on N4140:

[expr.unary.op]/1

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.

I don't think the non-element past the last element of an array is considered an object.

krzaq
  • Well, the crux of the matter is the implicit dereference hidden inside the `[]`. Is it legal or not? I removed pointers `p` from my question to avoid confusion. – AnT stands with Russia Oct 04 '16 at 20:02
  • I'm dereferencing in `&a[42]` as well, since it is `&*(a + 42)`, but AFAIK the usual consensus is that it is OK as long as I don't apply lvalue-to-rvalue conversion to the result of that dereference. Note that the C spec has wording that specifically allows this `&*` combination to cancel out. But C++ does not follow that approach. – AnT stands with Russia Oct 04 '16 at 20:13
  • Would you perform an lvalue-to-rvalue conversion in `a[42] = 0;`? I don't think so, but it's certainly UB. – krzaq Oct 04 '16 at 20:18
  • OK, replace "lvalue-to-rvalue" with "accessing the object". Also, "modifiable lvalue" requirement of `=` operator can be used to outlaw `a[42]` on the LHS of assignment without interfering with the validity of `&a[42]`. – AnT stands with Russia Oct 04 '16 at 20:23
  • The extra mention of "an lvalue referring to the object" as the result of unary `*` is interesting. However, this appears to be a part of the *output* specification of the unary `*`. I.e. if I satisfy the pre-conditions, then `*` will somehow *guarantee* me "lvalue referring to the object" as output (I don't care how). – AnT stands with Russia Oct 04 '16 at 21:14
  • I agree that imaginary element past-the-end should not be considered an object. But this only raises the questions to the authors of the above wording: what on Earth were they trying to say by this? What happens if I apply unary `*` to the past-the-end pointer (which seems to satisfy the pre-conditions)? How are they going to come up with "an lvalue referring to the object"? – AnT stands with Russia Oct 04 '16 at 21:14
  • For that question I have no answer. I wish it were explicitly and plainly stated one way or the other. – krzaq Oct 04 '16 at 21:17
  • BTW, note that DR#315 (http://open-std.org/jtc1/sc22/wg21/docs/cwg_closed.html#315) nonchalantly states that dereferencing a null pointer is OK as long as there's no lvalue-to-rvalue conversion... – AnT stands with Russia Oct 06 '16 at 03:05
  • It is interesting. While not normative, this authority certainly appeals to me. But let's invert the logic. Say it is perfectly fine to dereference past-the-end pointer (or a null pointer). According to the standardese above, the result is a (valid?) lvalue referring to an object. That's baffling to me. – krzaq Oct 06 '16 at 03:10

My best guess:

Except where it has been declared for a class (13.5.5), the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)).

So a[42] is equivalent to *(a + 42).
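
To make the rewriting concrete, here is a small check (the function name is hypothetical) that for every in-range index `i` the forms `a[i]`, `*(a + i)` and even `i[a]` designate the same element. It deliberately requires `i < N`, because whether the same identity may be used at `i == N` is exactly what is being debated:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical check: for a valid index i, a[i], *(a + i) and i[a]
// all designate the same element, so their addresses coincide with a + i.
template <std::size_t N>
bool subscript_matches_pointer_form(int (&a)[N], std::size_t i) {
    assert(i < N);  // only real elements; a[N] is not exercised here
    return &a[i] == &*(a + i)   // E1[E2] is *((E1)+(E2))
        && &a[i] == a + i       // &* on a valid pointer yields that pointer
        && &i[a] == &a[i];      // subscript operands commute for built-ins
}
```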

[§ 5.7 Additive operators]

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. ...

... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

So a + 42 seems to yield a valid T* pointer, which would dereference to a T (per [expr.unary.op]):
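
The 5.7 wording quoted above can be illustrated with a sketch that forms `a + n` only for `n` in `[0, N]`. The helper `offset` is hypothetical, and its `assert` encodes the one-past-the-end limit. Such a pointer may be compared and subtracted; whether it may be dereferenced is the open question:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: forms a + n by pointer arithmetic.
// [expr.add] allows n in [0, N]: the result points at an element or
// one past the last element. Any larger n would be undefined behavior.
template <std::size_t N>
int* offset(int (&a)[N], std::size_t n) {
    assert(n <= N);
    return a + n;
}
```

For `int a[42]`, `offset(a, 42) - a` should be 42 and `offset(a, 42) == offset(a, 41) + 1`, all without any indirection.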

If the type of the expression is “pointer to T,” the type of the result is “T.”

There is also the following note:

[§ 3.9.2 Compound types]

Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.

It seems like it is valid. I still think assigning to it would be undefined behavior (because, per the note, it would be an unrelated object), but taking its address appears to be defined.

That being said, &a[41] + 1 is well-defined (thanks to 5.7) and sidesteps the question entirely, so you may prefer to just use that.
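
The three spellings below all produce the past-the-end pointer of an array without naming the nonexistent element; `std::end` from `<iterator>` returns `a + N` for a built-in array. The function name is hypothetical, and the check is only a sketch:

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical check: three well-defined ways to form the
// past-the-end pointer, none of which applies & to a[N].
template <std::size_t N>
bool past_end_spellings_agree(int (&a)[N]) {
    int* via_last = &a[N - 1] + 1;  // address of last element, plus one
    int* via_add = a + N;           // plain pointer arithmetic
    int* via_std = std::end(a);     // standard library helper
    return via_last == via_add && via_add == via_std;
}
```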

lcs
  • [expr.unary.op] also says "the result is an lvalue referring to the object [...]". Am I misunderstanding what an object is? – krzaq Oct 04 '16 at 20:54
  • @lcs: The intent of the note in 3.9.2 you quoted is *not* to say that the imaginary past-the-end array element `a[42]` is always a valid object. The intent is to say that some completely unrelated valid object `x` (of the same type) can be [accidentally] placed at `a + 42` location in memory. And thus past-the-end pointer `a + 42` can end up pointing at `x`. In this case a pointer obtained as past-the-end pointer is still a valid pointer to `x`. That's all it is trying to say. – AnT stands with Russia Oct 04 '16 at 20:58
  • I've removed the word valid, it was poorly placed. The point I was making was that `a + 42` yields a valid `T*`, which can be dereferenced. What you get when you dereference is `T`, which is a valid operand for `&` and results in `T*` – lcs Oct 04 '16 at 21:14