
Supposing we have:

char* a;
int   i;

Many introductions to C++ (like this one) suggest that the rvalues a+i and &a[i] are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here) quoted from [dcl.ref]:

in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.

In other words, "binding" a reference object to a null-dereference causes undefined behavior. Based on the context of the above text, one infers that merely evaluating &a[i] (within the offsetof macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that &a[i] causes undefined behavior in the case where a=null and i=0. This behavior is different from a+i (at least in C++, in the a=null, i=0 case).

This leads to at least 2 questions about the differences between a+i and &a[i]:

First, what is the underlying semantic difference between a+i and &a[i] that causes this difference in behavior? Can it be explained in terms of general principles, not just "binding a reference to a null dereference object causes undefined behavior just because this is a very specific case that everybody knows"? Is it that &a[i] might generate a memory access to a[i]? Or was the spec author unhappy with null dereferences that day? Or something else?

Second, besides the case where a=null and i=0, are there any other cases where a+i and &a[i] behave differently? (This may already be covered by the first question, depending on its answer.)

personal_cloud
  • according to the answers [here](https://stackoverflow.com/questions/38526510/the-nullptr-and-pointer-arithmetic), `a+i` is undefined if `a=null`, though your 4th link says it _is_ defined if `i=0`, hmmm – kmdreko Mar 01 '19 at 06:05
  • @kmdreko. That's a good point. I've tweaked the difference description to focus on the `a=null`, `i=0` case for establishing that there is *a* difference between `a+i` and `&a[i]`... Again, leading one to wonder if there are any *other* differences between them. – personal_cloud Mar 01 '19 at 06:14
  • The intent of the standard never was to disallow `&*a` when `a` is a null pointer. This is a subject of [issue 232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232). – n. m. could be an AI Mar 01 '19 at 06:16
  • @n.m. Very interesting that the proposed resolution only specifies what happens in the case where `a` is `null` or "one past the last element of an array". These are pretty much the two most useful cases of empty lvalue! But why did they stop there and not just make `a+i` and `&a[i]` completely equivalent...? – personal_cloud Mar 01 '19 at 06:27
  • I'm wondering if perhaps there is no difference between `a+i` and `&a[i]`, but rather just some disagreements and/or misinterpretations of the spec, applied variously to one syntax or the other, without specific intent to apply only to one syntax or the other. Thereby making it appear that there is a difference between `a+i` and `&a[i]` when there is not really a difference? If one believes that C++ has restrictions on pointer arithmetic, then one would apply the same restrictions to both `a+i` and `&a[i]`? (but the restrictions themselves are clouded in controversy). – personal_cloud Mar 01 '19 at 07:13
  • @kmdreko One comment on that question you referenced says: "[expr.add]/5 defines the behavior of pointer+integer only for pointers to arrays (or pointer-to-object, which acts as an array of 1). If the pointer does not point to an array, then by definition the behavior is undefined". And perhaps this is equally applicable to `a+i` and `&a[i]`? – personal_cloud Mar 01 '19 at 07:23
  • The thing is, the text I quoted in the original question specifically talks about a "null dereference", which exists in the `&a[i]` case but not `a+i`. However perhaps this does not apply to C++ as noted in the Dobbs article. – personal_cloud Mar 01 '19 at 07:26
  • I still can't see where the `offsetof` macro evaluates `&a[i]`. – cpplearner Mar 01 '19 at 09:11
  • How `&a[i]` is considered "binding a reference"? – Language Lawyer Mar 01 '19 at 13:05
  • More frighteningly, `a[i]` and `i[a]` are actually interchangeable … because C. (Try it out.) – Arne Vogel Mar 01 '19 at 14:49

2 Answers


TL;DR: a+i and &a[i] are both well-defined and produce a null pointer when a is a null pointer and i is 0, according to (the intent of) the standard, and all compilers agree.


a+i is clearly well-defined per [expr.add]/4 of the latest draft standard:

When an expression J that has integral type is added to or subtracted from an expression P of pointer type, the result has the type of P.

  • If P evaluates to a null pointer value and J evaluates to 0, the result is a null pointer value.
  • [...]

&a[i] is trickier. Per [expr.sub]/1, a[i] is equivalent to *(a+i), so &a[i] is equivalent to &*(a+i). The standard is not entirely clear about whether &*(a+i) is well-defined when a+i is a null pointer. But as @n.m. points out in a comment, the intent as recorded in CWG issue 232 is to permit this case.
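
For a pointer into a real array, where nothing is in dispute, the equivalence can be checked directly. The following is only a minimal sketch: the helper name same_address is made up for illustration, and it deliberately avoids the null-pointer case discussed above.

```cpp
#include <cassert>

// Hypothetical helper: returns true when every spelling of "address of
// element i" yields the same pointer. Intended for pointers into a real
// array, with i inside that array's bounds.
bool same_address(char* a, int i) {
    char* by_arithmetic = a + i;      // plain pointer arithmetic
    char* by_subscript  = &a[i];      // subscript, then address-of
    char* by_expansion  = &*(a + i);  // what [expr.sub]/1 says a[i] means
    char* by_commuted   = &i[a];      // *(i + a): the addition commutes
    return by_arithmetic == by_subscript
        && by_subscript  == by_expansion
        && by_expansion  == by_commuted;
}
```

Calling same_address(buf, 3) on a char buf[8] should return true; the open question in this thread is only what happens when a+i does not point at an actual object.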


Since core language UB is required to be caught in a constant expression ([expr.const]/(4.6)), we can test whether compilers think these two expressions are UB.

Here's the demo. If the compilers think the constant expression in static_assert is UB, or if they think the result is not true, then they must produce a diagnostic (error or warning) per the standard:

(Note that this uses single-parameter static_assert and constexpr lambdas, which are C++17 features, and a default argument on a lambda parameter, which has been allowed since C++14.)

static_assert(nullptr == [](char* a=nullptr, int i=0) {
    return a+i;
}());

static_assert(nullptr == [](char* a=nullptr, int i=0) {
    return &a[i];
}());

From https://godbolt.org/z/hhsV4I, it seems all compilers behave uniformly in this case, producing no diagnostics at all (which surprises me a bit).


However, this is different from the offsetof case. The implementation posted in that question explicitly creates a reference (which is necessary to sidestep a user-defined operator&), and thus is subject to the requirements on references.
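
To illustrate why such an implementation goes through a reference at all, here is a minimal sketch. The names Weird and manual_addressof are hypothetical, and the cast chain follows the well-known pre-C++11 addressof idiom, which is not necessarily the exact code from the linked question. Here the reference is bound to a real object, so the [dcl.ref] restriction is not triggered.

```cpp
#include <cassert>
#include <memory>

// Hypothetical type whose unary operator& is overloaded and does not
// return the object's real address.
struct Weird {
    int value = 0;
    Weird* operator&() { return nullptr; }
};

// Sketch of the classic workaround: bind a reference to the object viewed
// as char (the built-in & always applies to char), take that address, and
// cast it back to T*.
template <class T>
T* manual_addressof(T& obj) {
    return reinterpret_cast<T*>(
        &const_cast<char&>(reinterpret_cast<const volatile char&>(obj)));
}
```

For a Weird w, the expression &w calls the overload, while manual_addressof(w) should agree with std::addressof(w).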

cpplearner
  • _Since core language UB is required to be caught in a constant expression_ [expr.const] says _"... would have undefined behavior as specified in ..."_ which I always understood as that the UB has to be explicitly specified. And there is no wording explicitly saying that `*p` is UB when `p == nullptr`. – Language Lawyer Mar 01 '19 at 13:10
  • @LanguageLawyer Well if you mean it's well-formed, then I agree. If you mean there's something that's "not explicitly specified as UB, but is still UB", then I guess you need to prove the existence of such a thing. – cpplearner Mar 01 '19 at 13:17
  • Prove the existence of UB which is not explicitly marked as UB by the standard? It follows from [the definition of UB](https://timsong-cpp.github.io/cppwp/n4659/defns.undefined): behavior for which this International Standard imposes no requirements. If the standard imposes no requirement on something, this is UB, even if the standard does not say this explicitly. The note after the definition says this: _Undefined behavior may be expected when this International Standard omits any explicit definition of behavior_. – Language Lawyer Mar 01 '19 at 13:24
  • @Language Lawyer But [expr.unary.op/1](http://eel.is/c++draft/expr.unary.op#1) seems to define a basic requirement that if you have a pointer `p`, then `*p` designates its memory location. Doesn't say it has to be a valid location. So by your logic, `*p` is *not* UB... I think @cpplearner's logic is correct here; it is other sections of the spec, like [dcl.ref] that specifically create the possibility of UB here. – personal_cloud Mar 01 '19 at 17:48
  • @personal_cloud I haven't found anything about memory location in the definition of the indirection operator. It says that the resulting lvalue refers to an object or function to which the pointer expression points. And since `p == nullptr` does not point to any object or function and this case is not explicitly handled by the standard, it is considered to be undefined behavior. AFAIU since this UB is implicit, the compilers are not required to diagnose it in constant expressions. But I'm not 100% sure here. – Language Lawyer Mar 01 '19 at 18:14

In the C++ standard, section [expr.sub]/1 you can read:

The expression E1[E2] is identical (by definition) to *((E1)+(E2)).

This means that &a[i] is exactly the same as &*(a+i): you first dereference the pointer (*) and then take the address (&) of the result. If the pointer is invalid (e.g. nullptr, or out of range), this is UB.

a+i is based on pointer arithmetic. At first glance it looks less dangerous, since there is no dereference that would be UB for sure. However, it may also be UB (see [expr.add]/4):

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the expression P points to element x[i] of an array object x with n elements, the expressions P + J and J + P (where J has the value j) point to the (possibly-hypothetical) element x[i + j] if 0 ≤ i + j ≤ n; otherwise, the behavior is undefined. Likewise, the expression P - J points to the (possibly-hypothetical) element x[i − j] if 0 ≤ i − j ≤ n; otherwise, the behavior is undefined.
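
To make the bounds in the quoted paragraph concrete, here is a small sketch (the function name sum is made up for illustration). It relies on the i + j = n case: forming the one-past-the-end pointer is allowed, although dereferencing it is not.

```cpp
#include <cassert>

// Hypothetical helper: sums n ints, using a one-past-the-end pointer as
// the loop bound. Assumes first points at the start of an array that has
// at least n elements.
int sum(const int* first, int n) {
    const int* last = first + n;  // i + j == n, still within 0 <= i + j <= n
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;              // p is strictly before last here
    }
    return total;
}
```

By contrast, computing first + n + 1 would already fall outside the quoted range, even if the resulting pointer were never dereferenced.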

So, while the semantics behind these two expressions are slightly different, I would say that the result is the same in the end.

Christophe
  • But see [\[expr.add\]/7](https://timsong-cpp.github.io/cppwp/n4659/expr.add#7) of C++17 DIS, or [\[expr.add\]/(4.1)](http://eel.is/c++draft/expr.add#4.1) of the current draft standard. – cpplearner Mar 01 '19 at 08:14
  • Thanks for breaking this down into `&*(a+i)`; that is very helpful. Just to clarify: I think you're saying that, if we believe [expr.add]/4, then the `&*` doesn't introduce any UB cases that were not already created by the `(a+i)`? (and if we don't believe [expr.add]/4, then the `&*` might conceivably create UB cases that did not exist in `(a+i)`)? I guess I can accept that as a complete answer. Thank you. – personal_cloud Mar 01 '19 at 17:24
  • @personal_cloud Yes! The pointer arithmetic and indexing are indeed defined in a very consistent manner, so that they lead to the same result (apart from the exception that you've already mentioned in your question). – Christophe Mar 01 '19 at 19:57