Supposing we have:

```cpp
char* a;
int i;
```
Many introductions to C++ (like this one) suggest that the rvalues `a+i` and `&a[i]` are interchangeable. I naively believed this for several decades, until I recently stumbled upon the following text (here), quoted from [dcl.ref]:
> in particular, a null reference cannot exist in a well-defined program, because the only way to create such a reference would be to bind it to the "object" obtained by dereferencing a null pointer, which causes undefined behavior.
In other words, "binding" a reference to the "object" obtained from a null dereference causes undefined behavior. Based on the context of the above text, one infers that merely evaluating `&a[i]` (within the `offsetof` macro) is considered "binding" a reference. Furthermore, there seems to be a consensus that `&a[i]` causes undefined behavior in the case where `a=null` and `i=0`. This behavior is different from that of `a+i` (at least in C++, in the `a=null, i=0` case).
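To make that concrete, here is a minimal sketch of the two expressions side by side (the null pointer and zero index are chosen only to illustrate the case in question):

```cpp
int main() {
    char* a = nullptr;
    int i = 0;

    char* p = a + i;   // pointer arithmetic only; adding 0 to a null pointer
                       // is generally regarded as yielding a null pointer
    char* q = &a[i];   // a[i] dereferences a (here null) pointer before the
                       // address is taken -- the reportedly undefined case
    (void)p;
    (void)q;
    return 0;
}
```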
This leads to at least two questions about the differences between `a+i` and `&a[i]`:
First, what is the underlying semantic difference between `a+i` and `&a[i]` that causes this difference in behavior? Can it be explained in terms of some general principle, rather than just "binding a reference to a null dereference causes undefined behavior because this is a very specific case that everybody knows"? Is it that `&a[i]` might generate a memory access to `a[i]`? Or was the spec author just not happy with null dereferences that day? Or something else?
Second, besides the case where `a=null` and `i=0`, are there any other cases where `a+i` and `&a[i]` behave differently? (This may already be covered by the first question, depending on its answer.)