2

Consider the following simple code to sort an array.

int myarray[4] = {};
std::sort(myarray, myarray + 4);

I know that it is valid to create a pointer to one past the end of a C-style array.

I've recently seen code like this:

std::sort(myarray, &myarray[4]);

I'm not sure this is valid, because it dereferences an element outside the array bounds, even though the element value is not used for anything.

Is this valid code?

Deduplicator
Neil Kirk
  • 1
    [Discussed in depth here](http://stackoverflow.com/questions/988158/take-the-address-of-a-one-past-the-end-array-element-via-subscript-legal-by-the) . Conclusion for C++ seems to be that the C++03 standard is unclear whether it is valid or not. (This thread came before C++11 had been finalized). – M.M Oct 10 '14 at 00:21
  • @MattMcNabb: Does C++14 contain it? – Deduplicator Oct 10 '14 at 00:53
  • [This was raised in DR232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232) and a resolution suggested. – M.M Oct 10 '14 at 01:01
  • @Deduplicator The resolution is in N3337, but not in N3797. IDK what is in the official standards. – M.M Oct 10 '14 at 01:01
  • C has a rule, introduced in the 1999 standard, that explicitly says that `&*x` means `x` and `&x[y]` means `x+y` (except that the constraints on the operators still apply and the result is not an lvalue). I don't see similar wording in C++11. – Keith Thompson Oct 10 '14 at 01:08
  • @MattMcNabb: Looked for it. That resolution is not in C++14. – Deduplicator Oct 10 '14 at 01:10
  • 1
    This issue has been hanging around for years, it's about time someone in the CWG fixed it up once and for all >.> – M.M Oct 10 '14 at 01:11
  • Have started a [language lawyer thread](http://stackoverflow.com/questions/26290598/lvalues-which-do-not-designate-objects-in-c14) on the underlying issue – M.M Oct 10 '14 at 01:38
  • @MattMcNabb: All the more reason to add answers that rely on wording changes in C++11 or C++14 to the existing question, instead of adding new ones. – Ben Voigt Oct 10 '14 at 01:59
  • 1
    @BenVoigt note that (a) that question was before C++11 in which DR232 was addressed, and (b) the accepted answer is wrong. – M.M Oct 10 '14 at 02:02
  • @MattMcNabb: The accepted answer uses reasoning that's a little fuzzy (it refers to an object that virtually exists according to the Standard, not a real object), but AFAICT it is correct. No lvalue-rvalue conversion -> no UB. – Ben Voigt Oct 10 '14 at 02:04
  • 2
    @BenVoigt he starts off with some irrelevant standard quotes and then makes the completely unfounded closing statement "Which seems to me to imply that yes, you can legally dereference it, but the result of reading or writing to the location is unspecified.". – M.M Oct 10 '14 at 02:18
  • @BenVoigt: That one passage about an object that might or might not exist is only a note. So the answer (that part which answers the question) relies completely on a non-normative part. Put another way, the answer is vacuous. – Deduplicator Oct 10 '14 at 16:59

4 Answers

7

A[i] is syntactically equivalent to *(A + i) for an array or pointer A.
So &A[i] is syntactically equivalent to &(*(A + i)).

When *(A + i) does not have undefined behavior, &(*(A + i)) will behave identically to A + i.

The problem is that myarray[4] is syntactically equivalent to *(myarray + 4), which dereferences a location out of the array's bounds. That is undefined behavior according to the standard.

So you should absolutely prefer myarray + 4 over &myarray[4] - the latter is undefined behavior.

That &myarray[4] has "correct" behavior with most - if not all - compilers does not exempt it from having undefined behavior according to the standard.
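As a rough illustration of the recommendation above, the sketch below forms the end pointer without going through a subscript expression at all. The helper names (`sort_with_pointer_arithmetic`, `sort_with_std_end`, `end_pointers_agree`) are made up for this example, and `std::begin`/`std::end` assume a C++11 or later compiler.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>

// Hypothetical helper: sorts a 4-element C-style array, forming the
// one-past-the-end pointer with plain pointer arithmetic.
inline void sort_with_pointer_arithmetic(int (&a)[4]) {
    std::sort(a, a + 4);  // a + 4 is only computed, never dereferenced here
}

// Hypothetical helper: same effect, letting std::end compute the end pointer.
inline void sort_with_std_end(int (&a)[4]) {
    std::sort(std::begin(a), std::end(a));
}

// Returns true when both spellings yield the same end pointer.
inline bool end_pointers_agree() {
    int a[4] = {};
    return std::end(a) == a + 4;
}
```

Either helper can be called on an `int myarray[4]`; both avoid the question of whether `&myarray[4]` is valid.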

Timothy Shields
  • is it? even if i is the element count? – Deduplicator Oct 10 '14 at 00:24
  • @Deduplicator Yep, it's the same. – Timothy Shields Oct 10 '14 at 00:27
  • `&*(A + i)` means `&(*(A + i))`. There is no "operator cancellation rule" or anything. The question is whether `*(A + i)` is valid. It looks like an lvalue at first, but the standard says that lvalues must designate objects, and this one does not designate an object. – M.M Oct 10 '14 at 00:28
  • @MattMcNabb I'm aware there is no operator cancellation rule. Regardless, given a pointer `p`, `&*p` is the same as `p` in terms of the behavior. – Timothy Shields Oct 10 '14 at 00:29
  • 2
    @TimothyShields says who? Not the C++ standard. – M.M Oct 10 '14 at 00:30
  • @MattMcNabb `int i = 5; int* p = &i; int* q = &*p;` Is `p == q` not true? Or are you saying there is undefined behavior? – Timothy Shields Oct 10 '14 at 00:33
  • 2
    @TimothyShields In that case `p == q` but that is different to the OP's case because `*p` is valid in your example. – M.M Oct 10 '14 at 00:34
  • @MattMcNabb So is the issue just that `*(myarray + 4)` is dereferencing out-of-bounds memory? It's still going to be an `int&` that points to the expected location, such that applying the reference operator gives the right pointer. – Timothy Shields Oct 10 '14 at 00:36
  • 1
    The question is whether it invokes UB or not. "I'm feeling lucky" and "It seems to work" are not relevant for that. If you say it's valid, where's the citation from the standard? – Deduplicator Oct 10 '14 at 00:40
  • If i == the length of the array, then "& * ( A + i )" is invoking undefined behavior because "* ( A + i)" is dereferencing 1 past the end of the array. In theory at least. I know of no compiler on which this expression would generate any code which actually does access such memory. It does mean though, that a compiler does not have to give you the result that you are expecting here. – joeking Oct 10 '14 at 01:09
  • @MattMcNabb @Deduplicator @joeking Thanks for clarifying. You're right that it is undefined behavior to use the `&myarray[4]` version. I have updated the answer. – Timothy Shields Oct 10 '14 at 01:30
  • Is it really UB? 3.9.2:3 might suggest that it's OK to dereference it, if you don't read/write to that object. – sp2danny Oct 10 '14 at 01:32
  • I don't see how that paragraph would imply that. Care to quote/elaborate? – dom0 Oct 10 '14 at 01:37
  • @dom0 `the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address` Seems like there is a hypothetical object one past the end. Having the address of that object is OK; maybe having the object itself is too, if you don't actually use it. – sp2danny Oct 10 '14 at 01:46
  • No, that's not an hypothetical object, that's an *unrelated* object that *might* be there, but may as well not. – dom0 Oct 10 '14 at 01:47
  • @sp2danny: Please take note that that is only a note, and thus not normative. Even if it would prove that point. – Deduplicator Oct 10 '14 at 14:25
0

It is valid for the same reason your original is valid: the compiler can determine where that element would be. In the latter case, the expression identifies a nonexistent element, but since it doesn't try to access it, there is no problem.

Or, more concisely: myarray+4 and &myarray[4] are synonyms.

Scott Hunter
  • Are they? Sure that dereference of a non-existent element followed by address-of is valid? Where is the proof? – Deduplicator Oct 10 '14 at 00:59
  • Section 5.2.1 of ISO 2012 C++ standard: "The expression E1[E2] is identical (by definition) to *((E1)+(E2))" Deduplicator is being a bit obscure -- I believe he is referring to the fact that the subscripting is equivalent to dereferencing, thus prefixing with "&" doesn't help because you are already in UB land. Of course, that's more than a little fussy - I really doubt there are any compilers out there that won't do what you expect. – joeking Oct 10 '14 at 01:14
  • 2
    @joeking: The question is still "Is that valid code", and the only valid yard-stick for that is the standard, not "it seems to happen to work, and I expected so". – Deduplicator Oct 10 '14 at 01:18
  • @joeking What happens if I use a compiler that adds bounds checking to arrays in debug mode? – Neil Kirk Oct 10 '14 at 01:20
  • Sure, it is possible that a compiler might detect that, but it is very unlikely. Consider "int *x = (int *) rand(); *x;". Clearly, this is not a good thing, yet the "*x;" isn't likely to cause any compiler to generate any code, so while it is in theory wrong, you won't get punished for it (even on some theoretical compiler that does bounds checking). – joeking Oct 10 '14 at 01:26
  • 1
    @joeking Well such a compiler would only perform it when using an array directly. Technically that code is UB due to the cast so compiler's hands are tied. – Neil Kirk Oct 10 '14 at 01:34
-2

The standard requires that a pointer to one past the end of an array be a valid value for the pointer to have. (That doesn't mean it is OK to dereference it.)

This is required in a few places, for example in pointer comparison (5.9, relational operators):

If two pointers point to elements of the same array or one beyond the end of the array, the pointer to the object with the higher subscript compares higher.

The STL actually relies on this: a container's end() function returns an iterator that plays the same role as a pointer one past the end of an array.
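To make the comparison rule concrete, here is a minimal sketch of a loop that compares a moving pointer against the one-past-the-end pointer. The function name `sum_by_pointer_walk` is hypothetical and not taken from the standard or from the question; the end pointer is formed with `first + n` and is never dereferenced.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: sums n ints by walking a pointer from `first`
// up to, but not including, the one-past-the-end pointer.
inline long sum_by_pointer_walk(const int* first, std::size_t n) {
    const int* last = first + n;  // one past the end: formed, never dereferenced
    long total = 0;
    // Relational comparison between pointers into (or one past) the same array.
    for (const int* p = first; p < last; ++p) {
        total += *p;              // p always points at a real element here
    }
    return total;
}
```

The same shape is what iterator-based loops over `begin()` and `end()` follow, with `!=` commonly used in place of `<`.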

I also see this in section 27.6.2: Stream Buffer requirements

So your snippet:

std::sort(myarray, &myarray[4]);

is basically the same as if you used a vector of int

  std::vector<int> yourArray;
  std::sort(yourArray.begin(), yourArray.end());
joeking
  • Validity of pointers just past the end is not in doubt. The question is whether `&arr[count]` is valid given `int arr[count];`. – Deduplicator Oct 10 '14 at 00:49
  • 1
    How is that different? If the array size is 4, &array[3] is a pointer to the last, and &array[4] is a pointer to 1 past the last. – joeking Oct 10 '14 at 00:50
  • The intermediate dereference?? – Deduplicator Oct 10 '14 at 00:51
  • 2
    What? Please, spell it out. "&myarray[4]" doesn't dereference anything - it takes the address-of, whereas "* & myarray[4]" would indeed be invalid because you would be dereferencing a pointer to one past the end. – joeking Oct 10 '14 at 00:52
  • `&arr[count]` dereferences one past the end (which is UB) and then takes the address. `arr+count` does not. The question is whether the first is valid despite that dereferencing, due to some twist of the standard or an explicit rule. Neither common sense nor "it seems to work" are acceptable answers there. – Deduplicator Oct 10 '14 at 00:56
  • Yep, you are incorrect. You are consistently ignoring the dereference in the expression `arr[count]`. If you insist that's valid, you won't have any trouble adding a standard reference proving that. – Deduplicator Oct 10 '14 at 01:14
  • @Deduplicator: Standard quote for your claim that there is a "dereference" in `arr[count]`? The only wording I see is "The `*` operator performs *indirection*". It creates a reference, which never undergoes lvalue (reference) to rvalue (value of object) conversion. – Ben Voigt Oct 10 '14 at 04:19
-2

According to §6.5.3.2 in the C99 standard, it's valid C:

The unary & operator yields the address of its operand. If the operand has type ‘‘type’’, the result has type ‘‘pointer to type’’. If the operand is the result of a unary * operator, neither that operator nor the & operator is evaluated and the result is as if both were omitted.

There seems to be no counterpart to this rule in C++, probably due to operator overloading.

sp2danny
  • 3
    There is no 6.5.3.2 in C++. You seem to be quoting from a C standard. – M.M Oct 10 '14 at 01:06
  • @MattMcNabb: And there is no such rule in C++ (as of the 2011 ISO C++ standard, unless I'm missing something). – Keith Thompson Oct 10 '14 at 01:08
  • 2
    `which is probably an oversight` More likely it wasn't included because C++'s operator overloading makes it hard to define a catch-all rule. – user657267 Oct 10 '14 at 01:12
  • This is 6.5.3.2 of the C 1999 standard. So in C99, "&*X" is equivalent to "X". It also says that if "the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator." – joeking Oct 10 '14 at 01:22
  • @user657267: What difficulty does overloading cause? It could just state that the rule applies only to the built-in unary `&` operator, not to any overloaded operator with the same name. – Keith Thompson Oct 10 '14 at 01:48