
Let's say I have a function, called like this:

```cpp
#include <algorithm>
#include <cstddef>

void mysort(int *arr, std::size_t size)
{
  std::sort(&arr[0], &arr[size]);
}

int main()
{
  int a[] = { 42, 314 };
  mysort(a, 2);
}
```

My question is: does the code of mysort (more specifically, &arr[size]) have defined behaviour?

I know it would be perfectly valid if replaced by arr + size; pointer arithmetic allows pointing past-the-end normally. However, my question is specifically about the use of & and [].

Per C++11 5.2.1/1, arr[size] is equivalent to *(arr + size).

Quoting 5.3.1/1, the rules for unary *:

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the **result is an lvalue referring to the object or function to which the expression points**. If the type of the expression is “pointer to T,” the type of the result is “T.” [ Note: a pointer to an incomplete type (other than cv void) can be dereferenced. The lvalue thus obtained can be used in limited ways (to initialize a reference, for example); this lvalue must not be converted to a prvalue, see 4.1. —end note ]

Finally, 5.3.1/3 giving the rules for &:

The result of the unary & operator is a pointer to its operand. The operand shall be an lvalue ... if the type of the expression is T, the result has type “pointer to T” and is a prvalue that is the address of the designated object (1.7) or a pointer to the designated function.

(Emphasis and ellipses mine).

I can't quite make up my mind about this. I know for sure that forcing an lvalue-to-rvalue conversion on arr[size] would be undefined behaviour. But no such conversion happens in the code. arr + size does not point to an object; but while the paragraphs above talk about objects, they never seem to explicitly require that an object actually exist at that location (unlike, e.g., the lvalue-to-rvalue conversion in 4.1/1).

So, the question is: is mysort, the way it's called, valid or not?

(Note that I'm quoting C++11 above, but if this is handled more explicitly in a later standard/draft, I would be perfectly happy with that).

Angew is no longer proud of SO
  • No, not a duplicate; @Angew knows that, trust me. – Bathsheba Feb 12 '16 at 14:42
  • Not sure but what about the note in [basic.compound] paragraph 3 *[ Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address. There are further restrictions on pointers to objects with dynamic storage duration; see 3.7.4.3. —end note ]* – NathanOliver Feb 12 '16 at 14:46
    Can anyone tell me what is new in this question compared to the possible duplicate? It seems to me that copy-pasting the answer from there would answer this question too. – Tomáš Zato Feb 12 '16 at 14:46
    §6.5.3.2, paragraph 3: ... Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator. Otherwise, the result is a pointer to the object or function designated by its operand. Appears to answer the question. Direct copy/paste from the proposed dupe. – R_Kapp Feb 12 '16 at 14:46
  • @R_Kapp Although that's the C standard, not C++, right? – TartanLlama Feb 12 '16 at 14:47
    My reading is that `&arr[size]` **is** UB since it's essentially `&(*something)` where `*something` is UB. But I'm waiting for an expert to confirm. But the dupe has little to do with that. – Bathsheba Feb 12 '16 at 14:49
  • @TartanLlama: This is true. Hadn't noticed that the dupe was tagged with both C and C++ and assumed it was C++ – R_Kapp Feb 12 '16 at 14:49
    @Bathsheba How is it not a duplicate? – Barry Feb 12 '16 at 14:50
  • The duplicate is tagged C and C++ (which sort of bothers me... the languages could treat it differently), top voted answer quotes the C standard. This question is purely about C++. – R_Kapp Feb 12 '16 at 14:52
  • IMO the question here is: is the address of the past-the-end element equivalent to the end() iterator for a pointer? There is little doubt this holds for every existing compiler, but maybe some mad scientist could design a compiler using some weird encoding (like setting some unused bit to 1 in the last element's address, or pointing to the 42nd past-the-end element) as the internal past-the-end iterator without violating the spec... – kuroi neko Feb 12 '16 at 15:05
  • With a normal (class-type) object, it would be UB for sure (as it might call an overloaded `operator &`). I don't think the same holds for built-in types. – Jarod42 Feb 12 '16 at 17:14
  • [This](http://stackoverflow.com/questions/988158/take-the-address-of-a-one-past-the-end-array-element-via-subscript-legal-by-the) is the same question that attracted a lot of attention. It is dual-tagged C and C++, however (as described by the asker) it is mainly a C++ question but he said he added the C tag out of curiosity – M.M Feb 12 '16 at 23:58

5 Answers


It's not valid. You bolded "result is an lvalue referring to the object or function to which the expression points" in your question. That's exactly the problem. arr + size is a valid pointer value that does not point to an object. Therefore, your quote about *(arr + size) does not specify what the result refers to, which means there is no requirement for &*(arr + size) to give the same value as arr + size.

In C, this was considered a defect and fixed, so that the spec now says that in &*ptr, neither & nor * is evaluated. C++ hasn't yet received fixed wording. It's the subject of a very old, still-active DR: DR #232. The intent is that it is valid, just as it is in C, but the standard doesn't say so.

  • The word "valid" needs to be clarified. The address or value of the expression is perfectly valid. However, accessing the item at that location is undefined behavior. – Thomas Matthews Feb 12 '16 at 17:24
    @ThomasMatthews The whole point of my answer is that while what you say is what's intended, it's *not* what the standard says. –  Feb 12 '16 at 17:29
  • I wasn't sure whether "the object x for which P(x)" can be interpreted as "if P(x) then x else ??" (which was basically the essence of the question), but the existence of the DR strongly suggests so, or at the very least shows that the issue needs addressing. The DR clears it up for me, thanks. – Angew is no longer proud of SO Feb 15 '16 at 15:37

In the context of normal C++ arrays, yes. It is legal to form the address of the one-past-the-end element of the array. It is not legal to read or write what it points at, however (after all, there is no actual element there). So when you evaluate &arr[size], the arr[size] part forms what you might think of as a reference to the one-past-the-end element, but does not actually access that element. Then the & gets you the address of that element. Since nothing has tried to follow that pointer, nothing bad has happened.

This isn't an accident: it makes pointers into arrays behave like iterators. Thus &a[0] is essentially .begin() on the array, and &a[size] (where size is the number of elements in the array) is essentially .end(). (See also std::array, where this correspondence is more explicit.)

Edit: Erm, I may have to retract this answer. While it probably applies in most cases, if the type stored in the array has an overloaded operator&, then when you evaluate &a[size], that operator& may attempt to access members of the instance at a[size], where there is no instance.

Andre Kostur

Assuming size is the actual array size, you are passing a pointer to the past-the-end element to std::sort().

So, as I understand it, the question boils down to: is this pointer equivalent to arr.end()?

There is little doubt this is true for every existing compiler, since array iterators are indeed plain old pointers, so &arr[size] is the obvious choice for arr.end().

However, I doubt there is a specific requirement about the actual implementation of plain old array iterators.

So, for the sake of argument, you could imagine a compiler that uses a "past end" bit in addition to the actual address to implement plain old array iterators internally, and perversely paints your mustache pink if it detects any conceivable inconsistency between iterators and addresses obtained through pointer arithmetic. This freakish compiler would cause a lot of existing C++ code to crash without actually violating the spec, which might just be worth the effort of designing it...

kuroi neko

If we admit that arr[i] is just shorthand for *(arr + i), we can rewrite &arr[size] as &*(arr + size). Hence, we are dereferencing a pointer to the past-the-end element, which leads to undefined behavior. As you correctly say, arr + size alone would be legal, because no dereferencing takes place.

Coincidentally, this is also presented as a quiz in Stepanov's notes (page 11).

Ilio Catallo

It's perfectly fine and well-defined as long as size is not larger than the size of the actual array (in units of the array elements).

So if main() called mysort(a, 100), &arr[size] would already be undefined behaviour (most likely undetected, though std::sort would obviously go badly wrong as well).

gnasher729