
Considering that C++ does not have bounds checking for built-in arrays, is it possible that:

One array's off-the-end pointer points to another array's first element?

M.M
  • It does not have bounds checking, but [accessing it would be undefined behavior](http://stackoverflow.com/q/18727022/1708801). So it could be pointing to another object, but since you should not dereference it, that should not matter. – Shafik Yaghmour Jul 01 '14 at 02:22
  • @ShafikYaghmour, I could not find any reference in the C++ standard to back up that it is UB to dereference such a pointer. The question you link does not have any such text quoted. The C99 standard has a line explicitly added to say that it is UB, but the C++ standard does not have that line. – M.M Jul 01 '14 at 02:27
  • @MattMcNabb accessing an out-of-bounds array element is always undefined behavior. – Aakash Jain Jul 01 '14 at 02:29
  • @aakashjain But it is a valid element (the first one of the next array). – M.M Jul 01 '14 at 02:29
  • @MattMcNabb __C++ 2011. Section 24.2.1 Paragraph 5.__ Just as a regular pointer to an array guarantees that there is a pointer value pointing past the last element of the array, so for any iterator type there is an iterator value that points past the last element of a corresponding sequence. These values are called past-the-end values. Values of an iterator `i` for which the expression `*i` is defined are called dereferenceable. The library never assumes that past-the-end values are dereferenceable. – Bill Lynch Jul 01 '14 at 02:30
  • The C++ standard also does not appear to say that dereferencing iterators that have gone beyond the end of their container causes UB. – M.M Jul 01 '14 at 02:30
  • @MattMcNabb `5.7 p5` says pointer arithmetic may only point into the same array or one past its end; otherwise it is undefined, and array subscripting is equivalent to pointer arithmetic. – Shafik Yaghmour Jul 01 '14 at 02:31
  • @sharth "The library never assumes that past-the-end values are dereferenceable" does not mean "The values are not dereferenceable"; in fact it seems specifically worded to avoid saying "The values are not dereferenceable". – M.M Jul 01 '14 at 02:31
  • @ShafikYaghmour yes, this is the one-past case. Then if you increment the pointer again, you could consider that it was also pointing to the first element of the next array, so it is still OK. – M.M Jul 01 '14 at 02:32
  • But there really isn't any guarantee that it's pointing to the first element of any other array at all. It might be pointing to some random garbage value for all you know. At least that's my understanding...? – Aakash Jain Jul 01 '14 at 02:34
  • @aakashjain, From C++ Primer 5th, pg 121: We can use the **subscript operator** on any pointer, as long as that pointer points to an element (or **one past the last element**) in an array. –  Jul 01 '14 at 02:59
  • @aakashjain, I know that the Primer is not as authoritative as the standard, but considering that book's fairly high reputation, it seems to say that it is OK to dereference "one past the last element" using the subscript operator, right? –  Jul 01 '14 at 03:01
  • @aakashjain sometimes you can conclude that it must be pointing to a valid object (e.g. if we are talking about two adjacent sub-arrays in a multi-dimensional array; or a standard-layout struct where we have checked via `offsetof` that there is no padding). – M.M Jul 01 '14 at 03:20
  • @user3792254 That's not what [Google books](http://books.google.com/books?id=8fXCn3E864sC&pg=PA124&lpg=PA124&dq=We+can+use+the+subscript+operator+on+any+pointer,+as+long+as+that+pointer+points+to+an+element+(or+one+past+the+last+element)+in+an+array.&source=bl&ots=BF1yzRscyS&sig=Vfwn4CUzsNCCPbwwbrmm2VIqqcw&hl=en&sa=X&ei=liyyU7utCqed8gHFwoDoBw&ved=0CB8Q6AEwAA#v=onepage&q=We%20can%20use%20the%20subscript%20operator%20on%20any%20pointer%2C%20as%20long%20as%20that%20pointer%20points%20to%20an%20element%20(or%20one%20past%20the%20last%20element)%20in%20an%20array.&f=false) says. – T.C. Jul 01 '14 at 03:38
  • @T.C. 5th > 4th edition. –  Jul 01 '14 at 03:40
  • @user3792254: That statement in your book is wrong. Given `int ia[] = {0,2,4,6,8};`, the expression `ia[5]` will produce undefined behavior. – Bill Lynch Jul 01 '14 at 03:46
  • Actually the author explicitly says "an off the end pointer does not point to an element. As a result we may not dereference or increment an off the end pointer" – psrag anvesh Mar 18 '16 at 15:59

4 Answers


Yes, a pointer beyond the end of an array could point to another object. Dereferencing a pointer beyond the end of an array results in undefined behavior.

joshuanapoli

My opinion: yes, it is possible in C++. There have been several SO threads on this topic, none of which reached any solid conclusion. Here is one example.

In some cases we can be sure that there is actually a valid object in memory immediately after the end of the old object. One case is standard-layout structs; another is multi-dimensional arrays. I originally wrote this post with a multi-dimensional array, but I have edited it to use the standard layout struct case, to avoid any objections about what the term "array object" means in the Standard.

```cpp
struct
{
    int a[2];
    int b[2];
} foo;

int main()
{
    // Proceed only when the struct has no padding at all.
    if ( sizeof foo == 4 * sizeof(int) )
    {
        int *p = &foo.a[0];

        ++p;    // (1)
        ++p;    // (2)
        *p = 3; // (3)
        ++p;    // (4)
        *p = 5; // (5)
    }
}
```

Which line causes undefined behaviour (if any)? `p` is (initially, anyway) a pointer into the array of type `int[2]` which is designated by `foo.a`.

After line (2), `p` is now a one-past-the-end pointer. Is this dereferenceable?

The case of incrementing the pointer is covered by the section on the `+` operator (it is defined to have the same effect on `p` as `p = p + 1`). Here is a quote from C++11 [expr.add]#7:

> Unless both pointers point to elements of the same array object, or one past the last element of the array object, the behavior is undefined.
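A minimal sketch of what that rule permits, assuming C++11 or later (the helper name `element_count` is hypothetical): the one-past-the-end pointer `arr + N` may be formed, compared, and subtracted from a pointer into the same array, but it is never read through.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Counts elements by subtracting the begin pointer from the one-past-the-end
// pointer. Both operands refer to the same array (or one past it), so this
// is well-defined; forming arr + N + 1 would not be.
template <typename T, std::size_t N>
std::ptrdiff_t element_count(T (&arr)[N])
{
    T* first = arr;     // points to arr[0]
    T* last = arr + N;  // one past the last element: OK to form, not to read
    assert(last == std::end(arr));
    return last - first;
}
```

For the `ia` array from the comments on the question, `element_count(ia)` returns 5 without ever touching `ia[5]`.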

Line (2) does not cause UB by this clause. What about line (3)?

As far as I can see, there is no clause in the C++ standard that says dereferencing a one-past-the-end pointer causes undefined behaviour. In several places it says that iterators "might not be dereferenceable", or "the library does not assume that the iterator is dereferenceable". But it carefully avoids saying "the iterator is not dereferenceable".

Because we checked that there is no padding, and the rules for standard-layout structs say that members cannot be reordered, we can conclude that `p` must now hold the address of the element `foo.b[0]`. Therefore, `p` is a pointer into the subobject `foo.b`, as well as being a one-past-the-end pointer for `foo.a`.
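The premises of this argument can also be checked partly at compile time. The sketch below is illustrative only: `Foo`, `b_follows_a`, `end_of_a_is_start_of_b` and `check_layout` are hypothetical names, it assumes a platform where converting a pointer to `std::uintptr_t` yields its plain address, and it only compares addresses without dereferencing the one-past-the-end pointer, so it does not settle whether line (3) is UB.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical named version of the answer's anonymous struct.
struct Foo
{
    int a[2];
    int b[2];
};

// offsetof is only guaranteed to be meaningful for standard-layout types.
static_assert(std::is_standard_layout<Foo>::value,
              "Foo must be standard-layout");

// True when no padding sits between a and b, i.e. b begins exactly where a
// ends. A compile-time counterpart of the answer's sizeof check.
constexpr bool b_follows_a = offsetof(Foo, b) == sizeof(Foo::a);

// Compares the address held by the one-past-the-end pointer of foo.a with
// &foo.b[0]. Nothing is dereferenced. The pointer-to-integer conversion is
// implementation-defined; on common flat-memory platforms it is the address.
bool end_of_a_is_start_of_b(const Foo& foo)
{
    const int* end_a = foo.a + 2;   // one past foo.a[1]: valid to form
    const int* begin_b = &foo.b[0];
    return reinterpret_cast<std::uintptr_t>(end_a) ==
           reinterpret_cast<std::uintptr_t>(begin_b);
}

// Under the no-padding condition, the two addresses must coincide.
void check_layout(const Foo& foo)
{
    if (b_follows_a)
        assert(end_of_a_is_start_of_b(foo));
}
```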


Note that in C99 it is different. The text in C99 for the + operator has (emphasis mine):

> If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. **If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.**

So, in C99 line (3) causes undefined behaviour. However C++ deliberately omits the bolded line.


Rationale: I don't know what the actual rationale is. However, my "mental model" for C's pointers is that the rules permit the compiler to implement "fat pointers", i.e. bounds-checked pointers. A pointer may carry the bounds of the (sub-)object it points into, so the executable can detect array bounds errors at runtime based on the pointer value alone.

I believe the C99 text is compatible with this, and that a compiler could produce an executable that aborts on line (3).
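To make the fat-pointer model concrete, here is a rough sketch of a bounds-carrying pointer type. `FatPtr` is a hypothetical class written for illustration, not a feature of any compiler or library, and it throws `std::out_of_range` where a checking implementation might instead abort.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical bounds-carrying "fat pointer", for illustration only. It
// remembers the array it was derived from and checks every dereference.
template <typename T>
class FatPtr
{
public:
    // first must point to an array of n elements.
    FatPtr(T* first, std::size_t n) : lo_(first), hi_(first + n), cur_(first) {}

    FatPtr& operator++()
    {
        // Advancing to one past the end is allowed; going further is not.
        if (cur_ == hi_)
            throw std::out_of_range("increment beyond one-past-the-end");
        ++cur_;
        return *this;
    }

    T& operator*() const
    {
        // A one-past-the-end value is rejected even if another object
        // happens to live at that address.
        if (cur_ < lo_ || cur_ >= hi_)
            throw std::out_of_range("dereference outside array bounds");
        return *cur_;
    }

private:
    T* lo_;   // first element of the array
    T* hi_;   // one past the last element
    T* cur_;  // current position, always within [lo_, hi_]
};
```

With such a pointer built from `foo.a`, lines (1) and (2) succeed, but the dereference on line (3) throws, even though `foo.b[0]` sits at that address.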

However, as already stated, C++ has no equivalent text, and I can find no justification in the C++ Standard for considering (3) to cause UB, nor (4) or (5).

M.M
  • Whether GCC considers this UB is actually fairly easy to test. Given `bool f(int t) { struct { int a[2]; int b[2];} foo = {{1,2}, {3,4}}; for(int i = 0; i <= 2; i++) { if(foo.a[i] == t) return true; } return false;}`, GCC at `-O3` [compiles it](http://coliru.stacked-crooked.com/a/5a69db1cc1a12a87) to `movl $1, %eax; ret`, i.e., a plain `return true;`. – T.C. Jul 01 '14 at 04:21
  • @T.C. you'll need to also show all the calls to `f` for that to make sense – M.M Jul 01 '14 at 04:23
  • The link is to a disassembly of `f` compiled as a free function. – T.C. Jul 01 '14 at 04:25
  • @T.C. it is undefined behaviour in C; are you talking about gcc or g++? – M.M Jul 01 '14 at 04:30
  • g++. Here's a [full example](http://coliru.stacked-crooked.com/a/988a35764165304c). – T.C. Jul 01 '14 at 04:33
  • Note that your example differs from mine; in mine I form a pointer to the first int and increment it twice; in yours you form a pointer to the first int and then add 2 to it. Also you omit the `sizeof` check. – M.M Jul 01 '14 at 04:33
  • @T.C. [version with incrementing one by one](http://coliru.stacked-crooked.com/a/75b4fcce1ac44b9c). Although (if my answer is correct) your original link would be a g++ bug. Of course, g++ does not determine whether the C++ standard is correct, it is the other way around :) – M.M Jul 01 '14 at 04:37
  • Even if you do `for(int i = 0; i <= 7; i++,p++)` (which I think everyone agrees **is** UB) g++ doesn't optimize it away. In other words, its UB-detection machinery is apparently not strong enough to detect it when you hide it through a pointer. (If the direction of the (quite old) proposed resolution of [CWG 232](http://www.open-std.org/jtc1/sc22/wg21/docs/cwg_active.html#232) reflects the opinion of the committee, they are fine with dereferencing null or one-past-the-end to produce an lvalue, but not with assigning to it or converting the result to an rvalue.) – T.C. Jul 01 '14 at 04:44
  • @T.C. The "proposed resolution" in that DR didn't make it into C++11 (well, not N3337 at least - the term "empty lvalue" does not appear). However the section starting "Note (March, 2005)" seems to say that even the WG is not sure whether this should be UB or not :) – M.M Jul 01 '14 at 04:52
  • Yes, it's an active DR, not closed. The April 2005 note seems to indicate that they think the general direction in the proposed resolution is fine, but the wording needed tweaking. But I guess it's not a pressing issue for them if they haven't produced a tweaked wording in 9 years :) – T.C. Jul 01 '14 at 04:55
  • @T.C. It is still a relevant case in actual programming; e.g. [accessing a 2-D array of int via a pointer to int](http://coliru.stacked-crooked.com/a/e58df6ac188592f0) . This question comes up on SO occasionally. In C, my interpretation of the standard, in accordance with the "fat pointer" model, is that both versions are UB; however it'd be well-defined to go `int *p = (int *)&foo;` and use that to iterate over the array. It would be nice if C++ had the same rules, but as we are finding, the standard may have had that intent, but if so, the text is defective. – M.M Jul 01 '14 at 05:01

Reading beyond the bounds of an array might result in a dirty read:

  1. You might hit the body of another array.
  2. You might hit an unallocated region.
  3. With an `int` pointer, you might point to a 4-byte region occupied by an array of two `short`s.
  4. Your pointer might try to access a region that does not belong to your process. Fatal error!

Going beyond the bounds is not recommended.
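One way to get the bounds checking that built-in arrays lack is `std::array::at`, which checks its index and throws `std::out_of_range` instead of reading past the end. A minimal sketch; the helper `try_read` is hypothetical:

```cpp
#include <array>
#include <cstddef>
#include <stdexcept>

// Hypothetical helper: reads arr[i] through std::array::at, which checks
// the index and throws std::out_of_range instead of reading past the end.
// Returns false (leaving out untouched) when i is out of bounds.
bool try_read(const std::array<int, 5>& arr, std::size_t i, int& out)
{
    try
    {
        out = arr.at(i);
        return true;
    }
    catch (const std::out_of_range&)
    {
        return false;
    }
}
```

The built-in `operator[]` on `std::array` remains unchecked, so the check only applies where `at` is used.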


Kajal Sinha

> Is it possible that:
>
> One array's off-the-end pointer points to another array's first element?

I'm not sure what you mean by off-the-end pointer. As C++ iterators use half-open ranges, I'm assuming you mean the pointer that represents the end position in an iteration. As that is one past the end, yes, it might overlap the next array, and hence it may not be dereferenced.

When using pointers as iterators, addresses, not values, are compared. The end pointer is the address just past the last element.
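A short sketch of that idiom, with a hypothetical `sum_range` helper: the loop stops when `p` compares equal to `last`, so the end pointer is only compared and never dereferenced, regardless of what lives at that address.

```cpp
// Sums the half-open range [first, last). The loop compares addresses
// only; last acts as a sentinel and is never dereferenced.
int sum_range(const int* first, const int* last)
{
    int total = 0;
    for (const int* p = first; p != last; ++p)
        total += *p;
    return total;
}
```

Calling `sum_range(a, a + 3)` on a three-element array `a` visits exactly `a[0]`, `a[1]` and `a[2]`.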

Werner Erasmus