6

From "Take the address of a one-past-the-end array element via subscript: legal by the C++ Standard or not?":

It seems the Standard has language that specifically covers taking the address one past the end of an array.

Why would 2 or 2,000,000 past the end be an issue if the pointer is not dereferenced?

Looking at some simple loop:

int array[array_max];
...
for (int i = 0; i < array_max; ++i)
{
    int* x = &array[i * 2];      // Is this legal?
    int y = 0;
    if (i * 2 < array_max)       // We check here before dereferencing
    {
        y = *x;                  // Legal dereference
    }
    ...
}

Why, or at what point, does this become undefined? In practice it just sets a pointer to some value, so why would it be undefined if the pointer is never dereferenced?

More specifically: is there any example where something other than the expected behaviour could happen?

Glenn Teitelbaum
  • 1
    Having a pointer pointing there is UB. – chris Jan 01 '14 at 22:33
  • No it's not legal. Even though it should work. – user2345215 Jan 01 '14 at 22:34
  • 2
    Because it may not behave properly with respect to comparison or subtraction. For example `(p + 2000000000) - p` is probably not going to equal 2000000000. – Raymond Chen Jan 01 '14 at 22:34
  • @RaymondChen - in this example, there is nothing but assignment and then qualified dereferencing - the value is only used if validated, the question is why is the setting of the value an issue? – Glenn Teitelbaum Jan 01 '14 at 22:38
  • Is just having a pointer to 2 past the end of an array undefined without dereferencing it or using it in addition, for example? – TimDave Jan 01 '14 at 22:45
  • The most you can hope for is that `&arr[size+1]` is just equivalent to `arr + size + 1`, but even that is UB. Simply having a pointer outside of the array (not including one past the end) is UB. Technically, the UB comes from any arithmetic resulting in any such pointer, so even `arr - 2 + 2` is UB, but that just goes with the first part of this comment. – chris Jan 01 '14 at 22:53
  • Remember that working exactly as you'd expect falls within the realm of undefined behavior. Unfortunately so do many other things, and by definition it's not predictable which you'll get. – Mark Ransom Jan 01 '14 at 23:18
  • @MarkRansom If it always does what is expected, it cannot be called undefined; there must be some unpredictable result, which was what I was looking for – Glenn Teitelbaum Jan 01 '14 at 23:43
  • @GlennTeitelbaum, I never said "always", in fact I was saying just the opposite. – Mark Ransom Jan 02 '14 at 00:22

3 Answers

8

The key issue with taking addresses beyond the end of an array is segmented architectures: you may overflow the representable range of the pointer. The existing one-past-the-end rule already creates some pain, as it means that the last object can't sit right on the boundary of a segment. However, the ability to form this address was well established.
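To make the overflow concern concrete, here is a minimal sketch of a toy segmented pointer. It is only a model under stated assumptions: `SegPtr`, `advance`, `less_than` and `segment_wrap_demo` are hypothetical names invented for illustration, and real segmented hardware differs in detail. The assumption is that arithmetic and comparison touch only a 16-bit offset within one segment.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a segmented pointer: a segment number plus a
// 16-bit offset. Not a real API; for illustration only.
struct SegPtr {
    std::uint16_t segment;
    std::uint16_t offset;
};

// Advance by a byte count. Only the offset changes, and it wraps
// modulo 65536, which is the assumed behaviour of this toy model.
SegPtr advance(SegPtr p, std::uint32_t bytes) {
    return SegPtr{p.segment, static_cast<std::uint16_t>(p.offset + bytes)};
}

// Comparison within a single segment looks only at the offsets.
bool less_than(SegPtr a, SegPtr b) {
    return a.offset < b.offset;
}

// An 8-byte array placed 16 bytes before the end of its segment.
void segment_wrap_demo() {
    SegPtr begin{0x1000, 0xFFF0};
    SegPtr one_past = advance(begin, 8);   // offset 0xFFF8: still representable
    SegPtr far_past = advance(begin, 32);  // offset wraps around to 0x0010

    assert(less_than(begin, one_past));    // ordering holds one past the end
    assert(!less_than(begin, far_past));   // the "later" pointer compares lower
}
```

In this model the one-past-the-end address stays ordered after the start of the array, while an address further out wraps and compares as if it came before the array, which is the kind of surprise the rule lets implementations avoid handling.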

Dietmar Kühl
  • Why is it an issue even without dereferencing? – zneak Jan 01 '14 at 22:47
  • 1
    @LightnessRacesinOrbit: Those do not explain why an "end-of-object" pointer past a segment boundary is a problem. – MSalters Jan 02 '14 at 08:24
  • @zneak: With segmented pointers, pointer comparisons are done within a single segment. If an object ends on a segment boundary, the pointer after it does not point to the same segment and therefore cannot be compared. – MSalters Jan 02 '14 at 08:26
4

Since array[i *2] is equivalent to *((array) + (i*2)), we should look at the rules for pointer addition. C++11 §5.7 says:

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

So you have undefined behaviour even if you don't perform indirection on the pointer (not to mention that you do perform indirection, due to the expression equivalence I gave at the beginning).
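One way to stay inside the rule quoted above is to validate the integer index before any pointer is formed, rather than forming the pointer first and checking afterwards. The following is a sketch, not the only approach; `element_or_null` and `sum_at_doubled_indices` are hypothetical helpers introduced here for illustration.

```cpp
#include <cstddef>

// Hypothetical helper: returns a pointer to array[index] when the index is
// in range, and nullptr otherwise. The range check is done on the integer,
// so no out-of-range pointer is ever computed.
template <std::size_t N>
const int* element_or_null(const int (&array)[N], std::size_t index) {
    return index < N ? &array[index] : nullptr;
}

// The loop from the question, restructured so that the check precedes
// the pointer arithmetic.
template <std::size_t N>
int sum_at_doubled_indices(const int (&array)[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {
        if (const int* x = element_or_null(array, i * 2)) {
            total += *x;  // x is known to point at a real element
        }
    }
    return total;
}
```

For an array `{1, 2, 3, 4, 5}`, only the indices 0, 2 and 4 pass the check, so `sum_at_doubled_indices` adds up the elements 1, 3 and 5.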

Joseph Mansfield
  • I think that under the new C++11 rules, neither expression performs indirection. They're neither reads nor writes. – MSalters Jan 02 '14 at 08:31
  • @MSalters: Why's that? – Lightness Races in Orbit Jan 02 '14 at 16:23
  • @LightnessRacesinOrbit IIRC to legalize `&array[last]` – MSalters Jan 02 '14 at 16:34
  • @MSalters: Yes, that's what the quote in the answer you're commenting on says, is it not? The key is that this question is about `&array[last+1]`, not `&array[last]`. – Lightness Races in Orbit Jan 02 '14 at 16:36
  • @LightnessRacesinOrbit: Entirely aware of that, but the point is that your last sentence assumes that indirection happens, in both forms. My statement is that neither syntax performs indirection. They both just form an address, and only subsequent use of the designated object may trigger indirection. – MSalters Jan 02 '14 at 17:01
  • @MSalters: Okay and I'm asking for more details :P 5.3.1/1 still says unary `*` performs indirection. And the standard still doesn't say that the `&` in `&*` magically "cancels out" the semantics of the `*`, contrary to popular belief. – Lightness Races in Orbit Jan 02 '14 at 17:01
  • @LightnessRacesinOrbit: Again from memory, this was part of cleaning up the whole memory access description for C++11. Especially with multi-threading you need a much clearer definition of what constitutes a read and/or a write. Merely forming a pointer should not, which meant that the new wording of 5.7 had to be introduced to properly handle this `&[last]` corner-case. (It was always a bit unfortunate that `&a[i]==a+i` except for i==N.) – MSalters Jan 02 '14 at 17:10
  • @MSalters Regardless, I just cited the passage that indicates `*` is still indirection. – Lightness Races in Orbit Jan 02 '14 at 17:25
3

in practice it just sets a ptr to some value

In theory, just having a pointer that points somewhere invalid is not allowed.

Pointers are not integers: they are things that point to other things, or to nullity.

You can't just set them to whatever number you like.

in this example, there is nothing but assignment and then qualified dereferencing - the value is only used if validated, the question is why is the setting of the value an issue?

Yeah, you'd have to be pretty unlucky to run into practical consequences of doing that. "Undefined behaviour" does not mean "always crash". Why should the standard actually mandate semantics for such an operation? What do you think such semantics should be?

Lightness Races in Orbit
  • 1
    But you can cast them to an integer (or rather uintptr_t), do whatever you want to do with them and cast them back (if they are valid). – user2345215 Jan 01 '14 at 22:39
  • 1
    @user2345215: Yes, just like you can cast them to `ChickenSoup*` with a `reinterpret_cast`. That doesn't change what they are intended to be, which is what the rules are based on. – Lightness Races in Orbit Jan 01 '14 at 22:40