6

In an algorithm I'm writing, I can end up with the following (simplified, of course):

int a[3] = {1,2,3};
int b = a[3];

When the index used to fill b goes out of bounds like this, I never use the value of b. Is the code still incorrect? Do I have to add an explicit bounds check?

Lightness Races in Orbit
static_rtti
  • 1
    The answer is already given, but I can highly recommend this series of blog posts concerning undefined behavior in C: ([part 1](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html), [part 2](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_14.html), [part 3](http://blog.llvm.org/2011/05/what-every-c-programmer-should-know_21.html)). It also addresses situations where you can get bitten badly by relying on this kind of undefined behavior (with a motivating example from the linux kernel (in part 2)). – user786653 Jul 04 '11 at 15:40

6 Answers

5

This code has undefined behavior whether or not you use b. Why? Because a[3] is equivalent to *(a+3) by definition. And here's a quote from the standard showing that *(a+3) is in itself undefined, regardless of whether the value is stored, used, or left alone.

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

Armen Tsirunyan
  • @Kerrek: Woops, read that as "is the code still **correct**?". – Xeo Jul 04 '11 at 14:46
  • "or one past the last element of the array object, the evaluation shall not produce an overflow" --> does this mean the code is correct if the index is equal to the size of the array? – static_rtti Jul 04 '11 at 14:55
  • 2
    @static_rtti: Yes, but only if you do it with pointer arithmetic (`(a + 3)`) and you shall not dereference the pointer you get (which automatically happens with `a[3]`). – Xeo Jul 04 '11 at 21:10
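The distinction drawn in the comment above, that the one-past-the-end pointer may be formed and compared but not dereferenced, can be sketched as follows. This is only an illustration: `sum_ints` is a hypothetical helper, not something from the question.

```c
#include <stddef.h>

/* Hypothetical helper: sums len ints, using the one-past-the-end
   pointer purely as a loop bound. */
static int sum_ints(const int *first, size_t len)
{
    /* Forming and comparing first + len is allowed, even though it
       points one past the last element. It is never dereferenced. */
    const int *end = first + len;
    int total = 0;

    for (const int *p = first; p != end; ++p) {
        total += *p;  /* p is strictly before end here, so *p is an element */
    }
    return total;
}
```

With `int a[3] = {1,2,3};`, calling `sum_ints(a, 3)` computes `a + 3` but only ever reads `a[0]` through `a[2]`.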
3

Still incorrect, still undefined behaviour. Do the bounds check.

int b = *(a + 3); // dereferencing beyond the array bound.
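A minimal sketch of the bounds check recommended above, assuming the caller knows the array length. The helper name `get_checked` and its signature are hypothetical, chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: copies a[index] into *out only when index is in
   range. Returns false and leaves *out untouched otherwise. */
static bool get_checked(const int *a, size_t len, size_t index, int *out)
{
    assert(a != NULL && out != NULL);  /* guards against misuse, not range errors */

    if (index >= len) {
        return false;  /* a[index] is never evaluated on this path */
    }
    *out = a[index];
    return true;
}
```

With `int a[3] = {1,2,3};` and `int b = 0;`, the call `get_checked(a, 3, 3, &b)` returns false and leaves `b` at 0, so the out-of-bounds read never happens.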
Xeo
3

Reading a[3] already causes undefined behaviour. Because undefined behaviour is never locally limited, this alone could in principle lead to your hard disk drive being formatted or your computer turning into a giant, flesh-eating zombie.

In practice, it will usually appear to work. But it's easy to construct a case where the end of the array coincides with the end of a mapped memory region, so that accessing one element beyond it causes a segmentation fault. That is unlikely for an array of int on the stack, and also with most heap implementations, but you shouldn't rely on it.

(Whether merely forming the address &a[3] is undefined behaviour as well is heavily disputed.)

Alexander Gessler
  • Why is the validity of `&a[3]` disputed? It's the same as `a+3` which is explicitly allowed. Or are you talking about the redundant `&*` at the beginning being a "dereference"? – R.. GitHub STOP HELPING ICE Jul 04 '11 at 14:57
  • @R: If you follow the link, you can read the dispute for yourself. – Lightness Races in Orbit Jul 04 '11 at 15:01
  • R..: I think that's one part, the other is IIRC that in C++, pointers to memory areas not allocated by means of the language are invalid. – Alexander Gessler Jul 04 '11 at 15:01
  • 1
    If we're talking about C at least, it should not be heavily disputed at all. &a[3] is perfectly legal because using & on a [] operand does not dereference anything: "Similarly, if the operand is the result of a [] operator, neither the & operator nor the unary * that is implied by the [] is evaluated and the result is as if the & operator were removed and the [] operator were changed to a + operator" – nos Jul 04 '11 at 21:20
2

Yes, it's still incorrect, because you access an out-of-bounds memory location to get the value a[3] and store it in the variable b.

The fact that you never use b could mean that the compiler optimizes out that line of code, so you might not ever see any adverse effects from that line being there.

However, the compiler is not required to do so, and the code itself still has undefined behavior.

Sander De Dycker
1

Yes.

You are using the value, by copying it into b.

More specifically, dereferencing (a+3) is not allowed, since (a+3) points one past the end of the array rather than at an element... and the expression a[3] is equivalent to *(a+3) (where a has decayed to a pointer-expression).
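The equivalence between a[i] and *(a+i) can be checked for in-range indices with a small sketch like the one below. `subscript_matches_pointer_form` is a hypothetical name used only for illustration, and the loop deliberately stops before the array length so that nothing out of bounds is evaluated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: for every in-range index i, a[i] and *(a + i)
   designate the same object and therefore hold the same value. */
static bool subscript_matches_pointer_form(const int *a, size_t len)
{
    for (size_t i = 0; i < len; ++i) {
        if (&a[i] != a + i || a[i] != *(a + i)) {
            return false;
        }
    }
    return true;
}
```

Because the two spellings mean the same thing, writing `a[3]` for a three-element array dereferences `a + 3` just as `*(a + 3)` would.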

Lightness Races in Orbit
0

Yes, it is wrong to read a[3], which doesn't exist.

Using b would be wrong too, but it is already too late.

Bo Persson
  • 2
    `b` is just an int, what's the harm in using it? It has indeterminate value, sure, but it's not memory-incorrect to use it... – Kerrek SB Jul 04 '11 at 14:45
  • @Kerrek SB `int`s are allowed to have trapping values (and do on some implementations). – James Kanze Jul 04 '11 at 14:46
  • 2
    @Kerrek - You cannot read an indeterminate value, it might trap. It doesn't matter here either, as it is already UB before getting to read `b`. – Bo Persson Jul 04 '11 at 14:46
  • @James: What's a trapping value, and how does that affect the program flow? – Kerrek SB Jul 04 '11 at 14:47
  • The actual violation happens before indeterminate values come into play. It's an array bounds violation, and even if you somehow knew there was valid memory after the array (e.g. if it came from `malloc` or a struct), it's then an aliasing violation too if the data at that location is not part of the array of type `int[3]`. – R.. GitHub STOP HELPING ICE Jul 04 '11 at 15:01
  • @Kerrek SB A trapping value is a bit pattern which causes some sort of hardware reaction *when read*, typically a trap, which results in the OS taking over and doing something, typically aborting the program with prejudice. Thus, a one's complement machine might arrange for all 0 results to be a positive 0, and trap if you read a negative 0. – James Kanze Jul 04 '11 at 15:21
  • @James: Thanks! But would the trap have to happen when reading `a[3]` (suppose it points to a trapping value), or only when later reading `b`? – Kerrek SB Jul 04 '11 at 15:23
  • @Kerrek The access is undefined behavior; there's no guarantee of a trap on any machine. But it could occur on just a read, before you do anything with the value. – James Kanze Jul 04 '11 at 17:29