5

I recently came across some code doing questionable 2D array indexing. Consider the following sample:

int a[5][5];
a[0][20] = 3;
a[-2][15] = 4;
a[5][-3] = 5;

Are the indexing operations above subject to undefined behavior?

dragosht
  • 3
    There's a good duplicate of this but I can't find it; the SO search function is much worse than people's memories – M.M Aug 05 '14 at 13:44
  • 1
    Possible duplicate [here](https://stackoverflow.com/questions/6015080/c-c-is-this-undefined-behavior-2d-arrays), not sure if we should close this one, though, as the other one is not asked in a good way, additionally, accepted answer here is better... – Aconcagua Feb 07 '20 at 11:50

2 Answers

6

It's undefined behavior, and here's why.

Multidimensional array access can be broken down into a series of single-dimensional array accesses. In other words, the expression a[i][j] can be thought of as (a[i])[j]. Quoting C11 §6.5.2.1/2:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))).

This means the above is identical to *(*(a + i) + j). Following C11 §6.5.6/8 regarding addition of an integer and pointer (emphasis mine):

If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

In other words, if i is not a valid index for a, the behavior is immediately undefined, even if "intuitively" a[i][j] seems to land inside the array.

So, in the first case, a[0] is valid, but the following [20] is not, because the type of a[0] is int[5]. Therefore, index 20 is out of bounds.

In the second case, a[-2] is already out-of-bounds, thus already UB.

In the last case, however, the pointer a + 5 (formed while evaluating a[5]) points to one past the last element of the array, which is valid as per §6.5.6/8:

... if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object ...

However, later in that same paragraph:

If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

So, while a + 5 is a valid pointer, dereferencing it causes undefined behavior. Since a[5] is *(a + 5), the behavior is already undefined before the final [-3] index is applied (and that index is itself out-of-bounds, therefore UB).

Lundin
Drew McGowen
  • "[…] because the type of `a[0]` is `int[5]` […]"—that's the part where I'm stuck. `a[0]` is subject to lvalue conversion here, so it decays to `int *`. Not sure about this… – mafso Aug 05 '14 at 13:35
  • Even though it decays to `int *`, it's still a pointer to an array (which I'm inclined to believe is considered having only 5 elements). – Drew McGowen Aug 05 '14 at 13:36
  • @mafso `a[0]` has type `int[5]` ; the decayed pointer is an rvalue (and it points into an object which is an array of 5 ints) – M.M Aug 05 '14 at 13:42
  • And `a[5]` isn't dereferenced, `a[5]-3` is (which may be valid)… The standard explicitly says that arrays are stored contiguously (with no padding and increasing addresses) _and_ explicitly states that out-of-bounds access is UB. I don't quite understand how they can go together… – mafso Aug 05 '14 at 13:42
  • 2
    @mafso actually, `a[5]-3` means that `a + 5` is dereferenced. As I mentioned, `a[5][-3]` is equivalent to `*(*(a + 5) - 3)`; the expression `*(a + 5)` is UB. – Drew McGowen Aug 05 '14 at 13:45
  • @MattMcNabb: I see your point. But this would mean, that `memcpy`ing an `int[5][5]` into an `unsigned char[25*sizeof(int)]` would also be UB. Or say, we'd have a `wordcpy` function, taking `int` pointers (but be otherwise like `memcpy`), would `wordcpy(tgt, a[0], 25)` be UB? – mafso Aug 05 '14 at 13:47
  • @mafso `memcpy(&a[0][0], x, 25*sizeof**a);` is UB, but doing `&a[0]` or `&a` is fine because in that case the pointer points to an array of that size, not to an array of smaller size – M.M Aug 05 '14 at 13:49
  • 1
    bear in mind that a pointer is allowed to also store the bounds of what it is pointing to, so that a bounds-checking implementation is legal. The bounds are determined by what object the object being pointed to is a member of. – M.M Aug 05 '14 at 13:50
-1

Array indexing with negative indexes is undefined behaviour. Unfortunately, a[-3] is the same as *(&a - 3) on most architectures/compilers and is accepted without warning; the C language allows you to add negative integers to pointers, but not to use negative values as array indexes. Of course, this is not even checked at runtime.

Also, there are some issues to be aware of when defining arrays as opposed to pointers. You can leave only the first subscript unspecified, and no more, as in:

int a[][3][2]; /* array of unspecified size, definition is alias of int (*a)[3][2]; */

(indeed, the above is a pointer definition, not an array, just print sizeof a)

or

int a[4][3][2]; /* array of 24 integers, size is 24*sizeof(int) */

When you do this, the way the offset is evaluated is different for arrays than for pointers, so be careful. In the case of an array, int a[I][J][K];

&a[i][j][k] 

is placed at

&a + i*(sizeof(int)*J*K) + j*(sizeof(int)*K) + k*(sizeof(int))

but when you declare

int ***a; 

then a[i][j][k] is the same as:

*(*(*(&a+i)+j)+k), meaning you have to dereference pointer a, then add (sizeof(int **))*i to its value, then dereference again, then add (sizeof (int *))*j to that value, then dereference it, and add (sizeof(int))*k to that value to get the exact address of the data.

BR

Luis Colorado
  • `int a[][3][2];` is illegal. You must either specify the first dimension, or give an initializer from which the first dimension is calculated. It's not a "pointer alias". You may be getting mixed up with the meaning of an array declarator [in a function parameter list](http://stackoverflow.com/q/22677415/1505939), but in that case `int a[4][3][2]` is also `int (*a)[3][2]`. – M.M Feb 24 '15 at 20:17
  • in `&a + i * (sizeof`... you meant `(char *)&a` ; pointer arithmetic is done in terms of the size of the object being pointed to – M.M Feb 24 '15 at 20:18
  • `a[i][j][k]` is the same as `*(*(*(a+i)+j)+k)` (note the lack of `&`) – M.M Feb 24 '15 at 20:18