7

Pardon the confusing question title, but I was unsure how to phrase it more clearly.

In C, accessing an array out of bounds is classified as undefined behavior. However, array elements are guaranteed to be laid out contiguously in memory, and the array subscript operator is syntactic sugar for pointer arithmetic (e.g., x[3] == *(x + 3)). Therefore, I would personally expect the behavior of the code below to be well-defined:

int array[10][10];
int i = array[0][15]; // i == array[1][5]?

If my interpretation of the standard is correct, this would be undefined behavior. Am I wrong?

hiy
  • 1
    Yes, you are correct. That is _not_ UB. And, your equivalence is also correct. It _would_ be UB, if you did (e.g.) `array[9][15]`. So, I think your understanding is pretty good. In fact, particularly for accessing image arrays, it's common to use a 1D access: `int *ptr = &array[0][0]; int *eptr = &array[9][10]; for (; ptr < eptr; ++ptr) i = *ptr;` – Craig Estey Jun 07 '20 at 18:17
  • 5
    Consider a machine with a restrictive addressing scheme, such as an early computer with some form of segment-offset addressing. A C implementation for such a machine might support an array of type `x[256][256]` by using segment base addresses for the `x[i]` arrays and offsets from those bases to access the `x[i][j]` elements. But attempting to access `x[i][j]` with `x[i-1][j+256]` might fail because the offset calculation for `j+256` would wrap the offset field, resulting in an address different from that of `x[i][j]`. – Eric Postpischil Jun 07 '20 at 18:33
  • Does this answer your question? [Multidimensional array out of bound access](https://stackoverflow.com/questions/48219108/multidimensional-array-out-of-bound-access) –  Jun 07 '20 at 18:49
  • 1
    @StaceyGirl to be honest I don't understand that question very well (and the answers here seem a lot better) – hiy Jun 08 '20 at 17:28
  • @EricPostpischil wow, that's literally the exact thing I was working on when this question popped up! Thanks for your thoughtful reply – hiy Jun 08 '20 at 18:26
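
The segment-offset scenario from Eric Postpischil's comment can be illustrated with a toy model. This is only a sketch under stated assumptions: the `seg_addr` struct, the 8-bit offset field, one-byte elements, and the helper names are all hypothetical and do not describe any real machine or compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only: an address is a segment number plus an 8-bit offset.
   The struct and both functions are hypothetical, not a real ABI. */
struct seg_addr {
    uint16_t segment; /* selects one row x[i] */
    uint8_t  offset;  /* selects element j within that row; wraps at 256 */
};

/* Address of x[i][j] when every row x[i] lives in its own segment
   (assumes one-byte elements so that j maps directly to the offset). */
static struct seg_addr element_addr(unsigned i, unsigned j)
{
    struct seg_addr a;
    a.segment = (uint16_t)i;
    a.offset  = (uint8_t)j; /* an out-of-range j such as j + 256 wraps here */
    return a;
}

/* Two toy addresses match only if both fields match. */
static int same_addr(struct seg_addr p, struct seg_addr q)
{
    return p.segment == q.segment && p.offset == q.offset;
}
```

In this model, `element_addr(2, 7 + 256)` lands on `x[2][7]` rather than on `x[3][7]`, which is the kind of failure the comment describes.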

3 Answers

9

According to the standard, this is clearly undefined behaviour: such a case is explicitly listed in section J.2, "Undefined behavior" (quoted here from an online C99 standard draft):

An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

It can still be the case that your example will work, and I have actually seen a lot of such cases in C code; however, to be accurate, it is UB.
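
If the goal is to reach the element at linear position 15 without an out-of-range subscript, one option is to split the linear index into a row and a column. The following is a minimal sketch, not part of the original answer; `flat_get`, `ROWS`, and `COLS` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { ROWS = 10, COLS = 10 }; /* dimensions from the question's int array[10][10] */

/* Hypothetical helper: read the element at linear position k
   (0 .. ROWS*COLS-1) while keeping each subscript inside its own
   dimension, so no inner array is indexed past its end. */
static int flat_get(int (*a)[COLS], size_t k)
{
    assert(k < (size_t)ROWS * COLS); /* reject positions outside the whole array */
    return a[k / COLS][k % COLS];    /* k = 15 becomes a[1][5] */
}
```

With this helper, `flat_get(array, 15)` reads `array[1][5]`, the element the question hoped `array[0][15]` would reach.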

Stephan Lechner
  • @StaceyGirl: Still the same; taking the address of an object will not change the type of the object and the restrictions connected to it. – Stephan Lechner Jun 07 '20 at 18:48
  • What about `memset`'ing the `unsigned char` array? It does exactly this (let's assume we implement `memset` by hand). Then `(&a[0][0])[15]` should work - exactly what `memset` might do. –  Jun 07 '20 at 18:54
  • 1
    @StaceyGirl: I'm rather sure that it will work for 99.99% or even 100% of current compilers / architectures. But not everything, even if it is commonly agreed to work, is actually backed by the standard. – Stephan Lechner Jun 07 '20 at 19:06
  • @StaceyGirl The source code to `memset()` does not have to obey the rules of C - it does not even have to be in C. The source code can cheat per the implementation. – chux - Reinstate Monica Jun 07 '20 at 19:55
  • @StaceyGirl `char`s are special. The standard allows inspecting any object as an array of character type (http://port70.net/~nsz/c/c99/n1256.html#6.2.6.1p4, http://port70.net/~nsz/c/c99/n1256.html#6.5p6). That, AFAIK, allows functions like `memset` or macros like Linux's `container_of`. Unfortunately, treating an `int[n][m]` array as an array of `int[n*m]` and the related internally out-of-bounds pointer arithmetic (on non-char pointers) don't seem to be allowed (nor explicitly prohibited either, given that appendices aren't normative?). – Petr Skocik Jun 08 '20 at 08:47
  • @PSkocik But `container_of` only does pointer arithmetic - it doesn't inspect anything. If you can "magically" create pointers out of nowhere as `container_of` does, the same trick should work with `(&a[0][0])[15]` - technically this is the same thing. This seems more like another flaw in the standard. –  Jun 08 '20 at 08:59
  • @StaceyGirl I myself would rather be able to do `(&int_a_10_10[0][0])[15]` or `int_a_10_10[0][15]` but it seems the only guaranteed way is `*(int*)((char*)int_a_10_10+sizeof(int)*15)`, i.e., treat the object as a `char [sizeof(int_a_10_10)]` in order to do the pointer arithmetic (like `container_of` would), then cast the result to `int*` and use it for access, which is OK, given that there truly is an `int` at that spot. – Petr Skocik Jun 08 '20 at 09:18
  • 1
    @PSkocik I think `int*` and `char*` arithmetic should be equivalent here because pointer representation is somewhat well defined - you should be able to do a similar trick with `uintptr_t`. Although I wouldn't be surprised if there are some rules that say that `uintptr_t` cannot be modified... I feel @supercat's answer is spot on here. –  Jun 08 '20 at 09:22
  • 2
    @StaceyGirl Multiple parts of the standard care about how you derive your pointers, not just that the representations are the same and the math is valid. It sometimes buys better alias analysis. I'm not sure what it buys with multidimensional array accesses unless you're on some exotic/segmented architecture like in Eric Postpischil's example. Overall, I think the standard could do a much better job of explaining its motives and rationales. Doing your pointer arithmetic in `uintptr_t` effectively sidesteps these issues, but then you're in the implementation-defined realm. – Petr Skocik Jun 08 '20 at 09:59
  • @PSkocik: In many cases, it is useful for compilers to know that two lvalues that double-subscript an array won't access the same element unless both subscripts match. This allows compilers to perform useful optimizations like replacing up-counting loops with down-counting loops, using vector instructions, merging or separating loops, etc. The problem is that the Standard provides no means of converting the address of a two-dimensional array into a pointer that provides linear access to all the elements. – supercat Feb 02 '21 at 18:40
5

The Standard makes very clear that given unsigned char arr[10][10];, an attempt to access arr[0][x] would yield UB if x exceeds 9.

I think it is equally clear, however, that the authors of the Standard intended to allow code to take the address of any object, including a multi-dimensional array, as a character pointer, and then index that pointer to access all the bytes of the object.

If the Standard were to say that arr[0] yields a pointer of type char* which can only be used to access the first ten elements, but (char*)arr would yield a pointer that can access the entire array, that would accommodate both objectives above, but I see nothing in the Standard that would suggest that arr[0] and (char*)arr are not equivalent to each other.

Most likely, the authors of the Standard expected that implementations would seek to behave sensibly in such corner cases whether or not the Standard described them fully. I'm not sure whether clang and gcc conform to such expectations with regard to this particular issue, but such expectations don't hold true in general.
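
The character-pointer route described above can be sketched as follows. This is an illustration under assumptions, not a ruling on what the Standard guarantees: it relies on the reading discussed in this thread that the bytes of the whole object may be addressed through an `unsigned char` pointer. The names `get_via_bytes`, `ROWS`, and `COLS` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { ROWS = 10, COLS = 10 }; /* dimensions from the question's int array[10][10] */

/* Hypothetical helper: copy out the k-th int of the whole 2D object by
   going through its bytes instead of through int-typed subscripts. */
static int get_via_bytes(int (*a)[COLS], size_t k)
{
    assert(k < (size_t)ROWS * COLS); /* stay inside the whole object */

    /* Assumption: a character pointer to the whole array may be used
       to address every byte of it. */
    const unsigned char *bytes = (const unsigned char *)a;

    int value;
    memcpy(&value, bytes + k * sizeof(int), sizeof value);
    return value;
}
```

Using `memcpy` on the bytes avoids dereferencing an `int *` that was derived by stepping past the end of an inner array.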

supercat
1

As other answers already pointed out where the standard says that this is UB, I will add something that is not discussed in other answers.

The expression arr[0][15] is equivalent to (arr[0])[15], because the subscript operator groups left to right.

Now, since going out of bounds of a 1D array is undefined behavior, the expression arr[0][15], which is the same as (arr[0])[15], is also UB: index 15 exceeds the largest valid index, 9, of the 10-element array arr[0] in your example.

Basically, since going out of bounds of a 1D array is undefined behavior, the expression (arr[i])[j] == arr[i][j] has undefined behavior whenever i exceeds xrange - 1 = 10 - 1 = 9 or j exceeds yrange - 1 = 10 - 1 = 9 (for your given example).

Note that xrange and yrange are shorthand for the numbers of rows and columns you specified for the array, both of which are 10 in your example.
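
The per-dimension rule can be written down as a small check. This is only a sketch: `XRANGE` and `YRANGE` mirror the xrange and yrange shorthand above, and `in_bounds` and `checked_get` are hypothetical helper names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { XRANGE = 10, YRANGE = 10 }; /* rows and columns from the example */

/* Hypothetical helper: true only when each subscript stays inside
   its own dimension, i.e. i <= XRANGE - 1 and j <= YRANGE - 1. */
static bool in_bounds(size_t i, size_t j)
{
    return i < (size_t)XRANGE && j < (size_t)YRANGE;
}

/* Hypothetical helper: stores arr[i][j] in *out and returns true when
   both subscripts are valid; otherwise returns false and leaves *out
   untouched, without evaluating the out-of-range expression. */
static bool checked_get(int (*arr)[YRANGE], size_t i, size_t j, int *out)
{
    if (!in_bounds(i, j))
        return false;
    *out = arr[i][j];
    return true;
}
```

Under this check, the pair (0, 15) is rejected even though the flattened position 15 lies inside the 100-element array.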

Jason
  • 1
    A complication is that if a struct is defined as e.g. `struct foo {int x, y; char z[16];};` and one has a pointer to `z`, the Standard clearly intends that it be possible to take `z`'s address as a character pointer, subtract `offsetof(struct foo, z)` from it, and use the result as a pointer to the structure (indeed, that's about the only thing the `offsetof` macro can be used for that couldn't be done better some other way). I am unaware of anything in the Standard, however, that would distinguish between a pointer to an element within `z` and a pointer to a byte within the enclosing struct. – supercat Aug 22 '22 at 16:53
  • 1
    If the Standard had specified that an array operand to `[]` will not decompose into a pointer but instead use a form of the operator that fetches the indicated element from the array, then it could specify that such a form may only be used with items that are within bounds of the innermost array, while allowing other forms of pointer arithmetic to operate on an enclosing object. – supercat Aug 22 '22 at 16:58