6

I have a question about the memory layout of a two-dimensional array. When we define one, just like int a[3][4], is the memory allocated to this array continuous?

Or in other words, is a two-dimensional array implemented as a continuous one-dimensional array?

If the answer is yes, is accessing a[0][6] equivalent to accessing a[1][2]?

I wrote the following C program.

#include <stdio.h>
int main(){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    printf("%d %d\n", a[0][6], a[1][2]);
    return 0;
}

I find that the output is 7 7.

a[0][6] seems illegal, but it points to a[1][2], I want to know why and is such operation legal?

TH讠NK
  • 79
  • 3
  • 10
    It turns out this is a grey area. Yes, all arrays in C are contiguous, so a two-dimensional array is definitely a contiguous array of one-dimensional arrays. However, while it seems this means that `a[0][6]` would be equivalent to `a[1][2]`, it has been determined that, per the standard, trying to access `a[0][6]` is indeed undefined. It might seem to work (as indeed it worked for you), but it's not guaranteed to work. – Steve Summit Jan 04 '23 at 02:06
  • Is the inverse undefined? Can you access a lengthy 1-D array as a 2D (or more) array? Can a 12-element array reliably be treated as a 2x6, or a 3x4 etc? My instinct says yes, but its not necessarily obvious. – abelenky Jan 04 '23 at 02:41
  • @abelenky not without explicitly casting, at a minimum, since otherwise there would be no way for the indexing operation to know what strides to use. – Karl Knechtel Jan 04 '23 at 03:26
  • @abelenky Without casting and pointers (so we are no longer using _array_ access), to get access via a 1D array and a 2D array (or a different 2D array) could occur if arrays were `union` members. – chux - Reinstate Monica Jan 04 '23 at 04:57

3 Answers3

8

This is as interesting case. Section 6.2.5p20 of the C standard defines an array as follows:

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. An array type is said to be derived from its element type, and if its element type is T , the array type is sometimes called ‘‘array of T ’’. The construction of an array type from an element type is called ‘‘array type derivation’’

So an array is a contiguous set of objects of a particular type. In the case of int a[3][4] it is an array of size 3 whose objects are also arrays. The type of the subarray is int [4], i.e. an array of size 4 of type int.

This means that a 2D array, or more accurately an array of arrays, does indeed have all of the individual members of the inner array laid out continuously.

This does not however mean that the above array can be accessed as a[0][6] as in your example.

Section 6.5.2.1p2 regarding the array subscript operator [] states:

The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2)))

And section 6.5.6p8 regarding the + operator when applied to a pointer operand states:

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

There's a lot to ingest here, but the important part is that given an array of size X, the valid array indices range from 0 to X-1, and attempting to use any other index triggers undefined behavior. In particular, since a[0] has type int [4], attempting to access a[0][6] is going outside the bounds of the array a[0].

So while a[0][6] in practice would probably work, the C standard makes no guarantee that it will. And given that modern optimizing compilers will aggressively assume undefined behavior doesn't exist in a program and optimize based on that fact, you could end up in a situation where something goes wrong and you don't know why.

To summarize: yes 2D arrays are implemented that way, and no you can't access them like that.

dbush
  • 205,898
  • 23
  • 218
  • 273
  • Very nice write up. I like the "given that modern optimizing compilers ..." and "... and no you can't access them like that". [Here be dragons](https://en.wikipedia.org/wiki/Here_be_dragons). – chux - Reinstate Monica Jan 04 '23 at 04:54
  • _This is as interesting case_ And whats more important — never considered before. – Language Lawyer Jan 04 '23 at 22:14
  • @LanguageLawyer Re: "never considered before": in C11, J.2 there is "(as in the lvalue expression a[1][7] given the declaration int a[4][5])". – pmor Feb 27 '23 at 13:40
  • @pmor this is called «sarcasm». There are dozens of similar questions here, and instead of closing as a dup, ppl continue to answer them ‍♂️ – Language Lawyer Feb 27 '23 at 13:52
4

@dbush has written a good, correct answer explaining what's guaranteed and allowed. The TL;DR is basically: yes it is contiguous but still no guarantees are made that allow you to reliably access any (sub) array out of bounds. Any pointer to an item needs to point within a valid array in order to use pointer arithmetic or the [] operator on it.

This answer adds some possible work-arounds to solve the problem.

One work-around is to use a union and "type pun" between a 2D array and a 1D array:

#include <stdio.h> 

typedef union
{
  int arr_2d [3][4];
  int arr_1d [3*4];
} arr_t;

int main (void)
{
  arr_t a = 
  { 
    .arr_2d =  
    {
      { 1, 2, 3, 4},
      { 5, 6, 7, 8},
      {9, 10, 11, 12}
    }
  };

  printf("%d %d\n", a.arr_1d[6], a.arr_1d[(1*4)+2]); // well-defined, union type punning
  return 0;
}

Another universal work-around is that it's fine to inspect any variable in C as a chunk of raw data by using character types, in which case you can regard the whole variable as an array of bytes:

#include <stdio.h>
int main (void){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    unsigned char* ptr = (unsigned char*)a;

    // well-defined: byte-wise pointer arithmetic on a character type
    // well-defined: dereferencing as int, since the effective type of the memory location is int:
    int x = *(int*)&ptr[sizeof(int)*6];
    int y = *(int*)&ptr[sizeof(int[4])*1 + sizeof(int)*2];

    printf("%d %d\n", x, y);
    return 0;
}

Or in case you are a fan of obscure macros, a rewrite of the previous example (not really recommended):

#include <stdio.h>

#define GET_ITEM(arr, x, y) \ // see 1)
  _Generic( **(arr),        \
            int:  *(int*) &((unsigned char*)(arr))[sizeof(*arr)*(x) + sizeof(int)*(y)] ) 

int main (void){
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};
    unsigned char* ptr = (unsigned char*)a;

    // well-defined:
    printf("%d %d\n", GET_ITEM(a,0,6), GET_ITEM(a,1,2));
    return 0;
}

1) Explanation: _Generic for type safety. Cast to a character type, do byte-wise pointer arithmetic based on the size of a 1D array of int for x and the size of an int for y. Precedence goes as [] over &. The & is to get an address, then cast to an int* and dereference. The value will be returned by the macro.

Lundin
  • 195,001
  • 40
  • 254
  • 396
  • If I understand correctly, in your second method, you cast a(type is `int (*)[4]`) into a `char*`, and the array is treated as a byte stream. Then the pointer could move freely on it and point to `a[1][2]` without any undefined behavior. Great solution! – TH讠NK Jan 04 '23 at 10:29
  • 2
    @TH讠NK Yes it is allowed as per a special rule for character pointers, C17 6.3.2.3 "When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object." – Lundin Jan 04 '23 at 10:31
  • There is a good question whether `int *flat = &a;` could be used for linear access as it is allowed for `char` types. – tstanisl Jan 04 '23 at 13:11
  • 1
    @tstanisl It isn't allowed since the special rule above is for character types only. You cannot do pointer arithmetic+dereference using a `int*` beyond the first `int[4]` array. And well, `int*` isn't compatible with `int(*)[3][4]` nor with `int(*)[4]` so that assignment is a constraint violation. – Lundin Jan 04 '23 at 13:49
  • @Lundin, it would apply for `int *flat = a[0];` or `int *flat = &a[0][0];`. No "identity" of `int[4]` object at `a[0]` is transported to `flat` in `int *flat = (int*)&a;`. If it was a case then `int (*a2)[3][4] = (void*)(int*)&a;` would be illegal. – tstanisl Jan 04 '23 at 14:14
  • @tstanisl Doesn't matter. As soon as you use `[]` or pointer arithmetic, you invoke the binary + operator and as quoted in dbush's answer: "If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.". _You cannot use pointer arithmetic in C unless the pointer points at an array._ If you cast to `uintptr_t` and do all arithmetic on integer types, well that's another story. Then it's not necessarily UB but just impl.defined. – Lundin Jan 04 '23 at 14:18
  • @Lundin, maybe, but the C standard is terribly confusing here. Especially if there are multiple arrays overlapping. Like `union { int x[10]; int y[2]; } u;`. Now does `7 + (int*)&u` invoke UB or not? It overflows `u.y` while being within `u.x`. – tstanisl Jan 04 '23 at 14:26
  • @tstanisl That example is only confusing because of the union pointer/member equivalence rule, which is another story. Your pointer `(int*)&u` points at _each_ of the members `x` and `y`. And after they had decided on that part, the C committee took a lunch break, to address other matters after lunch... – Lundin Jan 04 '23 at 14:36
  • @Lundin First of all great well-reasoned answer. Any reason why you use an unchanged char opposed to char? Both options introduce accidental complexity via unrelated types. – Allan Wind Jan 10 '23 at 09:27
  • @AllanWind Because `char` has implementation-defined signedness, may be negative and is generally problematic and non-portable. Good practice is to never use `char` for anything but actual text strings, as well as always using unsigned types when dealing with raw binary data. – Lundin Jan 10 '23 at 09:30
  • @Lundin 1. You are only using the char pointer not the char value. I can't think of a a gotcha. I get the implementation defined signedness(?). 2. I assume the pointer version `*((int *) &a[0]0] + 6)` has the same UB as `a[6]`? Is the argument here that two forms are equivalent syntax? – Allan Wind Jan 10 '23 at 09:37
  • @AllanWind Code on SO should strive for correctness rather than teaching bad practices. And also, I don't write brittle code just for the heck of it. What if someone decides to dereference the data during maintenance? Regarding the different question, your two examples mean completely different things. – Lundin Jan 10 '23 at 09:41
  • @Lundin Totally agree with correctness etc. I was genuinely interested re those two question... seems that I rubbed you the wrong way, and I certainly didn't intend to. – Allan Wind Jan 10 '23 at 09:46
0

Back in the wild days, before compilers got serious about emitting diagnostics for doing (very likely) stupid things, this type of code was not uncommon.

NOTE: DO NOT DO THIS

#include <stdio.h>

int main(int argc, char* argv[]) {
    int a[3][4] = {{1, 2, 3, 4},
                   {5, 6, 7, 8},
                   {9, 10, 11, 12}};

    // treat the array as a single, contiguous arrangement
    // will throw compiler warning/error
    int *p = &a; // <=== VERY BAD, invalid pointer assignment!!

    printf("%d %d\n", p[6], a[1][2]);
    return 0;
}

This demonstrates that the array is in fact laid out contiguously in row-major order. This is one of those things about C that if you really want it to, it will let you shoot yourself right in the foot.

Mark Benningfield
  • 2,800
  • 9
  • 31
  • 31
  • I want to know the type of `&a`, `int (*)[3][4]`? – TH讠NK Jan 04 '23 at 10:56
  • `int (*)[3][4]` means "pointer to integer array of dimensions [3][4]" – Mark Benningfield Jan 04 '23 at 11:01
  • What you said is right. I wrote `int (*p)[3][4] = &a;` and it’s successfully compiled. More specifically, `(*p)[1][2]` is equivalent `a[1][2]` in my experiment, so I think the type of `&a` is `int (*)[3][4]`. – TH讠NK Jan 04 '23 at 11:12
  • The fun part is that the code would be **defined** if `char` was used instead of `int` – tstanisl Jan 04 '23 at 13:09
  • I don't think it is an aliasing violation. The strict aliasing rule applies only to **l-value** expressions while `a[1]` is not an l-value. It is automatically transformed to a pointer. So `int` is accessed as `int` l-value in both `p[6]` and `a[1][2]`. – tstanisl Jan 04 '23 at 13:12
  • 1
    "Back in the wild days" how far back exactly, and how are such nostalgia trips to pre-standardized K&R 1st edition C of interest for the OP? Because this hasn't been valid C since at least year 1989. It was never an aliasing violation but a constraint violation of assignment. Strict aliasing didn't really get formalized/messed up until C99, but this code was already invalid C long before that. – Lundin Jan 04 '23 at 13:55