Why does the C compiler demand that the number of columns in a 2D array be defined?

Question

Given the following function signature:

void readFileData(FILE* fp, double inputMatrix[][], int parameters[])

This doesn't compile.

And the corrected one:

void readFileData(FILE* fp, double inputMatrix[][NUM], int parameters[])

My question is: why does the compiler demand that the number of columns be defined when handling a 2D array in C? Is there a way to pass a 2D array with unknown dimensions to a function?

Thank you.

Please add the error message, name of the compiler and version to your question. — Aaron Digulla, Aug 20 '10 at 14:20

AnT stands with Russia · Answer 1 · 2010-08-26T16:45:41.383

Built-in multi-dimensional arrays in C (and in C++) are implemented using the "index-translation" approach. That means that a 2D (3D, 4D, etc.) array is laid out in memory as an ordinary 1D array of sufficient size, and access to the elements of such an array is implemented by recalculating the multi-dimensional indices into a corresponding 1D index. For example, if you define a 2D array of size M x N

double inputMatrix[M][N];

then, under the hood, the compiler effectively creates an array of size M * N

double inputMatrix_[M * N];

Every time you access the element of your array

inputMatrix[i][j]

the compiler translates it into

inputMatrix_[i * N + j]

As you can see, in order to perform the translation the compiler has to know N, but doesn't really need to know M. This translation formula can easily be generalized to arrays with any number of dimensions: it involves all sizes of the multi-dimensional array except the first one. This is why, whenever you declare a multi-dimensional array without a complete size (for example, as a function parameter), you may leave out only the first size; all the others must be specified.
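A minimal sketch of the index translation described above, assuming a row-major `double[M][N]` array with illustrative sizes. The helper names `flat_index` and `get_via_flat` are hypothetical: the first computes `i * N + j`, the second reads the element through a `double *` to the first element. Note that `M` never appears in the formula. Reading across row boundaries through `&inputMatrix[0][0]` works on mainstream compilers, although the standard's wording on this kind of access is debated.

```c
#include <stddef.h>

#define M 3 /* number of rows (illustrative value) */
#define N 4 /* number of columns (illustrative value) */

/* Hypothetical helper: 1D offset of element [i][j] in a row-major M x N array.
   Only the column count N is needed; M does not appear. */
static size_t flat_index(size_t i, size_t j)
{
    return i * N + j;
}

/* Reads inputMatrix[i][j] by hand, treating the array as one block of M * N doubles. */
static double get_via_flat(double inputMatrix[M][N], size_t i, size_t j)
{
    const double *base = &inputMatrix[0][0]; /* address of the first element */
    return base[flat_index(i, j)];
}
```

For every valid `i` and `j`, `get_via_flat(a, i, j)` yields the same value as `a[i][j]`.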

Good explanation. There was a period when I used to write C code with my arrays flattened because it ran faster, until I started compiling with -O2, at which point GCC did that part for me and I could start writing readable code again \o/ — Ultimate Gobblement, Aug 20 '10 at 15:53

score 5 · Answer 2 · answered Aug 20 '10 at 14:22


As an array in C is purely memory without any meta-information about its dimensions, the compiler needs to know how to apply the row and column indices when addressing an element of your matrix.

inputMatrix[i][j] is internally translated to something equivalent to *((double *)inputMatrix + i * NUM + j)

and here you see that NUM is needed.
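To illustrate that only `NUM` is required, here is a minimal sketch, assuming `NUM` is a compile-time constant; the function `sum_matrix` and its `rows` parameter are hypothetical. Because the row count is not part of the parameter type, the same function accepts arrays with any number of rows, as long as each row has exactly `NUM` columns.

```c
#include <stddef.h>

#define NUM 3 /* column count, fixed at compile time (illustrative value) */

/* Sums every element of a rows x NUM matrix. The first dimension is left
   empty: the compiler only needs NUM to compute the address of inputMatrix[i][j]. */
static double sum_matrix(double inputMatrix[][NUM], size_t rows)
{
    double total = 0.0;
    for (size_t i = 0; i < rows; i++)
        for (size_t j = 0; j < NUM; j++)
            total += inputMatrix[i][j];
    return total;
}
```

For example, both `double a[2][NUM]` and `double b[4][NUM]` can be passed, as `sum_matrix(a, 2)` and `sum_matrix(b, 4)`.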

Frank

+1: Good explanation and definitely helps to use `NUM` as the size to match the original question. – DrAl Aug 20 '10 at 14:28

score 2 · Accepted Answer · answered Aug 20 '10 at 16:34

C doesn't have any specific support for multidimensional arrays. A two-dimensional array such as double inputMatrix[N][M] is just an array of length N whose elements are arrays of length M of doubles.

There are circumstances where you can leave off the number of elements in an array type. This results in an incomplete type: a type whose storage requirements are not known. So you can declare double vector[], which is an array of unspecified size of doubles. However, you can't put objects of incomplete types in an array, because the compiler needs to know the element size when you access elements.

For example, you can write double inputMatrix[][M], which declares an array of unspecified length whose elements are arrays of length M of doubles. The compiler then knows that the address of inputMatrix[i] is i*sizeof(double[M]) bytes beyond the address of inputMatrix[0] (and therefore the address of inputMatrix[i][j] is i*sizeof(double[M])+j*sizeof(double) bytes beyond the address of inputMatrix[0][0]). Note that it needs to know the value of M; this is why you can't leave off M in the declaration of inputMatrix.
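The byte offsets in the previous paragraph can be checked directly. A minimal sketch, assuming an illustrative N x M array (N rows, M columns, as in this answer); `row_offset` and `elem_offset` are hypothetical helpers that measure distances from the start of the array in bytes.

```c
#include <stddef.h>

#define N 4 /* rows (illustrative value) */
#define M 5 /* columns (illustrative value) */

/* Byte distance from inputMatrix[0] to inputMatrix[i];
   expected to equal i * sizeof(double[M]). */
static ptrdiff_t row_offset(double inputMatrix[][M], size_t i)
{
    return (const char *)&inputMatrix[i] - (const char *)&inputMatrix[0];
}

/* Byte distance from inputMatrix[0][0] to inputMatrix[i][j];
   expected to equal i * sizeof(double[M]) + j * sizeof(double). */
static ptrdiff_t elem_offset(double inputMatrix[][M], size_t i, size_t j)
{
    return (const char *)&inputMatrix[i][j] - (const char *)&inputMatrix[0][0];
}
```

Only addresses are compared here, so the array contents do not matter.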

A theoretical consequence of how arrays are laid out is that &inputMatrix[i][j] denotes the same address as (double *)inputMatrix + M * i + j.¹

A practical consequence of this layout is that for efficient code, you should arrange your arrays so that the dimension that varies most often comes last. For example, if you have a pair of nested loops, you will make better use of the cache with for (i=0; i<N; i++) for (j=0; j<M; j++) ... than with loops nested the other way round. If you need to switch between row access and column access mid-program, it can be beneficial to transpose the matrix (which is better done block by block rather than in columns or in lines).
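The block-by-block transposition mentioned above can be sketched as follows. This is a minimal illustration, assuming square `SIZE x SIZE` matrices and an arbitrarily chosen tile size `BLOCK`; both constants and the function name `transpose_blocked` are hypothetical, and a good tile size depends on the cache of the target machine. Within each tile, the innermost `j` loop walks along a row of `src`, so the last index varies fastest, as recommended above.

```c
#include <stddef.h>

#define SIZE 10 /* matrix dimension (illustrative value) */
#define BLOCK 4 /* tile edge length; tune for the target cache (illustrative value) */

/* Writes the transpose of src into dst, one BLOCK x BLOCK tile at a time,
   so that both matrices are touched in small, cache-sized regions. */
static void transpose_blocked(double dst[][SIZE], double src[][SIZE])
{
    for (size_t ib = 0; ib < SIZE; ib += BLOCK)
        for (size_t jb = 0; jb < SIZE; jb += BLOCK)
            /* Clamp the tile at the matrix edge when SIZE is not a multiple of BLOCK. */
            for (size_t i = ib; i < ib + BLOCK && i < SIZE; i++)
                for (size_t j = jb; j < jb + BLOCK && j < SIZE; j++)
                    dst[j][i] = src[i][j];
}
```

A straightforward row-by-row transpose gives the same result; the tiling only changes the order in which memory is touched.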

C89 references: §3.5.4.2 (array types), §3.3.2.1 (array subscript expressions)
C99 references: §6.7.5.2 (array types), §6.5.2.1-3 (array subscript expressions).

¹ Proving that this expression is well-defined is left as an exercise for the reader. Whether inputMatrix[0][M] is a valid way of accessing inputMatrix[1][0] is not so clear, though it would be extremely hard for an implementation to make a difference.

score 1 · Answer 4 · answered Aug 20 '10 at 14:22

This is because in memory, this is just a contiguous area, a single-dimension array if you will. To get the real offset of inputMatrix[x][y], the compiler has to calculate (x * elementsPerRow) + y, where elementsPerRow is the number of columns. So it needs to know elementsPerRow, and that in turn means you need to tell it.

score 1 · Answer 5 · answered Aug 20 '10 at 14:23

No, there's not. The situation's pretty simple really: what the function receives is really just a single, linear block of memory. Telling it the number of columns tells it how to translate something like block[x][y] into a linear address in the block (i.e., it needs to do something like address = row * column_count + column).

score 1 · Answer 6 · answered Aug 20 '10 at 15:40


Other people have explained why, but the way to pass a 2D array with unknown dimensions is to pass a pointer, along with the dimensions. The compiler adjusts array parameters to pointers anyway: a parameter declared as double m[][NUM] really has type double (*)[NUM]. Just make sure it's clear in your API docs what you expect.
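A minimal sketch of that approach, assuming the matrix is stored contiguously in row-major order; the function names `matrix_get` and `matrix_sum` are hypothetical. The caller passes the dimensions at run time, and the function does the `r * cols + c` translation that the compiler would otherwise do for a fixed `NUM`.

```c
#include <stddef.h>

/* Returns element (r, c) of a matrix with cols columns stored row-major in data.
   The caller supplies cols, so this function does the index translation itself. */
static double matrix_get(const double *data, size_t cols, size_t r, size_t c)
{
    return data[r * cols + c];
}

/* Sums a rows x cols matrix whose dimensions are only known at run time. */
static double matrix_sum(const double *data, size_t rows, size_t cols)
{
    double total = 0.0;
    for (size_t r = 0; r < rows; r++)
        for (size_t c = 0; c < cols; c++)
            total += matrix_get(data, cols, r, c);
    return total;
}
```

A flat buffer from `malloc(rows * cols * sizeof(double))` can be passed directly. A true `double a[R][C]` can be passed as `&a[0][0]`, which works on mainstream compilers, although the standard's wording on indexing past the first row this way is debated.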

nmichaels
