C Compilers -- Indirection with Multidim Arrays

Question

By definition, in every standard of C, x[y] is equivalent to (and often compiled as) *((x)+(y)). Additionally, a name of an array is converted to an address operator to it -- so if x is an array, it would be *((&(x))+(y))

So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to (((&(x))+(y))+(z))

In the small-scale toy C compiler I'm working on, this fails to generate proper code, because it tries to load from the pointed-to address at every * operator. This works for single-dimension arrays, but for multi-dimension arrays it results in something like (in vaguely assembly pseudocode):

load &x; add y; deref; add z; deref

Where deref is an instruction to load the value at the address of the previous calculation -- as this is how the indirection operator seems to work??

However, this will generate bad code, since we should be dealing with a single address throughout and only dereferencing at the very end. I'm assuming there's something in the spec I'm missing?

*"name of an array is converted to an address operator to it"* No. You could say that `x` is converted to `&x[0]`, which has different type compared to `&x`. — HolyBlackCat, Oct 05 '21 at 07:15
Arrays aren't converted to pointers when used as L-values, only R-values. — Barmar, Oct 05 '21 at 07:19
What `deref` does depends on the type, and you have to detect that. Generally, yes, `deref() { if simple pointer; then deref; if array; then only remove one dimension from type and don't change the value, if pointer to function, then do nothing }` — KamilCuk, Oct 05 '21 at 07:54
Aaand there's also the case where `&*` is a no-op, so you have to check if the next operation is `&` and then do nothing, for example. — KamilCuk, Oct 05 '21 at 08:00

score 2 · Answer 1 · answered Oct 05 '21 at 07:19


*"name of an array is converted to an address operator to it"*

No. You could say that x is converted to &x[0], which has different type compared to &x.

Assuming you have T a[M][N];, evaluating a[x][y] does the following:

1. a is converted to a temporary pointer of type T (*)[N], pointing to the first element of the array.
2. This pointer is incremented by x, which advances the address by x * sizeof(T[N]) bytes, i.e. by x * N * sizeof(T).
3. The pointer is dereferenced, giving you an lvalue of type T[N]. No memory is read at this step.
4. The result is converted to a temporary pointer of type T *.
5. The pointer is incremented by y, which advances the address by y * sizeof(T) bytes.
6. Finally, the pointer is dereferenced to produce a value of type T.

Note that an array itself (multidimensional or not) doesn't store any pointers to itself. When converted to a pointer, the resulting pointer is calculated on the fly.

answered Oct 05 '21 at 07:19

HolyBlackCat


This is giving the same result? I think my issue might be more in what the indirection operator is supposed to output?? Should it have more complex logic to detect this? Because when T(*)[N] is dereferenced, it sees T(*[N]) as a pointer and tries to get the value at that address? – Popeye Otaku Oct 05 '21 at 07:43

@PopeyeOtaku You were asking "why 2 dereferences rather than 1", and the answer is "because there are 2 temporary pointers". *"Should it have more complex logic"* No, there is no complex logic here, except the fact that arrays are implicitly converted to pointers to their first element when passed to `[]`. – HolyBlackCat Oct 05 '21 at 17:03

*"when T(*)[N] is dereferenced, it sees T(*[N]) as a pointer and tries to get the value at that address"* The result of the dereference has type `T[N]`. When it's converted to a pointer (before the second addition), the resulting pointer is computed on the fly, rather than being taken from some memory location. – HolyBlackCat Oct 05 '21 at 17:03

score 0 · Answer 2 · answered Oct 05 '21 at 08:31

*"So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to (((&(x))+(y))+(z))"*

No, a 2D array is an array of arrays. In *((x)+(y)), x decays into a pointer to its first element, y is added to it, and the result is dereferenced to give you array number y.

This array too "decays" into a pointer to its first element, so you get:

*( (*((x)+(y))) + (z) )

When used in an expression, an array decays into a pointer to its first element, with a few exceptions, namely when it is the operand of the & (address-of) or sizeof operator. That is why writing out the & as done in your pseudocode is just confusing.

A practical example would be:

int arr[x][y];
for(size_t i=0; i<x; i++)
  for(size_t j=0; j<y; j++)
    arr[i][j] = ...

In the expression arr[i][j], the [] is just "syntactic sugar" for pointer arithmetic (see Do pointers support "array style indexing"?).
So we get *((arr)+(i)), where arr decays into a pointer to its first element, of type int(*)[y].
Pointer arithmetic on that array pointer, followed by the dereference, yields array number i, of type int[y].
Again, there is array decay on this one, because it too is an array used in an expression. We get a pointer to its first element, of type int*.
Pointer arithmetic int* + j gives the address of the integer, which is then finally dereferenced to give the actual int.

score 0 · Answer 3 · edited Oct 05 '21 at 20:06

*"So, for a multidimension array, x as a 2 dimension array, x[y][z] would be equivalent to (((&(x))+(y))+(z))"*

You are mistaken. The expression x[y][z] is evaluated like:

*( *( x + y ) + z )

Here is a demonstration program:

#include <stdio.h>

int main(void) 
{
    enum { M = 3, N = 3 };
    int a[M][N] =
    {
        { 1, 2, 3 },
        { 4, 5, 6 },
        { 7, 8, 9 }
    };
    
    for ( size_t i = 0; i < M; i++ )
    {
        for ( size_t j = 0; j < N; j++ )
        {
            printf( "%d ", *( *( a + i ) + j ) );
        }
        putchar( '\n' );
    }

    return 0;
}

Its output is:

1 2 3 
4 5 6 
7 8 9

Array designators used in expressions (with rare exceptions) are implicitly converted to pointers to their first elements.

So if you have an array declared like:

int a[M][N];

then the array designator a is converted to a pointer to its first element ("row"). The type of the array element is int[N]. So a pointer to such object has the type int ( * )[N].

If you want a pointer to the i-th element of the array, you need to write the expression a + i. Dereferencing that expression gives you the i-th row (a one-dimensional array), which in turn is converted to a pointer to its first element when used in an expression.

So the expression a + i has the type int ( * )[N].

The expression *( a + i ) has the type int[N], which in the enclosing expression is immediately and implicitly converted to a pointer of type int * to its first element.

The expression *( a + i ) + j points to the j-th element of the "row" of the two-dimensional array. Dereferencing the expression *( *( a + i ) + j ) you will get the j-th element of the i-th row of the array.
