
I am confused about multidimensional arrays. The question that came closest to helping me understand was this post:

Pointer address in a C multidimensional array

I have a multidimensional array initialized as follows: int zippo[4][2] = {{2, 4}, {6, 8}, {1, 3}, {5, 7}};

When I print zippo and *zippo, both show the same memory address, but when I print **zippo it prints 2 (the first value in the first subarray). My question is: how does the compiler know to produce the first value of the first array when zippo is dereferenced twice? For example, for the sake of simplicity, if the memory address of zippo is 30 and the value of zippo and *zippo is 15, then shouldn't memory look like the following?

[image: memory addresses]

It is my understanding that *zippo goes to memory location 15 to find the value at the location, which just so happens to be 15. So, shouldn't dereferencing it another time cause 15 to be printed yet again?

BlaqICE
  • There are already hundreds of questions about jagged arrays (something like `int **`). Why do you expect a completely different type can be used?? If you have an `int`, you cannot `printf` a `_Complex`! – too honest for this site Dec 26 '16 at 04:05

3 Answers

1

You're thinking too low-level. Your question pertains to variable names and types (at the language level).

When you declare int zippo[4][2] = {{2, 4}, {6, 8}, {1, 3}, {5, 7}};, you end up with an array of four arrays of two ints. They can be accessed in many ways, depending on what you need to express. Here are some of the sub-objects involved:

| 2 | 4 | 6 | 8 | 1 | 3 | 5 | 7 |    The storage for zippo: 8 contiguous ints

|<--+---+---+---+---+---+---+-->|    zippo (the whole array), is an int[4][2]
|<--+-->|                            zippo[0] (also known as *zippo) is an int[2]
                |<--+-->|            zippo[2] is also an int[2]
|<->|                                zippo[0][0] (also known as **zippo) is an int
            |<->|                    zippo[1][1] is also an int

You can see that these sub-objects can overlap, and in some cases share an address. What still makes them distinct objects (for you, the language, and the compiler) is their type.

For example, zippo[0] and zippo[0][0] (which is its first half) have the same address, but one of them is an int, while the other is an array of two ints.

That is why you can't keep indexing into zippo[0][0], or try to use zippo[0] inside an integer calculation: even though they share the same storage, they're different objects with different meanings.

And even though indexing into arrays involves pointer arithmetic, there is no actual chain of pointers, no int*** that your first understanding implies. It's all variable names.

Quentin
0

No, *zippo does not go to location 15 and then fetch the value stored there. If it did, printf(" * ((int **) zippo) = %p\n", * ((int **) zippo) ); would output the same thing as printf(" *zippo = %p\n", *zippo); and that is not the case.

Here is the code I ran:

#include <stdio.h>

int zippo[4][2] = {{2, 4}, {6, 8}, {1, 3}, {5, 7}};

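int main(void){
    /* In the first three calls the array expression decays to a pointer,
       so each one prints the address of the first element. */
    printf("zippo[0] = %p\n", (void *) (zippo[0]) );
    printf("  zippo = %p\n", (void *) zippo);
    printf("  *zippo = %p\n", (void *) (*zippo) );
    /* Reads the int stored at that address. */
    printf("  **zippo = %d\n", (int) ( **zippo) );
    /* Undefined behaviour: reads int storage as if it held a pointer. */
    printf("  * ((int **) zippo)  = %p\n", (void *) (* ((int **) zippo) ));
    return 0;
}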

This is what I obtain:

zippo = 0x804a040
*zippo = 0x804a040
**zippo = 2
* ((int **) zippo)  = 0x2

I compiled this code using gcc -Wall -Wextra -Wpedantic -pedantic to ensure that no warning is hidden, and the option -m32 to get 32-bit addresses (same size as int).

I had a hard time understanding what is happening here, so I decided to look at the corresponding assembly code. Using gcc -S file.c -o file.s I obtain the following.

First, the variable declaration and the format strings:

    .globl  zippo
    .data
    .align 32
    .type   zippo, @object
    .size   zippo, 32
zippo:
    .long   2
    .long   4
    .long   6
    .long   8
    .long   1
    .long   3
    .long   5
    .long   7
    .section    .rodata
.LC0:
    .string "zippo[0] = %p\n"
.LC1:
    .string "  zippo = %p\n"
.LC2:
    .string "  *zippo = %p\n"
.LC3:
    .string "  **zippo = %d\n"
.LC4:
    .string "  * ((int **) zippo)  = %p\n"

The corresponding assembly for printf("zippo[0] = %p\n", (void *) (zippo[0]) ); :

movl    $zippo, %esi
movl    $.LC0, %edi
movl    $0, %eax
call    printf

The corresponding assembly for printf(" zippo = %p\n", (void *) zippo); :

movl    $zippo, %esi
movl    $.LC1, %edi
movl    $0, %eax
call    printf

The corresponding assembly for printf(" *zippo = %p\n", (void *) (*zippo) ); :

movl    $zippo, %esi
movl    $.LC2, %edi
movl    $0, %eax
call    printf

The corresponding assembly for printf(" **zippo = %d\n", (int) ( **zippo) ); :

movl    $zippo, %eax
movl    (%rax), %eax
movl    %eax, %esi
movl    $.LC3, %edi
movl    $0, %eax
call    printf

The corresponding assembly for printf(" * ((int **) zippo) = %p\n", (void *) (* ((int **) zippo) )); :

movl    $zippo, %eax
movq    (%rax), %rax
movq    %rax, %rsi
movl    $.LC4, %edi
movl    $0, %eax
call    printf

As you can see, the assembly for the first three printf calls is exactly the same apart from the .LCx label, which selects the format string: each passes the address of zippo itself and loads nothing from memory. The last two calls both load from that address; they differ only in the width of the load (movl versus movq).

My understanding is that the compiler knows zippo is a two-dimensional array, and therefore knows that *zippo is a one-dimensional array whose data starts at the address of the first element.

Hedi
  • The code invokes undefined behaviour. You read an `int` array as pointer to `int`. The fact that `%p` expects a `void *` and you have to cast is another cause of UB (the compiler should actually complain). "compiler knows that `*zippo` doesn't even exist in memory" is nonsense. Of course `*zippo` exists. It is an array. – too honest for this site Dec 26 '16 at 09:54
  • @olaf The code does not invoke undefined behavior. If I change the last line to `printf(" * ((int **) zippo) = %d\n", * ((int **) zippo) );` then the output is always 2 (and of course the compiler complains about expecting int * not int). Another thing: I converted the C code above to assembly to understand what happens exactly, and it seems like `*zippo` is treated as `zippo`. I'll be updating my answer but I would like to have your opinion on this first. Thx – Hedi Dec 26 '16 at 11:16
  • rq : 2 is the first value of the array – Hedi Dec 26 '16 at 11:18
  • No idea what you mean. Never use a cast if you don't really understand all implications. Wildly casting just to silence the compiler is a guarantee for disaster. – too honest for this site Dec 26 '16 at 11:26
  • `*(int **) zippo` yields a pointer! But `zippo[0][0]` is an integer, not a pointer. Your code **does** invoke UB and any modern compiler will complain with recommended warnings enabled. The rest of my previous comment also applies. – too honest for this site Dec 26 '16 at 11:29
  • @olaf no warnings from valgrind, no warning with -Wall (except for print pointer as int), using gcc 5.4. I don't see how it invokes UB. BTW I edited my answer so that what I'm saying becomes clearer – Hedi Dec 26 '16 at 14:00
  • @Hedi "warning: cast from 'int (*)[2]' to 'int **' increases required alignment from 4 to 8", you should enable more warning `-Wextra`. By the way, `%p` need a `void *` so you need to cast because `printf()` use variadic argument, so there is no auto promote. – Stargateur Dec 26 '16 at 14:24
  • @Stargateur Thx for the remark. I added necessary casts and compiled with -pedantic to ensure there is no warnings – Hedi Dec 26 '16 at 15:15
  • `-pedantic` does not include all recommended warnings, but it will warn erroneously if you use extensions intentionally. And digging down to assembly does not make sense for a language question. This is not even implementation-specific, but depends on the complete context. gcc is known for very aggressive optimisations. And the code **does** invoke UB. Do not use casts on wrong code! – too honest for this site Dec 26 '16 at 15:26
  • Your code don't compile anymore because you remove the declaration of `zippo`. You don't fix the problem with the warning because you should not cast a `int (*)[]` to `int **`. You could write `printf(" *((int *)(*(int (*)[2])(zippo)))) = %d\n", *((int *)(*(int (*)[2])(zippo))));`. "Do not use casts on wrong code!" – Stargateur Dec 27 '16 at 00:11
  • re-added declaration of zippo. Thanks for your comments. – Hedi Dec 28 '16 at 08:22
-1

It may look as though the scenario you describe (zippo stored at address 30, with zippo and *zippo both holding 15) is what happens, but it is not. Think in terms of data types. Take an array with even more dimensions.

int zappo[2][3][4][5][6] = {{{{{45,55,66,77,88,99},{12,22,32....

When you define a variable like this (on the stack, not using chained malloc) the compiler does not allocate one value zappo, another for *zappo and another for **zappo etc. It writes 45,55,66,77,88,99,12,22,32 in a contiguous block of memory, say at 0xfe. Now at compile time, it knows that

zappo is a pointer, and has a value 0xfe
*zappo is a pointer, and has a value 0xfe
**zappo is a pointer, and has a value 0xfe
***zappo is a pointer, and has a value 0xfe
****zappo is a pointer, and has a value 0xfe
*****zappo is a pointer, and has a value 0xfe, but this one points to an int!

The compiler thinks in terms of data types. Only the last dereference yields an int; all the others yield the same address. This is not the same as declaring

int *****zappo;

and painstakingly creating the array structure manually (in the heap, with some alloc). That's where you can use your box analogy.

Elan
  • Ah this makes more sense. I am coming from Java and am used to arrays only being dynamically allocated. Thank you – BlaqICE Dec 25 '16 at 22:00
  • Welcome. It is a quirk of C, known as array decaying into pointer. Further details can be found in the book Deep C secrets, Chap 4, for example. But not necessary for an operating knowledge. – Elan Dec 25 '16 at 22:28
  • Don't use implementation details at this level. There is no requirement for a stack in the C language for **automatic** variables. `*zappo` is **not** a pointer, but an array! `**zappo` is **not** a pointer, but an array! ... And none of them has a value of `0xfe`. That's just their address. What is the value of an array? Arrays are not pointers (which you actually state with the second half)! – too honest for this site Dec 26 '16 at 04:07