
I know this topic has been addressed a few times but I still don't get it.

Referring to http://c-faq.com/aryptr/aryptr2.html

char a[] = "hello";
char *p = "world";

creates an array "a". The variable is only a label of the address of the first memory address. p is a variable (== a label of another memory address) containing the address of the first character ('w', which may be somewhere else in memory).

What I don't understand is the example from this topic (thanks!) that uses extern to link an array with a pointer:

extern1.c

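#include <stdio.h>

/* Deliberately mismatched: array is defined as int[10] in extern2.c.
   This mismatch is what the question is about. */
extern int *array;
int test(void);

int main(int argc, char *argv[])
{
    printf("in main: array address = %p\n", (void *)array);
    test();
    return 0;
}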

extern2.c

#include <stdio.h>

int array[10] = {1, 2, 3};

int test(void)
{
    printf("in test: array address = %p\n", (void *)array);
    return 0;
}

Why does the pointer variable "extern int *array" (== a label of a memory address containing another address) in extern1 already contain the content of array[0]?

What is the difference from

int array[10] = {1, 2, 3};
int *ptrArrWorking = array; // == &array[0];

? In the extern case it would be something like

int array[10] = {1, 2, 3};
int *ptrArrNotWorking = array[0]; 

Edit: to be more explicit, let's interpret the example above as

int array[10] = {1, 2, 3};
int *ptrArrNotWorking = (int *)array[0]; 

so that it simulates the behaviour seen in the extern example.

Where is the indirection hidden?

Thank you so much.

Jan

3 Answers


You have defined, in extern2.c, an object named array:

int array[10] = {1, 2, 3};

This is a contiguous segment of memory containing ten ints. When you pass it to printf in test, it decays into a pointer to its first element; that's how arrays are passed to functions. Thus printf prints the address of the first int.

In extern1.c, however, you lied when declaring array again:

extern int *array;

This pretends that array is a pointer, a single object which holds the address of something else. In C, declaring the same object with incompatible types gives the program undefined behavior (C11 6.2.7p2), and because the two declarations live in separate files, the compiler is not required to diagnose it. That is standardese for "badly broken": from then on there is no requirement on whether the program compiles, or what it actually does at runtime.

In practice, the code following that broken declaration will indeed treat array as a pointer. So when you pass array to printf, it will chop off a handful of bytes from the beginning of the array (typically 4 or 8 depending on your platform), yell "there you go, here's the pointer" and give that to printf.

Of course, it's not actually a valid pointer, but bits of the first few ints from array stuffed together, so what you see is nonsense.

Quentin
  • This is the correct explanation of what is going on. And it becomes clearer if you change the type of `array` to `int8_t`. – LennyB Apr 25 '17 at 12:48
  • Great, thank you. So the program's behaviour is undefined and broken. In sum, all the answers make sense; I'll accept this one because it's the clearest for me atm. Thank you all very much! – Jan Apr 25 '17 at 13:30
  • @LennyB, I changed the type of the array to another data type and saw what you mean. The pointer simply grabs the first 4 bytes of the array. The main point is the (correct) decaying of the array vs. the non-decaying (because it's already a pointer type) of the pointer. That's caused by lying to the compiler, is that correct? – Jan Apr 25 '17 at 14:15
  • @Jan yes, the code's behaviour depends on the type it "believes" `array` is. No need for pointers or arrays either: you could just as well trigger the same bug by declaring an `int` on one side and a `double` on the other, and get half a garbled `double`. – Quentin Apr 25 '17 at 14:47

Using extern only declares a variable; it does not define it.

Effectively, this means that array (declared extern in extern1.c) is just another name for the actual variable array defined in extern2.c.

Dmitry Egorov
  • This might be the key point here. How can this be interpreted? I still don't understand where the additional indirection comes from. Does it have a technical reason (on the assembly/architecture level), or is it a convention of the compiler/the C standard? Is it valid at all, by the way, or is it implementation-defined or even undefined behaviour? – Jan Apr 25 '17 at 12:35

creates an array "a". The variable is only a label of the address of the first memory address.

No, a is the whole array hello\0. Whenever the identifier a is used inside an expression (with a few exceptions, such as the operand of sizeof or unary &), the array "decays" into a pointer to its first element. That is, inside the expression where a is used you get a temporary char* pointing at the letter 'h'.

Think of it like this: the array is the road to Rome. You ask the compiler "where is the road to Rome, I need to access it" and then it helpfully puts up a temporary road sign for you. This doesn't mean that the road to Rome becomes a road sign. Nor does it mean that the road sign itself is the road to Rome. Similarly, the road to Rome does not become a road sign just because you are driving on it (accessing it).

Why does the pointer variable "extern int *array" (== a label of a memory address containing another address) in extern1 already contain the content of array[0]?

It does not. You cannot write extern int* array and expect to get a pointer to the first item in an array int array[10] allocated elsewhere. Because arrays are not pointers and pointers are not arrays.

What happens in the example with disassembly is that the code invokes undefined behavior (a bug): the programmer lies to the compiler and says "what's stored here is actually a pointer, trust me", even though what is stored there is not a pointer but an integer.

So, as it happened on that specific system, you get a pointer whose value is 1, not the pointed-at data. This is nothing you can rely upon; the code could just as well have crashed, for example if pointers and integers had different sizes or representations on the given system.

To use the road analogy, your compiler is the driver and the program counter of the CPU is the car. With extern int *array you tell the compiler "that thing over there is a road sign!", referring to the actual road. The compiler then blindly does what you told it and tries to read a direction out of the asphalt, then has the car follow it, after which the car runs off into the wilderness, likely crashing and most certainly never reaching Rome.

Here:

int array[10] = {1, 2, 3};
int *ptrArrWorking = array; // == &array[0];

you set the pointer to point at the first item in the array, which means the pointer holds the memory address where that item is stored. You don't set the address itself to 1, which would be nonsense on most systems.

In the extern case it would be something like

int array[10] = {1, 2, 3};
int *ptrArrNotWorking = array[0]; 

Yes, exactly, and that code doesn't make any sense. In fact it is not even valid C, because you can't assign an integer to a pointer without converting it first, which can be done with a type cast.

Lundin
  • [Here](http://eli.thegreenplace.net/2009/10/21/are-pointers-and-arrays-equivalent-in-c) it is mentioned that a is only a label for the first element's memory address. Is that incorrect? I understand something bad happens if I lie to the compiler. I just wanted to know what really happens and whether that's undefined or implementation-defined behaviour. I added the explicit cast to int *ptrArrNotWorking = array[0]; in the main question. – Jan Apr 25 '17 at 13:19
  • @Jan The data type "array of type" in C refers to the whole chunk of memory and not just the first address. Naturally the address of the array itself will be the same as the address of the first element. That tutorial seems mostly correct even though it tries to simplify the array concept a bit. The array type actually also contains internal information to the compiler about how large the array is. And it is a distinct type. – Lundin Apr 25 '17 at 13:42
  • Is it true that an array also decays to a pointer to its first element when it is used in an expression? I know it's true when passing it as a function parameter. – Jan Apr 25 '17 at 13:59
  • @Jan I believe that's answered in the first sentence of my answer. – Lundin Apr 25 '17 at 14:05
  • Yes, true. Just wanted to make sure it's correct - because I thought it was true for parameter passing only. I'd like to upvote you, but it's not allowed :( Thank you! – Jan Apr 25 '17 at 14:11