1

I was reading the linked question "How do I use arrays in C++?", section 5, "Common pitfalls when using arrays", where the following example is given:

// [numbers.cpp]
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9};

// [main.cpp]
extern int* numbers;
int main()
{}

Since 'numbers' in numbers.cpp is the name of an array, which normally decays to a pointer equal to '&numbers[0]', I would expect the value of 'numbers' in main.cpp to still be '&numbers[0]'. But no! It is instead 'numbers[0]', i.e. '1'.

Or, to put myself in the compiler's place: in 'numbers.cpp' I see the symbol 'numbers' as an address pointing to '1', so why does this same symbol change to the value '1' in 'main.cpp'?

I understand that this is what the author calls "type-unsafe linking". But I do not know why the compiler does this; even a type-mismatch link error would make more sense to me.

Comments

I guess my understanding is that the compiler treats the code as equivalent to the following, so that the linker succeeds (otherwise there would be an 'unresolved externals' error):

// [numbers.cpp]
int tmp[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9}; //{1,..9} starts at global address 0x1234
int *numbers = &tmp[0];                    //numbers == 0x1234

// [main.cpp]
extern int* numbers;                       //numbers == 0x1234
int main()
{}

The real situation:

// [numbers.cpp]
int numbers[42] = {1, 2, 3, 4, 5, 6, 7, 8, 9}; //{1,..9} starts at global address 0x1234

// [main.cpp]
extern int* numbers;                       //numbers == numbers[0] == 1
int main()
{}
halfer
user1559625
  • 1
    Because `int numbers[42]` isn't a pointer. – Seth Carnegie Feb 22 '13 at 03:51
  • @Seth Carnegie. Yes, but it seems the compiler did perform a cast from int array to int pointer. Otherwise the linker would raise an 'unresolved external' error. And by no means can I understand why it's giving 'numbers' in main.cpp a value of '1'. – user1559625 Feb 22 '13 at 04:05
  • @user1559625, numbers[42] will degrade to a pointer, thus *numbers points to an int with value of 1. – Josh Petitt Feb 22 '13 at 04:06
  • 2
    Linker does not check type information; it finds the symbol "numbers" that points to a block of memory, no idea what it is supposed to be. – Michael Day Feb 22 '13 at 04:10
  • 2
    @user1559625 no, it _didn't_ perform a cast. If you realise that `numbers[42]` is a solid block of memory, the compiler just overlays the pointer `numbers` on it, since they supposedly refer to the same block of memory. So the pointer `numbers` is at the same location as `numbers[42]`. – Seth Carnegie Feb 22 '13 at 04:10
  • @Seth Carnegie numbers[42] is an array of constants that resides in global memory, right? Let's say its address starts at 0x1234; then numbers in main.cpp should point to 0x1234, but the value of numbers in main.cpp is 1. – user1559625 Feb 22 '13 at 04:24
  • `numbers` won't *point* to 0x1234, the value that it points to will be stored at 0x1234. The pointer value itself has to exist somewhere. (eg. if it points to 0x1234, where is this value stored?) – Michael Day Feb 22 '13 at 04:27
  • @user1559625 in your example, the pointer won't _point_ to `0x1234` (i.e. the value of the pointer is 0x1234), the _address_ of the pointer will be `0x1234` (i.e. whatever is at 0x1234 is what the pointer points at, in this case, 1). – Seth Carnegie Feb 22 '13 at 04:33
  • @user1559625, `numbers` is an array of 42 `int`s, the first 9 of which are initialized with the given numbers, and the remaining 33 are set to zero. This initialization takes place before your program starts. The elements of `numbers` aren't constants. – vonbrand Feb 23 '13 at 00:17

3 Answers

2

Does the following way of explaining it help:

When a is an array (i.e. when the compiler knows that the type of a is an array type), then the syntax a[i] is interpreted as: Return the ith element of the array.

On the other hand, when a is a pointer (to the first element of an array), then the same syntax, a[i], is interpreted as: Look up the address stored in a, add the number of bytes that corresponds to i elements, and return the value stored there.

In main.cpp, it thinks that numbers is a pointer(*) and applies the corresponding actions. That is, it looks up the value stored in numbers, treats that as an address, adds an amount of bytes and returns the value stored at that address.

(*)It does this because numbers is declared as a pointer there. The compiler doesn't know that it really is an array, because main.cpp is compiled separately from numbers.cpp (i.e. it is a separate translation unit). So it doesn't decay the array into a pointer – it simply assumes it is a pointer already.

jogojapan
  • There is still something I do not understand, sorry about that. First, to make the linker succeed, 'numbers' has to be the same type in main.cpp and numbers.cpp, right? (Defining 'numbers' as, say, a plain int rather than an int array in numbers.cpp will make the link fail.) And in separate compilation, since C does not have an 'array' type, is 'numbers' in numbers.cpp still interpreted as int*? In your answer you say, 'In main.cpp, it thinks that numbers is a pointer(*) and applies the corresponding actions. That is, it looks up the value stored in numbers, treats that as an address,' so why is numbers[0] considered the value stored in numbers? – user1559625 Feb 22 '13 at 05:40
  • @user1559625 If the linker can check the type, it will probably do that. For functions that may work because they are always name-mangled (i.e. the symbol names produced by the compiler and stored in the object file include type information). But for variables this isn't necessarily the case. The code my GCC generates on Linux does not have name mangling for global variables. And what do you mean when you say C doesn't have an array type? It sure does. `int[]` is an array, `int*` is a pointer. That's two different types, in C and C++ alike. – jogojapan Feb 22 '13 at 05:54
  • @user1559625 It looks up the value stored for the variable `numbers`, thinking that it is a pointer. The compiler knows the memory address each variable refers to (usually as an offset from the stack frame pointer). So it generates code that goes to that address and, thinking it is dealing with a pointer, extracts as many bytes from that address as correspond to a pointer value, i.e. an address. On many systems that will be the same as an `int`, so it misinterprets the first `int` stored in the array as an address. (On systems where `sizeof(int) != sizeof(int*)` it will be different.) – jogojapan Feb 22 '13 at 05:58
1

If numbers is an array, e.g. numbers[], then you cannot change what it refers to. The object file will map the symbol "numbers" to the actual array, {1, 2, ...}. But if numbers is a pointer, e.g. *numbers, then you can change what it points to, and the object file will map the symbol "numbers" to a single pointer value (that may itself point to the beginning of an array, but we don't know that).

Arrays and pointers behave similarly, but they are not the same thing.

Michael Day
  • Thanks, I know this too. But why does 'number-as-pointer' have the value of 'number-as-array[0]' in this case? Is this undefined behavior in C++ that happens to produce this result? – user1559625 Feb 22 '13 at 04:12
  • Why? Because the compiler believes you when you say the symbol "numbers" refers to a pointer. Actually, the symbol refers to an array whose first element is 1, so when you access it as a pointer, that's what you get. If you look at the generated assembly language, it will make perfect sense. Either way, don't lie to the compiler about the type of a symbol in another module, as it can't check and will fail miserably. – Michael Day Feb 22 '13 at 04:15
  • 'number-as-pointer' has a value of 'number-as-array[0]': are you absolutely sure you don't mean '*number-as-pointer' has a value of 'number-as-array[0]'? – Josh Petitt Feb 22 '13 at 04:15
  • @JoshPetitt, @user1559625 has defined `numbers` differently in two different modules, hence the confusion. – Michael Day Feb 22 '13 at 04:18
  • @Michael Day I still need some help here. First of all, there's no array type in C; if 'numbers' is mistakenly defined as any type other than int array in numbers.cpp, the link will fail, which is why I thought the compiler recognizes 'numbers' in numbers.cpp as a decayed int*. Secondly, let's say the compiler tries to get the value of the 'numbers' pointer: why is it grabbing 'numbers[0]'? – user1559625 Feb 22 '13 at 05:31
  • Linking does not check types, and I think you need to learn more about C compilation and linking from other sources, this is not the right forum for such detailed discussion. – Michael Day Feb 22 '13 at 05:52
  • @Michael Day Thanks, but I think linking does check types. You can try defining numbers in numbers.cpp as 'int numbers;' or 'double numbers[42];'. Your link will fail. Array is not a type in C; the linker does not complain in our case because we define it as an int array. – user1559625 Feb 22 '13 at 06:12
  • Link doesn't fail for me using gcc, which C++ compiler are you using? – Michael Day Feb 22 '13 at 06:39
  • @Michael Day i am using VStudio. Maybe that's causing the problem. Thanks. – user1559625 Feb 22 '13 at 16:01
-3

In C++ the typical answer is "use a container". For array-like containers, std::vector is usually what you want. If you want maximum performance, std::valarray may be a better option (of course, the performance only materialises if you use it the way it's meant to be used, such as with gslice).

A better article would be "When would I use C style arrays with C++". Answer: almost never, unless you have to interface to other software.

Josh Petitt
  • 1
    Answer to "why does compiler/linker do this" is not "use std::vector", although that may still be a helpful comment :) – Michael Day Feb 22 '13 at 04:07
  • @MichaelDay, title was "common pitfalls..." Agreed that my answer does not include the "why", but short-circuits to "how to avoid the common pitfalls". Oh well, live and learn I guess... – Josh Petitt Feb 22 '13 at 04:10
  • 1
    "Common pitfalls when using arrays in C++" first pitfall is using C++ at all, second pitfall is using arrays; sadly both very common :) – Michael Day Feb 22 '13 at 04:12