
I've encountered a strange phenomenon in C that I need someone to explain. The code below declares two single-element arrays as global variables and prints the memory address of the first and second element of each array (note that each array is defined with only 1 element, so index 1 is one past the end):

#include <stdio.h>

int a1[1];
int a2[1];

int main(void) {

    a1[0] = 100;
    a2[0] = 200;

    printf("%d\n", &a1[0]);
    printf("%d\n", &a1[1]);
    printf("%d\n", &a2[0]);
    printf("%d\n", &a2[1]);
}

This gives the following output. Note that C allocated a contiguous memory block for array a2 right after a1 (hence the addresses of a1[1] and a2[0] are the same):

4223424
4223428
4223428
4223432

However, something miraculous occurs when I change the names of the arrays. I added "zzz" as a prefix to both arrays, as shown below.

#include <stdio.h>

int zzza1[1];
int zzza2[1];

int main(void) {

    zzza1[0] = 100;
    zzza2[0] = 200;

    printf("%d\n", &zzza1[0]);
    printf("%d\n", &zzza1[1]);
    printf("%d\n", &zzza2[0]);
    printf("%d\n", &zzza2[1]);
}

After running this code, you can see from the following output that memory was allocated for the array zzza2 first and for zzza1 after it (&zzza2[1] == &zzza1[0]):

4223428
4223432
4223424
4223428

I have tested the above code with multiple array sizes (2, 4, 8) on several machines at different times and got the same output, so it is not a coincidence. This does not happen when I define the arrays inside main() as local variables.

It seems C is allocating memory based on the names we give the arrays. Firstly, why does C allocate contiguous blocks to different global arrays every time? Secondly, why does the order of memory allocation change when I add the prefix?

Hope this doesn't baffle everyone as it has me... Thanks for your help in advance!

Kunal Kapoor
  • The toolchain (the linker) is more or less free to place objects at whatever address it wants. Perhaps the names of the objects go into some sort of data structure that's based on a hash of the name, and that influences the layout order? Who knows? See http://stackoverflow.com/questions/4575697/unexpected-output-from-bubblesort-program-with-msvc-vs-tcc/4577565#4577565 for another situation where a variable name influenced the variable layout (by the compiler instead of the linker). – Michael Burr Sep 02 '15 at 03:40
  • I'm curious to know what the result is with a2[] declared before a1[]. If it is the same as your first test, my guess is that the compiler keeps variables in some sort of hash table and generates the code by reading it sequentially, and that all compilers use the same hash algorithm. – Joël Hecht Sep 02 '15 at 05:04
  • Anyway, you shouldn't make any assumptions about how variables are laid out in memory. – Jabberwocky Sep 02 '15 at 06:31
  • @JoëlHecht - Tried with declaring a2 before a1 and got the same results. Seems like you guys are right about the hash table implementation. Thanks for your help. – Kunal Kapoor Sep 02 '15 at 07:28
  • @MichaelBurr - Thank you for clarifying... Evidence points to the fact that you're right as indicated in the above comment. – Kunal Kapoor Sep 02 '15 at 07:29

1 Answer


Firstly, why does C allocate contiguous blocks to different global arrays every time?

Because one contiguous block is more efficient and easier to implement. If an application placed its global variables in different memory blocks, it would suffer from at least one of the following drawbacks:

  1. Wasted memory between blocks.
  2. Complicated memory allocation at start.

So the toolchain tries to place all global variables in one contiguous memory block. This is an implementation choice, not something the C language guarantees.