I compiled the code below with Cygwin GCC on an x64 machine:

gcc main.c -o main

(main.c)

long long mango = 13; // I also tried `char`, `short`, `int`
long melon = 2001;

int main(void)
{
    return 0;
}

Then I dumped the symbol values with nm:

./main:0000000100402010 D mango
./main:0000000100402018 D melon

As I understand it, a symbol's value is just its address. So the address of mango is 100402010 and the address of melon is 100402018, which suggests that mango occupies 8 bytes.

I tried other types for mango, such as char, short and int. The distance to melon is always 8 bytes.

Why doesn't the size change?

ADD 1

Thanks for the comments.

I just tried the code below:

typedef struct{
    char a1;
    char a2;
    char a3;
    char a4;
    char a5;
} MyStruct;

MyStruct MyObj1={1,2,3,4,5};

MyStruct MyObj2={1,2,3,4,5};

long long mango = 13;
long melon = 2001;

int main(void)
{
    return 0;
}

And this time, nm shows me this:

./main:0000000100402020 D mango
./main:0000000100402028 D melon
./main:0000000100402010 D MyObj1
./main:0000000100402015 D MyObj2

MyObj1 and MyObj2 are 5 bytes apart, with no padding between them. So it is indeed up to the compiler to decide the padding.

smwikipedia
  • The address is not telling you anything about the size. – Eugene Sh. Apr 26 '18 at 13:56
  • In main do: `printf("%zu %zu", sizeof(mango), sizeof(melon));` – Gillespie Apr 26 '18 at 13:57
  • @EugeneSh. If so, what does the symbol's value mean? Or how is the symbol's value *used*? – smwikipedia Apr 26 '18 at 13:58
  • @smwikipedia it's possible that that is the address. What we're saying is that the address tells you nothing about the size. It's possible that `melon` has to be on an 8 byte boundary so there's padding added between the variables. But the compiler can do whatever it wants here. – Kevin Apr 26 '18 at 13:59
  • Keep in mind that the compiler will align things on different boundaries and add padding as necessary. So there may be 1 byte size, 7 bytes padding in the case of a `char`. Use `sizeof` to determine sizes. – Gillespie Apr 26 '18 at 13:59
  • *"So the address for `mango` is `100402010`. And `melon` has address `100402018`. So `mango` should occupy 8 bytes."* -- No, this means that `mango` uses **no more than 8 bytes**. For performance reasons (imposed by the hardware), the values are stored in the memory aligned at the length of a processor word. For 64-bit processors, the memory address of any value is a multiple of 8 bytes (i.e. 64 bits). The compiler can be told to use a different alignment but you should have a good reason to do it. – axiac Apr 26 '18 at 14:20
  • Alignment rules are ABI specific and may require a different alignment than expected from the type size alone... See also https://www.codesynthesis.com/~boris/blog/2009/04/06/cxx-data-alignment-portability/ – dbrank0 Apr 26 '18 at 14:41

1 Answer

From the nm page of the GNU Binary Utilities documentation:

The symbol value, in the radix selected by options (see below), or hexadecimal by default.

The symbol type. At least the following types are used; others are, as well, depending on the object file format. If lowercase, the symbol is usually local; if uppercase, the symbol is global (external). There are however a few lowercase symbols that are shown for special global symbols (u, v and w).

Depending on pragma settings and default alignment boundaries, the distance between successive symbol addresses may be exactly the number of bytes for that symbol's type, or it may include padding, which increases the apparent size of the symbol.

A

    The symbol’s value is absolute, and will not be changed by further linking.
B
...

IMO the use of the word "value" in nm parlance is unfortunate, because in this context it denotes the symbol's address. That address (the nm value) will not change. In normal C parlance, however, the value of a variable does change, for example:

int i = 0; // the address for symbol i will remain constant
i = 10;    // but the value of the symbol i can change. 
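To make the distinction concrete, here is a minimal sketch (the function name address_unchanged_after_assignment is made up for illustration) that takes the address of a global before and after an assignment. It assumes a hosted C99 or later compiler; the printed addresses will differ from system to system.

```c
#include <stdio.h>

int i = 0; /* global, so nm would list it as a symbol with a fixed address */

/* Assigns to i and reports whether its address stayed the same (1) or not (0). */
int address_unchanged_after_assignment(void)
{
    int *before = &i; /* nm's "value": where the symbol lives */
    i = 10;           /* C's value: what is stored there */
    int *after = &i;

    printf("address before: %p\n", (void *)before);
    printf("address after:  %p\n", (void *)after);
    printf("stored value:   %d\n", i);
    return before == after;
}
```

Calling this function from main should print the same address twice, while the stored value goes from 0 to 10.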

Regarding the size of addresses: on typical platforms, the address of any symbol in a 64-bit build is 8 bytes wide, while the address of any symbol in a 32-bit build is 4 bytes wide. These sizes are fixed; they are not affected by the symbol's type or by the value assigned to it.
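As a rough illustration, the sketch below compares the size of the address of a char with the size of the address of a long long. The variable and function names are invented for this example, and it assumes a mainstream platform where all object pointers have the same width (the C standard does not strictly require that).

```c
#include <stddef.h>
#include <stdio.h>

char      small_var = 1;  /* 1-byte object */
long long big_var   = 13; /* typically an 8-byte object */

/* Size of the address of each variable, not of the variable itself. */
size_t address_size_of_small(void) { return sizeof(&small_var); }
size_t address_size_of_big(void)   { return sizeof(&big_var); }

/* Prints object sizes next to address sizes for comparison. */
void print_address_sizes(void)
{
    printf("sizeof(small_var) = %zu, sizeof(&small_var) = %zu\n",
           sizeof(small_var), address_size_of_small());
    printf("sizeof(big_var)   = %zu, sizeof(&big_var)   = %zu\n",
           sizeof(big_var), address_size_of_big());
}
```

On a 64-bit build one would expect both address sizes to be 8, even though the objects themselves differ in size.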

Regarding the distance in memory between symbols: it is affected by the type of each symbol, by how that type must be aligned on the implementation's boundaries, and, as you have noted, by the compiler: "So it is indeed up to the compiler to decide the padding." Depending on pragma settings and default alignment boundaries, padding may place successive symbols further apart than the combined sizeof values of their types alone would (a very common occurrence for both char and struct type symbols).
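The arithmetic behind the nm output in the question can be sketched as follows. The helpers align_up and padding_before_next are hypothetical names written for this illustration, not part of any toolchain, and the model assumes objects are simply placed one after another with each start address rounded up to the next object's alignment. It also assumes C11 for _Alignof, and that long and long long are 8-byte aligned, as on Cygwin x86-64.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

typedef struct { char a1, a2, a3, a4, a5; } MyStruct;

/* Round offset up to the next multiple of align (a power of two). */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Padding inserted after an object of `size` bytes at `offset`
   so that the next object starts on a `next_align` boundary. */
size_t padding_before_next(size_t offset, size_t size, size_t next_align)
{
    size_t end = offset + size;
    return align_up(end, next_align) - end;
}

/* Prints size and alignment of the types used in the question. */
void print_sizes(void)
{
    printf("char:      size %zu, align %zu\n", sizeof(char), _Alignof(char));
    printf("long:      size %zu, align %zu\n", sizeof(long), _Alignof(long));
    printf("long long: size %zu, align %zu\n",
           sizeof(long long), _Alignof(long long));
    printf("MyStruct:  size %zu, align %zu\n",
           sizeof(MyStruct), _Alignof(MyStruct));
}
```

Under these assumptions, a char mango at offset 0x10 followed by an 8-byte-aligned melon gives padding_before_next(0x10, 1, 8) == 7, so melon lands at 0x18 as in the first nm listing. MyObj2 at 0x15 followed by mango gives padding_before_next(0x15, 5, 8) == 6, which puts mango at 0x20 as in the second listing. MyObj2 itself needs no padding after MyObj1 because a struct of chars only requires 1-byte alignment.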

ryyker