I am trying to learn the structure of the executable files produced from C programs. My environment is GCC on a 64-bit Intel processor.

Consider the following code, a.cc:

#include <cstdlib>
#include <cstdio>

int x;

int main(){
  printf("%zu\n", sizeof(x));  // sizeof yields size_t, so use %zu rather than %d
  return 10;
}

Running size a shows:

 text      data     bss     dec     hex filename
 1134       552       8    1694     69e a

Then I added another global variable, y, this time initialized:

int y=10; 

Now size a shows (where a is the executable built from a.cc):

 text      data     bss     dec     hex filename
 1134       556      12    1702     6a6 a

As I understand it, the BSS section holds uninitialized global variables and the DATA section holds initialized ones.

  1. Why does the int take up 8 bytes in BSS? The sizeof(x) in my code shows that an int actually takes up 4 bytes.
  2. The int y=10 added 4 bytes to DATA, which makes sense since an int should take 4 bytes. But why does it also add 4 bytes to BSS?

The difference between the two size outputs stays the same after deleting the two #include lines.

Update: I think my understanding of BSS is wrong. It may not store the uninitialized global variables themselves. As Wikipedia says, "The size that BSS will require at runtime is recorded in the object file, but BSS (unlike the data segment) doesn't take up any actual space in the object file." For example, even the one-line program int main(){} has a bss of 8.

Does the 8 or 16 in BSS come from alignment?

Peng Zhang

1 Answer

It doesn't; an int takes up 4 bytes regardless of which segment it's in. You can use the nm tool (from the GNU binutils package) with the -S argument to get the names and sizes of all of the symbols in an object file. You're likely seeing secondary effects of the toolchain including or not including certain other symbols, for whatever reason.

For example:

$ cat a1.c
int x;
$ cat a2.c
int x = 1;
$ gcc -c a1.c a2.c
$ nm -S a1.o a2.o

a1.o:
0000000000000004 0000000000000004 C x

a2.o:
0000000000000000 0000000000000004 D x

One object file has a 4-byte object named x marked C (a "common" symbol, which is how uninitialized data shows up here), while the other object file has a 4-byte object named x in the initialized data section (D).

Adam Rosenfield
  • +1. The linked startup code in the runtime library may well account for the additional space being seen with `size` (and if I had to *guess* (always dangerous), I'd say it was something like `errno` or any of a handful of other commodities; mach-o images include a stub binder for the dynalib loader, for example). – WhozCraig Jul 18 '14 at 21:00
  • Thanks. I would like to know more about the structure of the executable file. For example, what does the BSS really do? – Peng Zhang Jul 18 '14 at 21:07
  • @PengZhang: the BSS segment is usually a segment that has an address and a size (so it can be mapped to memory) but contains no actual initialized data (so there is nothing to store into a file). After loading, it is available as readable and writable memory, just like common DATA. – Jongware Jul 18 '14 at 21:12
  • @Jongware Thanks. So while DATA and TEXT are copied into memory, the BSS is not? Does the BSS serve as a notification to the OS, so that Linux knows how much memory to allocate for the program (besides TEXT and DATA) when loading it? – Peng Zhang Jul 18 '14 at 21:21
  • @PengZhang: that sounds about right. Find a good description of the executable format for your platform, and find a tool (or write one!) to dump the raw data. You will see that there is no file data associated with a BSS segment. – Jongware Jul 18 '14 at 21:24