
If I compile the program `int array[5000]={0}; int main(){}`, the output file is much smaller than if I compile `int array[5000]={1}; int main(){}`, which initializes the first element to one and the rest to zeros. Why is there such a big difference in file size?

  • If I am right, there is hardware support for zero-initialized sections. And presumably, making the first element different is enough to bypass this mechanism and store the array explicitly in the executable. You could check if the difference in size is close to 20000 bytes. –  Sep 02 '21 at 17:44
  • Because zero-initialized globals go into the `.bss` section, which is zeroed out with a simple loop in the startup code/runtime library (or simply mapped to a zero page, depending on the environment). Non-zero-initialized globals go into the `.data` section, which includes an explicit initializer for each of them, taking up space in the binary. – Eugene Sh. Sep 02 '21 at 17:46
  • One way to think about it is that by default the entire object must be stored in the executable, but that there is a special optimization for objects which are *entirely* zeros. You could imagine there being a similar optimization for objects which are *mostly* zeros, but I don't think anyone has bothered to invent it, since it's a far less common case. – Nate Eldredge Sep 02 '21 at 18:31
  • So if you need such an object, it might be better to just declare it as `int array[5000] = {0};` and then execute `array[0] = 1;` sometime before you actually use the array. Likewise, any time you have a large array that needs to be initialized to some pattern which is easily predictable but not all zeros, it may be better to do it at runtime with `memset` or a simple loop or whatever. – Nate Eldredge Sep 02 '21 at 18:37
  • Rather: because you didn't enable optimizations, so the compiler didn't remove the unused variable as it should have. – Lundin Sep 07 '21 at 12:00

4 Answers


Your array is a global variable with static storage duration.

If it is initialized with zeros only, it can be allocated in a special segment of memory, which is created at process startup and filled with zeros.

OTOH, if it is declared as containing anything non-zero, its initial value must be stored inside the program's file, so that when the operating system prepares the program in memory for running, it can allocate the appropriate data segment and fill it with the defined initial values.

See https://en.wikipedia.org/wiki/Data_segment for DATA and BSS segments.

CiaPan

When you don't initialize a global (or static) variable, it gets allocated in an output segment called .bss, which is all zeros, so its contents don't need to be written to the output file. If even a single bit is nonzero, the variable has to go into the initialized data segment (.data), which is written to the output file because its contents must be spelled out. Conversely, if you explicitly initialize it to all zeros, the compiler recognizes that the initialization coincides with that of an uninitialized variable and stores the array in the .bss segment too, so the final file doesn't grow.

For the .data segment, all of its contents are saved in the executable file, while for the .bss segment only its size is stored, as the kernel can allocate a zero-filled segment for it when the program is loaded into memory.

On Unix systems, the data segment is set up by reserving the full size of both data segments (.data plus .bss), but only the .data part is copied from the file at load time. The rest is always filled with zeros by the kernel. This speeds up loading the program into memory and makes the executable smaller.

Luis Colorado

so why is there such a big difference on the file size?

Essentially, it's because the compiler/linker/executable loader aren't good at optimizing.

If a statically allocated array is full of zeros (or uninitialized), the compiler puts it in a special section (".bss") along with everything else that's zeros (or uninitialized); and because the program loader knows the entire section is full of zeros, none of the data is stored in the file itself.

If a statically allocated array isn't full of zeros, the compiler puts it in a different section (".data") and all of the data gets included in the file (even when it's "almost but not quite full of zeros").

Ideally, the compiler/tools would be able to detect simple cases (e.g. an array that is almost but not quite full of zeros, with a single non-zero value) and put the array in ".bss" so it costs nothing in the file, and then generate a small amount of start-up code to correct it (e.g. set the first element of the array) before any of your code executes.

As a work-around (if the array isn't read-only), you could do the same optimization yourself: leave the array full of zeros, and put `array[0] = 1;` at the start of your `main()`.

Brendan

From the .bss article (section "BSS in C"):

An implementation may also assign statically-allocated variables and constants initialized with a value consisting solely of zero-valued bits to the BSS section.

The size that BSS will require at runtime is recorded in the object file, but BSS (unlike the data segment) doesn't take up any actual space in the object file.

For the program `int array[5000]={0}; int main(){}`:

data and bss size:

# size a.out 
   text    data     bss     dec     hex filename
   1040     484   20032   21556    5434 a.out

executable size:

# ls -l a.out
-rwxr-xr-x. 1 root root 6338 Sep  7 17:05 a.out

For the program `int array[5000]={1}; int main(){}`:

data and bss size:

# size a.out
   text    data     bss     dec     hex filename
   1040   20512      16   21568    5440 a.out

executable size:

# ls -l a.out
-rwxr-xr-x. 1 root root 26362 Sep  7 17:24 a.out

The output shown above is from Linux platform.

H.S.