
What I know is that global and static variables are stored in the .data segment, and uninitialized data are in the .bss segment. What I don't understand is why we have a dedicated segment for uninitialized variables. If an uninitialized variable is assigned a value at run time, does the variable still exist only in the .bss segment?

In the following program, a is in the .data segment, and b is in the .bss segment; is that correct? Kindly correct me if my understanding is wrong.

#include <stdio.h>
#include <stdlib.h>

int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9};
int b[20]; /* Uninitialized, so in the .bss and will not occupy space for 20 * sizeof (int) */

int main ()
{
   ;
}  

Also, consider the following program:

#include <stdio.h>
#include <stdlib.h>
int var[10];  /* Uninitialized so in .bss */
int main ()
{
   var[0] = 20;  /* Assigned at run time: where will 'var' be? */
}

  
Guy Avraham
Whoami

6 Answers


The reason is to reduce program size. Imagine that your C program runs on an embedded system, where the code and all constants are saved in true ROM (flash memory). In such systems, an initial "copy-down" must be executed to set all static storage duration objects before main() is called. It will typically look like this pseudocode:

for(i=0; i<all_explicitly_initialized_objects; i++)
{
  .data[i] = init_value[i];
}

memset(.bss, 
       0, 
       all_implicitly_initialized_objects);

Here, .data and .bss are stored in RAM, while init_value is stored in ROM. If they had been one segment, the ROM would have had to be filled with a lot of zeroes, increasing ROM size significantly.
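To make the copy-down concrete, here is a small simulation in plain C. It is a sketch under stated assumptions, not real startup code: `rom_data_image`, `ram_data`, `ram_bss`, `startup_copy_down`, `DATA_WORDS` and `BSS_WORDS` are hypothetical names standing in for the regions and routine that a linker script and C runtime would provide (in this simulation they are just ordinary arrays). Note that the ROM image must carry one entry per .data word, including the implicit trailing zero of a partially initialized array like `a` in the question, whereas .bss needs nothing but its size.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical sizes: 10 words of .data (like a[10]), 20 words of .bss (like b[20]). */
#define DATA_WORDS 10
#define BSS_WORDS 20

/* Hypothetical "ROM" image: every .data value must be stored, even a[9] == 0. */
static const int rom_data_image[DATA_WORDS] = { 1, 2, 3, 4, 5, 6, 7, 8, 9, 0 };

/* Hypothetical "RAM" homes of the .data and .bss objects. */
int ram_data[DATA_WORDS];
int ram_bss[BSS_WORDS];

/* Simulates the copy-down a C runtime performs before main(). */
void startup_copy_down(void)
{
    /* Explicitly initialized objects: copy their values out of "ROM". */
    for (size_t i = 0; i < DATA_WORDS; i++)
        ram_data[i] = rom_data_image[i];

    /* Implicitly initialized objects: no stored values, just zero the region. */
    memset(ram_bss, 0, sizeof ram_bss);
}
```

If `ram_bss` is first filled with junk (to mimic uninitialized RAM at power-up), calling `startup_copy_down()` leaves `ram_data` holding 1 to 9 followed by 0, and every element of `ram_bss` zero.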

RAM-based executables work similarly, though of course they have no true ROM.

Also, memset is likely some very efficient inline assembler, meaning that the startup copy-down can be executed faster.

Guy Avraham
Lundin
  • To clarify: the only difference between .data and .bss is that on start-up, the "copy-down" can be run sequentially, hence faster. If it were not split into the two segments then the initialisation would have to skip the RAM spots belonging to the uninitialised variables, so wasting time. – Jodes May 15 '13 at 16:29
  • Thank you for your explanation of the startup process, but what happens when a variable in `.bss` becomes initialized? Does it overwrite the `0` and stay in `.bss`? Is it removed from .bss and written in `.data` (thus shortening the `.bss` segment)? – Axel B Apr 16 '21 at 11:15

The .bss segment is an optimization. The entire .bss segment is described by a single number, probably 4 bytes or 8 bytes, that gives its size in the running process, whereas the .data section is as big as the sum of sizes of the initialized variables. Thus, the .bss makes the executables smaller and quicker to load. Otherwise, the variables could be in the .data segment with explicit initialization to zeroes; the program would be hard-pressed to tell the difference. (In detail, the addresses of the objects in .bss would probably differ from their addresses if they were in the .data segment.)

In the first program, a would be in the .data segment and b would be in the .bss segment of the executable. Once the program is loaded, the distinction becomes immaterial. At run time, b occupies 20 * sizeof(int) bytes.

In the second program, var is allocated space and the assignment in main() modifies that space. It so happens that the space for var was described in the .bss segment rather than the .data segment, but that doesn't affect the way the program behaves when running.
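One way to see that the distinction is gone at run time is to inspect the question's arrays from inside the program. This is a minimal sketch: `b_is_all_zero` is a hypothetical helper, and whether `a` and `b` actually land in .data and .bss depends on the toolchain (it is the usual outcome on ELF targets).

```c
#include <stddef.h>

int a[10] = { 1, 2, 3, 4, 5, 6, 7, 8, 9 };  /* typically .data: all 10 ints are stored in the file */
int b[20];                                   /* typically .bss: only its size is recorded */

/* Hypothetical helper: returns 1 if every element of b is zero, as C
   guarantees for a static-storage object with no initializer. */
int b_is_all_zero(void)
{
    for (size_t i = 0; i < sizeof b / sizeof b[0]; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

Before anything writes to it, `sizeof b` is `20 * sizeof(int)` and `b_is_all_zero()` returns 1; an assignment such as `b[0] = 20` then simply writes to that memory, exactly as `var[0] = 20` does in the second program.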

Jonathan Leffler
  • For example, consider having many uninitialized buffers 4096 bytes in length. Would you want all of those 4k buffers to contribute to the size of the binary? That would be a lot of wasted space. – Jeff Mercado Mar 02 '12 at 15:02
  • @JonathanLeffler: Why is the entire bss segment described by a single number? – Suraj Jain Aug 16 '16 at 16:46
  • @JonathanLeffler I mean that all zero-initialised static variables go in bss, so shouldn't their value just be zero? And also, why are they not given space in the .data section? How would doing so make it slow? – Suraj Jain Aug 16 '16 at 16:58
  • @JonathanLeffler Please see this question: "http://stackoverflow.com/questions/8385322/difference-between-static-memory-allocation-and-dynamic-memory-allocation". I think the selected best answer is wrong in explaining what static memory allocation is. Can you confirm? – Suraj Jain Aug 16 '16 at 16:59
  • @SurajJain: the number stored is the number of bytes to be filled with zeros. Unless there are no such uninitialized variables, the length of the bss section won't be zero, even though all the bytes in the bss section will be zero once the program is loaded. – Jonathan Leffler Aug 16 '16 at 17:01
  • @JonathanLeffler Oh OK, I understand it. And also, why is using .data for this slow? And please see the question I mentioned above, as it will mislead many newcomers. The selected best answer is not correct. – Suraj Jain Aug 16 '16 at 17:03
  • @SurajJain: I've gone and edited that accepted answer because it was at best misleading. There's room to argue that my edit is too substantive. Note that the most up-voted answer was already better. Follow the voting, not just the accepted vs non-accepted status. – Jonathan Leffler Aug 16 '16 at 17:19
  • @JonathanLeffler Thanks a lot. I just saw your edit. You are right, that answer was very misleading. It misled me, and then I read the most upvoted answer and it cleared my doubts. Strange that a wrong answer became the highlight of all the answers and also got selected as the best answer. – Suraj Jain Aug 16 '16 at 17:22
  • @SurajJain: The bytes in the .data section have to be read from 'disk' (spinning magnetic platters, or solid-state). It takes time to read the data. By contrast, reading 8 bytes for the size of the bss segment is trivial (compared with reading 64 MiB of zeroes on disk, say) and then setting the memory to zero. Plus it wastes disk space — why store 64 MiB of zeroes when you could store just 8 bytes saying "there are 64 MiB of zeros". Plus it takes space on backups — remember those? – Jonathan Leffler Aug 16 '16 at 17:25
  • @JonathanLeffler Many people asked the person who wrote the answer to edit it, but he did not. I thought I might edit it, but then I was not confident that the answer was really wrong. – Suraj Jain Aug 16 '16 at 17:26
  • At this stage, it's probably better that I do the edit than you. Your edit would be reviewed and might not make it through ('changing the author's meaning'). For better or worse, I have more rep, especially w.r.t. the C tag, so I can maybe get away with it without causing too many ructions. But the answerer could decide my edit is not valid and roll the change back; I'd not gainsay him (but I would go add a downvote that I've not done yet). – Jonathan Leffler Aug 16 '16 at 17:29
  • @JonathanLeffler I understand. I just wanted the best answer to at least be correct and not mislead people, especially newcomers. Also, you said that bss stores how many bytes are to be filled with zero. Suppose there are 2 static variables that are uninitialised or initialised to zero; the bss section holds the number of bytes to be set to 0. When the program is run, where are the actual 0 bytes written: in the .data segment, or somewhere else? The bss section only tells how many bytes to zero; it does not itself store `static int i = 0`. – Suraj Jain Aug 16 '16 at 18:23
  • @JonathanLeffler Also, if I write `static int i[100] = {1,2,3....}`, would my executable be 4*100 bytes bigger? Or does the `.data` section only record where in memory the static array will be stored, like from address `0X00000000` to `0x10232333` or something like that? You said the `.data` section is as big as the sum of sizes of the initialized variables. – Suraj Jain Aug 16 '16 at 18:42
  • The .bss section in the executable is simply a number. The .bss section in the in-memory process image is normally memory adjacent to the .data section and often the runtime .data section is combined with the .bss; there is no distinction made in the runtime memory. Sometimes, you can find where the bss started (`edata`). In practical terms, the .bss doesn't exist in memory once the process image is completed; the zeroed data is simply part of the .data section. But the details vary depending on the o/s etc. – Jonathan Leffler Aug 16 '16 at 18:42
  • The partially initialized array `i` would have to be stored in the data section because it must be self-contiguous, and there's no way to ensure that there isn't other initialized data before and after it, so the only way to make sure the correct values are stored is to keep the 100 `int` (400 bytes) worth of data in the .data section of the executable. If it were the last array in .data, then in theory you could have 3 `int` initializers in .data and 97 zeros in .bss. I can confidently predict that doesn't happen in real life. – Jonathan Leffler Aug 16 '16 at 18:43
  • So that means at run time all the static variables that are explicitly or implicitly initialised to zero will be stored in the `.data` section (or in some section), and there will be no `.bss`? Sorry for asking so many questions; I just want to get things right in my head. Also, could you suggest some resources, like a book, for studying this? How is the book Pointers on C? – Suraj Jain Aug 16 '16 at 18:45
  • @JonathanLeffler Here, "http://stackoverflow.com/questions/21350478/what-does-memory-allocated-at-compile-time-really-mean", it is written that the .data section stores only the memory addresses where the static initialized variables will go, and that the values themselves are not stored. – Suraj Jain Aug 16 '16 at 18:48
  • @JonathanLeffler What I meant when I wrote `static int i[100] = {1,2,3,...}` is that all places are initialized to 1, 2, 3 and so on. – Suraj Jain Aug 16 '16 at 18:52
  • @JonathanLeffler Okay, I understand. Thanks a lot for bearing with me so long, and for correcting that answer; it will help many. – Suraj Jain Aug 16 '16 at 19:01

From Assembly Language Step-by-Step: Programming with Linux by Jeff Duntemann, regarding the .data section:

The .data section contains data definitions of initialized data items. Initialized data is data that has a value before the program begins running. These values are part of the executable file. They are loaded into memory when the executable file is loaded into memory for execution.

The important thing to remember about the .data section is that the more initialized data items you define, the larger the executable file will be, and the longer it will take to load it from disk into memory when you run it.

and the .bss section:

Not all data items need to have values before the program begins running. When you’re reading data from a disk file, for example, you need to have a place for the data to go after it comes in from disk. Data buffers like that are defined in the .bss section of your program. You set aside some number of bytes for a buffer and give the buffer a name, but you don’t say what values are to be present in the buffer.

There’s a crucial difference between data items defined in the .data section and data items defined in the .bss section: data items in the .data section add to the size of your executable file. Data items in the .bss section do not. A buffer that takes up 16,000 bytes (or more, sometimes much more) can be defined in .bss and add almost nothing (about 50 bytes for the description) to the executable file size.
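In C, the kind of buffer described in that passage corresponds to a file-scope array with no initializer. The sketch below assumes a typical ELF toolchain, which places such an array in .bss; `read_buffer`, `fill_buffer` and `buffer_data` are hypothetical names chosen for illustration.

```c
#include <stdio.h>

/* 16,000-byte buffer with no initializer: on a typical ELF toolchain it goes
   in .bss, so it costs memory at run time but (almost) nothing in the file. */
static char read_buffer[16000];

/* Hypothetical helper: reads up to sizeof read_buffer bytes from f into the
   buffer and returns the number of bytes actually read. */
size_t fill_buffer(FILE *f)
{
    return fread(read_buffer, 1, sizeof read_buffer, f);
}

/* Hypothetical accessor so callers can inspect the buffer contents. */
const char *buffer_data(void)
{
    return read_buffer;
}
```

Comparing the output of `size` (which reports text, data and bss separately) for this program against a variant that gives the array an initializer such as `= { 1 }` would be one way to watch those 16,000 bytes move from the bss column to the data column; the exact figures depend on the toolchain.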

mihai

Well, first of all, those variables in your example aren't uninitialized; C specifies that static variables not otherwise initialized are initialized to 0.

So the reason for .bss is to have smaller executables, saving space and allowing faster loading of the program, as the loader can just allocate a bunch of zeroes instead of having to copy the data from disk.

When running the program, the program loader will load .data and .bss into memory. Writes into objects residing in .data or .bss thus only go to memory, they are not flushed to the binary on disk at any point.
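The zero-initialization rule above applies to every object with static storage duration, whether declared at file scope or as a `static` inside a function, and it covers pointers (initialized to a null pointer) and floating types (initialized to zero) as well. A minimal sketch; the variable names and `statics_start_zeroed` are hypothetical.

```c
#include <stddef.h>

/* No initializers anywhere: C still guarantees these start as zero/null. */
static int counter;
static double ratio;
static char *name;
static int table[8];

/* Hypothetical check: returns 1 if all of the above, plus a block-scope
   static, still hold their implicit initial values. */
int statics_start_zeroed(void)
{
    static long local_static;  /* block-scope static: same rule applies */

    if (counter != 0 || ratio != 0.0 || name != NULL || local_static != 0)
        return 0;
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (table[i] != 0)
            return 0;
    return 1;
}
```

None of these objects had to be written by the program; the zeros come from the loader (or startup code) clearing the .bss region.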

Guy Avraham
janneb

The System V ABI 4.1 (1997) (AKA ELF specification) also contains the answer:

.bss This section holds uninitialized data that contribute to the program’s memory image. By definition, the system initializes the data with zeros when the program begins to run. The section occupies no file space, as indicated by the section type, SHT_NOBITS.

This says that the section name .bss is reserved and has special effects; in particular, it occupies no file space, which is its advantage over .data.

The downside, of course, is that all bytes must be set to 0 when the OS puts them into memory, which is more restrictive, but it is a common use case and works fine for uninitialized variables.

The SHT_NOBITS section type documentation repeats that affirmation:

sh_size This member gives the section’s size in bytes. Unless the section type is SHT_NOBITS, the section occupies sh_size bytes in the file. A section of type SHT_NOBITS may have a non-zero size, but it occupies no space in the file.

The C standard says nothing about sections, but we can easily verify where the variable is stored in Linux with objdump and readelf, and conclude that uninitialized globals are in fact stored in the .bss. See for example this answer: What happens to a declared, uninitialized variable in C?
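The same check can be done programmatically. The sketch below assumes Linux with glibc's `<elf.h>`, a 64-bit ELF executable whose byte order matches the host, and `/proc/self/exe`; `section_is_nobits`, `bss_probe` and `data_probe` are hypothetical names. It reads the section header table and reports whether a named section has type SHT_NOBITS, the same information `readelf -S` prints as `NOBITS` in its Type column.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int bss_probe[64];                   /* no initializer: expected in .bss */
int data_probe[4] = { 1, 2, 3, 4 };  /* explicit initializer: expected in .data */

/* Looks up section `name` in the 64-bit ELF file at `path`.
   Returns 1 if its type is SHT_NOBITS (occupies no file space),
   0 if it has another type, -1 if it is missing or the file can't be parsed. */
int section_is_nobits(const char *path, const char *name)
{
    int result = -1;
    Elf64_Ehdr eh;
    Elf64_Shdr *sh = NULL;
    Elf64_Shdr *strtab = NULL;
    char *names = NULL;
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;

    /* ELF header: check the magic, the 64-bit class and the table layout. */
    if (fread(&eh, sizeof eh, 1, f) != 1 ||
        memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shentsize != sizeof(Elf64_Shdr) ||
        eh.e_shnum == 0 || eh.e_shstrndx >= eh.e_shnum)
        goto out;

    /* Read the whole section header table. */
    sh = calloc(eh.e_shnum, sizeof *sh);
    if (sh == NULL || fseek(f, (long)eh.e_shoff, SEEK_SET) != 0 ||
        fread(sh, sizeof *sh, eh.e_shnum, f) != (size_t)eh.e_shnum)
        goto out;

    /* Section names are offsets into the section-header string table. */
    strtab = &sh[eh.e_shstrndx];
    names = malloc(strtab->sh_size);
    if (names == NULL || fseek(f, (long)strtab->sh_offset, SEEK_SET) != 0 ||
        fread(names, 1, strtab->sh_size, f) != strtab->sh_size)
        goto out;

    for (size_t i = 0; i < (size_t)eh.e_shnum; i++) {
        if (sh[i].sh_name < strtab->sh_size &&
            strcmp(names + sh[i].sh_name, name) == 0) {
            result = (sh[i].sh_type == SHT_NOBITS);
            break;
        }
    }

out:
    free(names);
    free(sh);
    fclose(f);
    return result;
}
```

On such a system, `section_is_nobits("/proc/self/exe", ".bss")` should return 1 and the same call for `".data"` should return 0, matching the specification text quoted above.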

Jonathan Leffler
Ciro Santilli OurBigBook.com

The Wikipedia article .bss provides a nice historical explanation, given that the term dates from the mid-1950s (yippee, my birthday ;-).

Back in the day, every bit was precious, so any method for signalling reserved empty space was useful. This one (.bss) is the one that has stuck.

.data sections are for space that is not empty, rather it will have (your) defined values entered into it.

Philip Oakley