6

AFAIK, there are two types of global variables: initialized and uninitialized. How are they stored? Are they both stored in the executable file? I can see that initialized global variables need their initial values stored in the executable file. But what needs to be stored for the uninitialized ones?

My current understanding is like this:

An executable file is organized into several sections, such as .text, .data, and .bss. Code is stored in the .text section, initialized global or static data is stored in the .data section, and uninitialized global or static data is stored in the .bss section.

Thanks for your time to view my questions.

Update 1 - 9:56 AM 11/3/2010

I found a good reference here:

Segments in Assembly Language Source - Building the text and data segments with .text, .data, and .bss directives

Update 2 - 10:09 AM 11/3/2010

@Michael

  1. I define a 100-byte uninitialized data area in my assembly code. These 100 bytes are not stored in my executable file because they are NOT initialized.

  2. Who will allocate the 100-byte uninitialized memory space in RAM? The program loader?

Suppose I got the following code:

int global[100];

int main(void)
{
   //...
   return 0;
}

The global[100] array is not initialized. How will it be recorded in my executable file? Who will allocate it, and at what time? What if it is initialized?

Peter Cordes
smwikipedia

4 Answers

11

Initialized variable values are stored in the .data segment of the executable. Uninitialized ones don't have to be stored. They end up in the .bss segment in RAM, but the size of that segment is zero in the executable file; only the required amount of memory is recorded in the segment descriptor. The code in the .text section accesses these variables via offsets into the segment. The runtime linker-loader patches these references to actual virtual addresses. See, for example, the Executable and Linkable Format (ELF), which is used on most Unix-like operating systems.

Nikolai Fetissov
  • very platform specific answer without mentioning that fact – Andrey Nov 03 '10 at 01:57
  • and memory is flat, not segmented – Andrey Nov 03 '10 at 01:58
  • There's always a doubt haunting me: If nothing is stored for the un-initialized data in my executable file, how could the operating-system know what to allocate in RAM at runtime? – smwikipedia Nov 03 '10 at 01:58
  • 2
    @Andrey, yes, the virtual address space is virtually flat :) Then there are MMUs, pages, TLBs, caches, NUMA, and all that fun. The process VA space is still partitioned. That's why you get *segmentation violations* :) – Nikolai Fetissov Nov 03 '10 at 02:02
  • 1
    @smwikipedia, read the ELF spec (or COFF, or whatever) - the executable stores the *size* of the `.bss`. – Nikolai Fetissov Nov 03 '10 at 02:04
  • @Nicolai, That's just what I am thinking! The *size* of .bss is stored. – smwikipedia Nov 03 '10 at 02:25
  • I would add to this that if the global is declared as a const it can and/or does show up in .text not as a reference but that is its home. Basically, globals can land in all three, .data, .bss, and .text. Often if uninitialized in the code the program/loader zeros that memory space giving you a zero. I would not rely on that and compilers are now starting to give good warnings about use of variables before being initialized. – old_timer Nov 03 '10 at 05:21
  • @Andrey: he's not talking about OS memory organization, he's talking about the linker's memory map: linkers group data + program memory allocation into segments such as .bss, .data, .text, etc. – Jason S Nov 03 '10 at 12:30
3

In PE files there are two sizes specified for each segment: RAWsize (size on disk) and Vsize (size in RAM).

When Vsize is larger than RAWsize, the rest of the segment in RAM is zeroed.

.bss (if present) always has a RAWsize of 0, and the uninitialized global variables are located there.

Another common approach is to make the Vsize of .data larger than its RAWsize, so that the rest of the segment holds the uninitialized variables.

ruslik
  • It's kind of a paradox. Though RAWsize is 0 for .bss segment, .bss segment **actually** occupies space on disk. Otherwise, no clue (I would call it *meta-clue*) could be recorded for .bss segment. – smwikipedia Nov 03 '10 at 02:21
  • 1
    ELF works the same way; normally the `.bss` section is just part of the same segment as `.data`, in the part after the "filesiz", but within the "memsiz" (`readelf -a /bin/ls` for an example: look for a segment with RW permissions and a memsiz greater than filesiz.) – Peter Cordes Jun 26 '22 at 06:35
2

Storage for global variables is allocated in your computer's virtual memory by the OS linker/loader at the time your program is loaded. The actual global variable storage is somewhere in the physical memory hierarchy (cache, RAM, SSD/HD backing storage, etc.), as mapped by the cache and VM system. It could all end up quite fragmented.

The values of initialized globals are copied from the .data segment into a portion of the allocated virtual memory. Non-initialized globals might be zeroed, or might have junk left in them, depending on the security of the particular OS under which the program is running.

There are other variations, depending on the language, compiler, language run-time, and OS.

hotpaw2
  • Thanks for your concise reply. Things are much clearer. So conceptually speaking, the OS loader lays out the process's virtual address space according to the process's backing executable file. If so, the **size info** of the uninitialized data must be recorded in the exe file, though its value is left undecided. Am I right? – smwikipedia Nov 03 '10 at 02:17
  • I think I need to check some exe file format such as PE to see if things go like I expected. – smwikipedia Nov 03 '10 at 02:19
  • @smwikipedia, static compiled languages can have the size of the data (if known at compile time) stored somewhere in the binary. But there also exist dynamic and interpreted languages that don't allocate space (and fix up the linkages) until a global variable is actually encountered in execution. – hotpaw2 Nov 03 '10 at 02:26
  • your reply explained *when* the global variables are allocated, that's very helpful to me. – smwikipedia Nov 03 '10 at 02:27
0

Uninitialized variables are simply pointers at the machine level. The space for them is allocated at runtime, and the program will fill it in at some later time.

For instance, if in assembler you create a global variable `global BYTE 100`, that will reserve `global` as a pointer to a 100-byte region. The program then has access to that region for whatever it needs.

EDIT: I looked up in my assembler book and it looks like uninitialized globals are defined in the .data section just as initialized variables are. From my understanding the space is allocated in the exe (say 100 bytes as above) but will have undefined contents. On Intel machines in Windows it will be garbage; the program is responsible for initializing it. Hope this helps!

Michael K