I'll do this by way of example, using ARM despite the x86 tag; it is easier to read, and the concepts are functionally the same.
bootstrap
.globl _start
_start:
ldr r0,=__bss_start__
ldr r1,=__bss_end__
mov r2,#0
bss_fill:
cmp r0,r1
beq bss_fill_done
strb r2,[r0],#1
b bss_fill
bss_fill_done:
/* data copy would go here */
bl main
b .
This code might be buggy and is definitely inefficient (it zeroes one byte at a time), but it is here for demonstration purposes.
C code
unsigned int ba;
unsigned int bb;
unsigned int da=5;
unsigned int db=0x12345678;
int main ( void )
{
ba=5;
bb=0x88776655;
return(0);
}
I could use assembly here as well, but .bss, .data, etc. don't make as much sense in asm as they do in compiled code.
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > ram
__bss_start__ = .;
.bss : { *(.bss*) } > ram
__bss_end__ = .;
__data_start__ = .;
.data : { *(.data*) } > ram
__data_end__ = .;
}
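That is the linker script used. Note that __bss_start__ is assigned outside of any output section, so it picks up the location counter left at the end of .text (0x08000058 in the listing below) rather than the start of .bss, which is likely not what was intended; moving the two assignments inside the .bss output section would avoid that.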
Result:
Disassembly of section .text:
08000000 <_start>:
8000000: e59f001c ldr r0, [pc, #28] ; 8000024 <bss_fill_done+0x8>
8000004: e59f101c ldr r1, [pc, #28] ; 8000028 <bss_fill_done+0xc>
8000008: e3a02000 mov r2, #0
0800000c <bss_fill>:
800000c: e1500001 cmp r0, r1
8000010: 0a000001 beq 800001c <bss_fill_done>
8000014: e4c02001 strb r2, [r0], #1
8000018: eafffffb b 800000c <bss_fill>
0800001c <bss_fill_done>:
800001c: eb000002 bl 800002c <main>
8000020: eafffffe b 8000020 <bss_fill_done+0x4>
8000024: 08000058 stmdaeq r0, {r3, r4, r6}
8000028: 20000008 andcs r0, r0, r8
0800002c <main>:
800002c: e3a00005 mov r0, #5
8000030: e59f1014 ldr r1, [pc, #20] ; 800004c <main+0x20>
8000034: e59f3014 ldr r3, [pc, #20] ; 8000050 <main+0x24>
8000038: e59f2014 ldr r2, [pc, #20] ; 8000054 <main+0x28>
800003c: e5810000 str r0, [r1]
8000040: e5832000 str r2, [r3]
8000044: e3a00000 mov r0, #0
8000048: e12fff1e bx lr
800004c: 20000004 andcs r0, r0, r4
8000050: 20000000 andcs r0, r0, r0
8000054: 88776655 ldmdahi r7!, {r0, r2, r4, r6, r9, r10, sp, lr}^
Disassembly of section .bss:
20000000 <bb>:
20000000: 00000000 andeq r0, r0, r0
20000004 <ba>:
20000004: 00000000 andeq r0, r0, r0
Disassembly of section .data:
20000008 <db>:
20000008: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
2000000c <da>:
2000000c: 00000005 andeq r0, r0, r5
Clearly at the end you see the storage for the four variables and they are .bss and .data as expected.
But here is the difference that folks are trying to explain.
There should be code to zero .bss. Yes, that is a waste of cycles, and some compilers are starting to warn about using uninitialized variables, which is good, but either way .bss needs some code to zero it. .data might also need some code to copy it. I didn't complete this example to show how that works: you tell the linker script that .data lives in ram but that a copy is stored in rom, you export both addresses and the sizes/ends (a rom data start and a ram data start), and you copy from rom to ram.
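As a rough sketch of what that bootstrap work amounts to, here it is in C rather than assembly, assuming word-aligned section boundaries. The names zero_words and copy_words are made up for illustration, and on a real target the pointers would come from linker-script symbols (a rom data start, a ram data start and a ram data end) that the example above does not define.

```c
#include <stdint.h>

/* Zero the range [start, end) one word at a time.
   Assumes both pointers are word aligned (hypothetical helper). */
void zero_words(uint32_t *start, const uint32_t *end)
{
    while (start < end) {
        *start++ = 0;
    }
}

/* Copy initial values from their load address (rom) to their
   run address (ram) until the ram destination is full. */
void copy_words(const uint32_t *src, uint32_t *dst, const uint32_t *dst_end)
{
    while (dst < dst_end) {
        *dst++ = *src++;
    }
}
```

On target, the bootstrap would call zero_words with the .bss bounds and copy_words with the rom and ram .data addresses before branching to main.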
So the difference in cost of .data vs .bss: for .data you have memory allocated, and that data might need to be copied an additional time, either by the operating system loader or by your own bootstrap, or it might not.
20000008 <db>:
20000008: 12345678
for .bss
20000000 <bb>:
20000000: 00000000 andeq r0, r0, r0
Again, it depends on the OS loader and/or how you build. In this case, with .data placed after .bss and at least one .data item present, if you were to objcopy -O binary this you would get the zeroed .bss bytes in the .bin and would not need to fill .bss at run time; whether that works depends on the loader and the destination.
So the storage is equal, but the extra cost for .bss is
800002c: e3a00005 mov r0, #5
8000030: e59f1014 ldr r1, [pc, #20] ; 800004c <main+0x20>
800003c: e5810000 str r0, [r1]
800004c: 20000004
and
8000034: e59f3014 ldr r3, [pc, #20] ; 8000050 <main+0x24>
8000038: e59f2014 ldr r2, [pc, #20] ; 8000054 <main+0x28>
8000040: e5832000 str r2, [r3]
8000050: 20000000
8000054: 88776655
The first one requires an instruction to put the 5 in a register, an instruction to get the address, and a memory cycle to store the 5 in memory. The second is more costly: it takes an instruction with a memory cycle to get the data, then one to get the address, then the store, all of them being memory cycles.
Another answer here has tried to argue that you don't have a static cost because the values are immediates. But with variable-length instruction sets those immediates are still there and are read from memory, just as with fixed-length ones; it is not a separate memory cycle, it is part of the prefetching, but it is still static storage. The difference is that you have at least one memory cycle to store the value in memory (.bss and .data imply global, so the store to memory is required).
Because these are linked, the addresses of the variables need to be put in place by the linker. With a fixed-length RISC instruction set that is a pool nearby; with a CISC like x86 it would be embedded in a mov immediate to register. Either way there is static storage for the address and static storage for the value. The x86 would use fewer bytes of instructions and perform the task in two instructions; ARM takes three instructions and three separate memory cycles. Functionally the same.
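To make the pool nearby concrete, the address that each ldr rX, [pc, #imm] in the listing reads from can be worked out from the instruction word. This is a minimal sketch, not a full decoder: it assumes the word really is an ARM (A32) pc-relative ldr with a 12-bit immediate, and literal_pool_address is a name invented for this example.

```c
#include <stdint.h>

/* Address read by an A32 "ldr rX, [pc, #imm]" located at insn_addr.
   Assumes insn is that instruction form; no validation is done. */
uint32_t literal_pool_address(uint32_t insn_addr, uint32_t insn)
{
    uint32_t imm12 = insn & 0xFFFu;      /* bits 11..0: offset        */
    uint32_t pc = insn_addr + 8u;        /* ARM state pc reads +8     */

    if (insn & (1u << 23)) {             /* U bit set: add the offset */
        return pc + imm12;
    }
    return pc - imm12;                   /* U bit clear: subtract it  */
}
```

For the ldr at 0x8000030 (0xe59f1014) this gives 0x800004c, the pool entry holding the address of ba, which matches the comment the disassembler printed.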
Now, here is where this can save you: by violating expectations while being in complete control (bare metal).
.globl _start
_start:
ldr sp,=0x20002000
bl main
b .
unsigned int ba;
unsigned int bb;
int main ( void )
{
ba=5;
bb=0x88776655;
return(0);
}
MEMORY
{
rom : ORIGIN = 0x08000000, LENGTH = 0x1000
ram : ORIGIN = 0x20000000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > rom
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
}
Disassembly of section .text:
08000000 <_start>:
8000000: e59fd004 ldr sp, [pc, #4] ; 800000c <_start+0xc>
8000004: eb000001 bl 8000010 <main>
8000008: eafffffe b 8000008 <_start+0x8>
800000c: 20002000 andcs r2, r0, r0
08000010 <main>:
8000010: e3a00005 mov r0, #5
8000014: e59f1014 ldr r1, [pc, #20] ; 8000030 <main+0x20>
8000018: e59f3014 ldr r3, [pc, #20] ; 8000034 <main+0x24>
800001c: e59f2014 ldr r2, [pc, #20] ; 8000038 <main+0x28>
8000020: e5810000 str r0, [r1]
8000024: e5832000 str r2, [r3]
8000028: e3a00000 mov r0, #0
800002c: e12fff1e bx lr
8000030: 20000004 andcs r0, r0, r4
8000034: 20000000 andcs r0, r0, r0
8000038: 88776655 ldmdahi r7!, {r0, r2, r4, r6, r9, r10, sp, lr}^
Disassembly of section .bss:
20000000 <bb>:
20000000: 00000000 andeq r0, r0, r0
20000004 <ba>:
20000004: 00000000 andeq r0, r0, r0
(I think I deleted the stack init in the prior example)
There was no need to complicate the (toolchain-specific) linker script and no need to initialize any of the memory in the bootstrap; instead, initialize the variables in the code. It is more costly as far as .text space goes, but it is easier to write and maintain, easier to port if the need arises, etc. But it breaks known rules/assumptions if someone wants to take that code and add a .data item, or assumes a .bss item is zeroed.
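A minimal sketch of that style, assuming nothing zeroes .bss before main runs: every global gets an explicit assignment before its first read. The names globals_init, event_count and count_event are hypothetical and only here for illustration.

```c
unsigned int ba;
unsigned int bb;
static unsigned int event_count; /* code like this usually relies on zeroed .bss */

/* Call once at the top of main, before anything reads these variables. */
void globals_init(void)
{
    ba = 5;
    bb = 0x88776655;
    event_count = 0; /* explicit, because the bootstrap did not do it */
}

/* Only correct if globals_init has already run. */
unsigned int count_event(void)
{
    return ++event_count;
}
```

Anyone who later adds a counter like event_count and forgets the line in globals_init gets whatever happened to be in ram.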
Another shortcut, say Raspberry Pi bare metal:
.globl _start
_start:
ldr sp,=0x8000
bl main
b .
unsigned int ba;
unsigned int bb;
unsigned int da=5;
int main ( void )
{
return(0);
}
MEMORY
{
ram : ORIGIN = 0x00008000, LENGTH = 0x1000
}
SECTIONS
{
.text : { *(.text*) } > ram
.rodata : { *(.rodata*) } > ram
.bss : { *(.bss*) } > ram
.data : { *(.data*) } > ram
}
Disassembly of section .text:
00008000 <_start>:
8000: e3a0d902 mov sp, #32768 ; 0x8000
8004: eb000000 bl 800c <main>
8008: eafffffe b 8008 <_start+0x8>
0000800c <main>:
800c: e3a00000 mov r0, #0
8010: e12fff1e bx lr
Disassembly of section .bss:
00008014 <bb>:
8014: 00000000 andeq r0, r0, r0
00008018 <ba>:
8018: 00000000 andeq r0, r0, r0
Disassembly of section .data:
0000801c <da>:
801c: 00000005 andeq r0, r0, r5
hexdump -C so.bin
00000000 02 d9 a0 e3 00 00 00 eb fe ff ff ea 00 00 a0 e3 |................|
00000010 1e ff 2f e1 00 00 00 00 00 00 00 00 05 00 00 00 |../.............|
00000020
Because a .data item exists, .data is defined after .bss in the linker script, and the binary is copied by the GPU into ram for us as a whole (.text, .bss, .data, etc.), the zeroing of .bss was a freebie. We didn't need to add any code for .bss, and if we have more .data and are using it, we got a free init/copy of .data as well.
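To see that in the hexdump, the bytes can be read back as little-endian words at their linked addresses. This sketch copies the 32 bytes from the dump into an array; so_bin, LOAD_ADDRESS and word_at are names chosen for this example, and 0x8000 is the ram origin from the linker script.

```c
#include <stddef.h>
#include <stdint.h>

#define LOAD_ADDRESS 0x8000u /* ram ORIGIN in the linker script */

/* The 32 bytes shown by hexdump -C so.bin. */
static const uint8_t so_bin[32] = {
    0x02, 0xd9, 0xa0, 0xe3, 0x00, 0x00, 0x00, 0xeb,
    0xfe, 0xff, 0xff, 0xea, 0x00, 0x00, 0xa0, 0xe3,
    0x1e, 0xff, 0x2f, 0xe1, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00,
};

/* Little-endian 32-bit word at a linked address. The caller must keep
   address within LOAD_ADDRESS .. LOAD_ADDRESS + 28. */
uint32_t word_at(uint32_t address)
{
    size_t off = (size_t)(address - LOAD_ADDRESS);

    return (uint32_t)so_bin[off]
         | ((uint32_t)so_bin[off + 1] << 8)
         | ((uint32_t)so_bin[off + 2] << 16)
         | ((uint32_t)so_bin[off + 3] << 24);
}
```

Here word_at(0x8014) and word_at(0x8018) are the zeroed bb and ba, and word_at(0x801c) is the 5 for da, all of it delivered by the load itself.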
These are corner cases, but they do demonstrate the kinds of things you were thinking about: why zero a variable that I can just change, or will end up changing, in .text later? I extend that to: why burn boot time zeroing that section in the first place, and why complicate the linker script? GNU linker scripts are nasty and painful at best, and you have to be very careful to get them right; granted, once you get them right it is not too much work with each rev of the toolchain to see if they still work.
To do it correctly, .bss costs you instructions and the execution time of those instructions, including the separate memory bus cycle(s). But there should be linker script and bootstrap code there no matter what for .bss. Likewise for .data, but unless you are rom/flash based it is likely that the source and destination for .data are the same: the copy happened in the loader (the operating system copying the binary from rom/flash/disk to memory), and you don't need an additional copy unless you force it in the linker script.
Well, based on comments in other questions, let's say "correctly" means based on assumptions: the .data items need to show up as defined in the compiled code. What you find for .bss has historically been toolchain specific. What the spec says I would have to look up, along with which version applies to whichever toolchain you might end up using, because despite popular belief not all toolchains in use today are under constant maintenance to comply with the standard that is in place this second. Some folks have the luxury of limiting their projects to those that have up-to-date tools; many don't.
The shortcuts shown here are similar to hand-tuned assembly vs just taking what the compiler provides: you are on your own, and it can be risky if you are not careful, but you can get a decent performance gain on boot doing something like that, if that is desired/required for your project. I would not use anything like that for non-specialized work.
Also note that you are well into the "don't use global variables" religious debate with this discussion. If you don't use globals, you still deal with local globals, as I call them, or in other words local static variables, which fall into this category.
unsigned int more_fun ( unsigned int, unsigned int );
void fun ( unsigned int x )
{
static int ba;
static int da=0x12345678;
ba+=x;
da=more_fun(ba,da);
}
int main ( void )
{
return(0);
}
0000800c <fun>:
800c: e59f2028 ldr r2, [pc, #40] ; 803c <fun+0x30>
8010: e5923000 ldr r3, [r2]
8014: e92d4010 push {r4, lr}
8018: e59f4020 ldr r4, [pc, #32] ; 8040 <fun+0x34>
801c: e0803003 add r3, r0, r3
8020: e5941000 ldr r1, [r4]
8024: e1a00003 mov r0, r3
8028: e5823000 str r3, [r2]
802c: ebfffff6 bl 800c <fun>
8030: e5840000 str r0, [r4]
8034: e8bd4010 pop {r4, lr}
8038: e12fff1e bx lr
803c: 0000804c andeq r8, r0, r12, asr #32
8040: 00008050 andeq r8, r0, r0, asr r0
00008044 <main>:
8044: e3a00000 mov r0, #0
8048: e12fff1e bx lr
Disassembly of section .bss:
0000804c <ba.3666>:
804c: 00000000 andeq r0, r0, r0
Disassembly of section .data:
00008050 <da.3667>:
8050: 12345678 eorsne r5, r4, #120, 12 ; 0x7800000
Being local statics, or local globals, they still land in .data or .bss.
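That static storage is also why the value survives between calls. A small hosted-C sketch, where accumulate is a made-up function and the startup code is assumed to have zeroed .bss the way the C standard expects:

```c
/* total has static storage duration, so with no initializer it typically
   lands in .bss rather than on the stack, and it keeps its value
   from one call to the next. */
unsigned int accumulate(unsigned int x)
{
    static unsigned int total;

    total += x;
    return total;
}
```

Calling accumulate(2) and then accumulate(3) returns 2 and then 5, but only because something zeroed total before the first call.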