
For example:

In the file demo.c,

#include <stdio.h>
int a = 5;
int main(){
  int b=5;
  int c=a;
  printf("%d", b+c);
  return 0;
}

For int a = 5, does the compiler translate this into something like "store 0x5 at a virtual memory address", for example 0x0000000f in the const area, so that int c = a is translated to something like movl 0x0000000f, %eax?

Then for int b = 5, the number 5 is not put into the const area, but is translated directly into an immediate in the assembly instruction, like mov $0x5, %ebx.

Peter Mortensen
Gab是好人
  • Since your program has no observable effects, the compiler is entirely free to not store anything. ([Example](https://godbolt.org/g/5DAwmj)) – Kerrek SB Apr 22 '16 at 21:19
  • Leaving aside the particularities of your question, const data is in general stored in its own section, e.g. ".rodata" (ro for Read Only). For POD types like `int`, the memory values are directly written to that section of the executable file, which the OS loads in memory when starting it. There is no instruction required to initialize the memory in such a case -- it starts at the right value this way! Global (non-const) data typically appears in its own section too, e.g. ".data" or ".bss" (.bss is implicitly initialized to zero at program startup, again by the OS). – Cameron Apr 22 '16 at 21:23
  • The details depend very much on your platform. Some ISAs allow immediate values in instructions, so values could be part of some instruction; other values may simply be part of the program image and are referred to by address. Other values yet (like zero) may not require any storage at all and instead are provided through special loading mechanics. – Kerrek SB Apr 22 '16 at 21:24
  • iostream?! For printf?! – Kerrek SB Apr 22 '16 at 21:24
  • Assignments to a, b and c still do not have observable effects past the printf line. I expect any compiler worth using to just call `printf` with argument "10" – Revolver_Ocelot Apr 22 '16 at 21:26
  • [modified example](https://godbolt.org/g/6G1MqR) – Kerrek SB Apr 22 '16 at 21:28
  • @KerrekSB Cool! Thanks for the link! – Gab是好人 Apr 22 '16 at 21:30

2 Answers

It depends. Your program has several constants:

int a = 5;

This is a "static" initialization, which occurs when the program text and data are loaded, before the program starts running. The value is stored in the memory reserved for a, which is in a read-write data "program section". If something changes a, the value 5 is lost.
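As a rough illustration of where such objects usually end up, here is a minimal sketch. The section names in the comments (.data, .bss, .rodata) are assumptions about a typical ELF toolchain rather than anything the C standard guarantees, and the names counter, limit and overwrite_a are hypothetical, added only for this example.

```c
/* Initialized and writable: typically placed in .data (assumed ELF layout). */
int a = 5;

/* Zero-initialized and writable: typically placed in .bss. */
int counter;

/* Read-only: typically placed in .rodata. */
const int limit = 5;

/* Overwrites a; the initial value 5 is not kept anywhere else at run time. */
int overwrite_a(int value) {
    a = value;
    return a;
}
```

On such a toolchain, a tool like `nm` or `objdump` can show which section each symbol landed in; the exact placement is up to the compiler and linker.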

int b=5;

This is a local variable with limited scope (visible only within main()). The storage could well be a CPU register or a location on the stack. The instructions generated for most architectures will place the value 5 in an instruction as "immediate data". For example, on x86:

mov   eax, 5

The ability of instructions to hold arbitrary constants is limited. Small constants are supported by most CPU instructions, but "large" constants are usually not supported directly. In that case the compiler stores the constant in memory and loads it from there instead. For example,

       .psect  rodata
k1     dd      3141592653
       .psect  code
       mov     eax, k1

The ARM family has a flexible scheme for encoding many constants directly in an instruction: an immediate operand is an 8-bit constant rotated right by an even number of bit positions. See this page 2-25.
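To make the rotation rule concrete, here is a small sketch that tests whether a 32-bit constant fits the classic 32-bit ARM data-processing immediate form (an 8-bit value rotated right by 0, 2, ..., 30 bits). The function names rotl32 and arm_immediate_encodable are hypothetical, chosen for this illustration; a real assembler or compiler may also use other tricks (such as an inverted-operand instruction) that this check ignores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Rotate a 32-bit value left by n bits (n is reduced modulo 32). */
static uint32_t rotl32(uint32_t v, unsigned n) {
    n &= 31u;
    return n == 0 ? v : (v << n) | (v >> (32u - n));
}

/* True if value is some 8-bit constant rotated right by an even amount. */
bool arm_immediate_encodable(uint32_t value) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        /* Rotating left by rot undoes a rotate-right by rot;
           if what remains fits in 8 bits, the constant is encodable. */
        if (rotl32(value, rot) <= 0xFFu) {
            return true;
        }
    }
    return false;
}
```

Under this rule 5 and 0xFF000000 are encodable, while 0x101 and 3141592653 are not, so a compiler would have to build those last two with more than one instruction or load them from memory.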

One not-as-obvious but totally different item is in the statement:

printf("%d", b+c);

The string "%d" is, by modern C semantics, in effect a constant array of three char ('%', 'd', and the terminating null), and a program must not modify it. Most modern implementations store it in read-only memory, so an attempt to change it causes a SEGFAULT: a low-level memory-protection fault that usually makes the program abort immediately.

       .psect  rodata
s1     db      '%', 'd', 0
       .psect  code
       mov     eax, offset s1
       push    eax
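The difference between the literal itself and a writable copy of it can be sketched in C as follows. The names fmt, fmt_copy, literal_size and make_unsigned_format are hypothetical, introduced only for this example; whether writing through a pointer to the literal actually faults depends on the platform, so the sketch only ever writes to the copy.

```c
#include <stddef.h>

/* Points at the string literal itself; its bytes usually live in
   read-only memory, so they are only ever read through this pointer. */
const char *const fmt = "%d";

/* An array initialized from the literal: a separate, writable copy. */
char fmt_copy[] = "%d";

/* The literal occupies three bytes: '%', 'd', and the terminating null. */
size_t literal_size(void) {
    return sizeof "%d";
}

/* Modifying the copy is well defined; the literal is left untouched. */
char *make_unsigned_format(void) {
    fmt_copy[1] = 'u';
    return fmt_copy;
}
```

Declaring the pointer as `const char *` lets the compiler reject accidental writes to the literal at compile time instead of leaving them to fail at run time.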
Gab是好人
wallyk
  • "which is in a read-write data 'program section'": so it is in the .data section? – Gab是好人 Apr 22 '16 at 21:46
  • @Gab: I used the semantics from a 1980s Intel assembler. There was nothing named `.data`. Instead there were psects named `rodata`, `rwdata`, and `stack`. Those would be grouped into a "segment" named `data`. – wallyk Apr 22 '16 at 21:49
  • In my example, for `int b=5;`, we can say that the "data" 5 is sort of stored directly in the instruction as an immediate number, right? So this "5" appears in the code segment? – Gab是好人 Apr 22 '16 at 21:52
  • @Gab: yes, it is in the instruction stream. The `mov eax, immediate` instruction has some bits within it reserved for the constant value. – wallyk Apr 22 '16 at 21:59

In OP's program, a is an initialized global variable. I expect it to be placed in the initialized part of the data segment. See https://en.wikipedia.org/wiki/File:Program_memory_layout.pdf and http://www.cs.uleth.ca/~holzmann/C/system/memorylayout.gif (from "more info on Memory layout of an executable program (process)"). The location of a is decided by the compiler-linker duo.

On the other hand, b and c are automatic (stack) variables, so they are expected to live in the stack segment.

That being said, the compiler/linker is free to perform any optimization as long as the observable behavior does not change (see "What exactly is the "as-if" rule?"). For example, if a is never referenced, then it may be optimized out completely.
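A minimal sketch of what the as-if rule allows here: the two functions below have the same observable behavior, so an optimizing compiler may well emit the same code for both. The names a_value, sum_as_written and sum_folded are hypothetical, and a_value is made file-local on the assumption that the compiler can then see it is never written; whether a particular compiler actually folds the first function is not guaranteed.

```c
/* File-local and never modified, so its value is provably always 5. */
static int a_value = 5;

/* Mirrors the source as written: one local constant, one load of the global. */
int sum_as_written(void) {
    int b = 5;
    int c = a_value;
    return b + c;
}

/* What the compiler is allowed to generate instead: no storage, no loads. */
int sum_folded(void) {
    return 10;
}
```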

Community
Arun