Question about global variables in ARM Assembly

Question

My book has this picture of the memory layout of a program written for the ARM processor:

They say that the text segment is the machine code, and that it also contains the literal pool. They also say that the global data segment contains the global variables.

But from another example I thought that global variables were created using the literal pool, so that the value of a literal was the address of the global variable?

Do these two things contradict each other? (One says that the global variables are in the global data segment; the other says that we use the literal pool in the text segment.) Or have I misunderstood something?

No, they're compatible. The *addresses* of global variables are stored in memory in literal pools near the code. Literal pools, along with other read-only data (like string literals and `const` arrays in `.rodata`), are basically part of the text section. Static addresses don't change, so they can go in with the read-only text. You still need some read/write storage somewhere else (in `.data` or `.bss`) for them to point to. — Peter Cordes, Aug 09 '22 at 18:05
@PeterCordes Thank you. Are you saying that the address of the variable is stored in the text segment (literal pool), but the value itself is stored in the global data segment? — user394334, Aug 09 '22 at 18:13
Yes, and that's what your second paragraph after the diagram says, too. For example, see the example of using it at the bottom of an answer on [What is the difference between =label (equals sign) and \[label\] (brackets) in ARMv6 assembly?](https://stackoverflow.com/a/17215118). Or [What is the difference between loading data using the = operator or from the Literal Pool?](https://stackoverflow.com/q/68596384) shows some examples of asm. — Peter Cordes, Aug 09 '22 at 18:14
[ARM assembly access to C global variable](https://stackoverflow.com/q/20366004) shows code to get the address of a global variable, then access the data there. — Peter Cordes, Aug 09 '22 at 18:26
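As a rough C analogy (not what the compiler literally emits), a read-only pointer constant plays the role of a literal-pool entry. It stores only the address, while the variable itself lives in writable storage. The names `y_addr` and `increment_y` are hypothetical, chosen for illustration.

```c
/* The variable itself: writable storage, typically placed in .data
   because it has a nonzero initializer. */
unsigned int y = 5;

/* Analogy for a literal-pool entry: a read-only word holding y's address.
   The address is fixed once the program is linked, so this word never
   needs to change. */
unsigned int *const y_addr = &y;

/* Mirrors the ARM sequence: load address from pool, load value, add, store. */
void increment_y(void)
{
    unsigned int *p = y_addr;  /* "ldr r2, [pc, #...]": fetch the address */
    unsigned int v = *p;       /* "ldr r3, [r2]": fetch the value from .data */
    *p = v + 1;                /* "str r3, [r2]": write the value back */
}
```

After one call to `increment_y()`, `y` is 6, while `y_addr` still holds the same address.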

score 0 · Answer 1 · answered Aug 09 '22 at 19:42

You, the programmer, ultimately decide how the memory is divided up: not the tools (with some exceptions), and definitely not ARM.

```
unsigned int x;
unsigned int y=5;

void fun ( void )
{
    x=3;
    y++;
}
```

```
arm-none-eabi-gcc -O2 -c so.c -o so.o
arm-none-eabi-objdump -D so.o
```

```
so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   e3a00003    mov r0, #3
   4:   e59f2014    ldr r2, [pc, #20]   ; 20 <fun+0x20>
   8:   e5923000    ldr r3, [r2]
   c:   e59f1010    ldr r1, [pc, #16]   ; 24 <fun+0x24>
  10:   e2833001    add r3, r3, #1
  14:   e5823000    str r3, [r2]
  18:   e5810000    str r0, [r1]
  1c:   e12fff1e    bx  lr
    ...

Disassembly of section .data:

00000000 <y>:
   0:   00000005    andeq   r0, r0, r5

Disassembly of section .bss:

00000000 <x>:
   0:   00000000    andeq   r0, r0, r0
```

That is at the object level.

Then, if we link with this linker script (flash.ld):

```
MEMORY
{
    one   : ORIGIN = 0x00000000, LENGTH = 0x1000
    two   : ORIGIN = 0x20000000, LENGTH = 0x1000
    three : ORIGIN = 0x20002000, LENGTH = 0x1000
}
SECTIONS
{
    .text   : { *(.text*)   } > one
    .rodata : { *(.rodata*) } > one
    .bss    : { *(.bss*)    } > two
    .data   : { *(.data*)   } > three
}
```

```
arm-none-eabi-ld -T flash.ld so.o -o so.elf
arm-none-eabi-objdump -D so.elf
```

```
so.elf:     file format elf32-littlearm


Disassembly of section .text:

00000000 <fun>:
   0:   e3a00003    mov r0, #3
   4:   e59f2014    ldr r2, [pc, #20]   ; 20 <fun+0x20>
   8:   e5923000    ldr r3, [r2]
   c:   e59f1010    ldr r1, [pc, #16]   ; 24 <fun+0x24>
  10:   e2833001    add r3, r3, #1
  14:   e5823000    str r3, [r2]
  18:   e5810000    str r0, [r1]
  1c:   e12fff1e    bx  lr
  20:   20002000    andcs   r2, r0, r0
  24:   20000000    andcs   r0, r0, r0

Disassembly of section .bss:

20000000 <x>:
20000000:   00000000    andeq   r0, r0, r0

Disassembly of section .data:

20002000 <y>:
20002000:   00000005    andeq   r0, r0, r5
```
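The disassembler's comment `; 20 <fun+0x20>` is just arithmetic on the instruction bits. The following sketch decodes that arithmetic, assuming ARM (A32) state, where reading PC yields the current instruction's address plus 8. The helper names are hypothetical, and only the word-sized `ldr rt, [pc, #imm]` form is handled.

```c
#include <assert.h>
#include <stdint.h>

/* Address that a PC-relative "ldr rt, [pc, #+/-imm12]" reads from. */
uint32_t ldr_literal_address(uint32_t insn, uint32_t insn_addr)
{
    /* LDR immediate, word, offset addressing, no writeback, Rn = PC:
       bits 27-26 = 01, I = 0, P = 1, B = 0, W = 0, L = 1, Rn = 1111.
       Bit 23 (U, add/subtract) is left out of the mask. */
    assert((insn & 0x0F7F0000u) == 0x051F0000u);

    uint32_t imm12 = insn & 0xFFFu;     /* byte offset */
    uint32_t up = (insn >> 23) & 1u;    /* 1 = add offset, 0 = subtract */
    uint32_t pc = insn_addr + 8u;       /* A32 pipeline: PC reads 8 ahead */
    return up ? pc + imm12 : pc - imm12;
}

/* Destination register number (Rt, bits 15-12). */
unsigned ldr_dest_reg(uint32_t insn)
{
    return (unsigned)((insn >> 12) & 0xFu);
}
```

For `e59f2014` at address 0x4, this gives 0x4 + 8 + 20 = 0x20, the pool word that holds 0x20002000 (y). For `e59f1010` at 0xc, it gives 0x24, which holds 0x20000000 (x).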

In a picture like the one you are looking at, .data and .bss are your global data: .data holds the initialized globals (y here) and .bss the uninitialized ones (x here).

This is of course not a usable binary: there is no exception vector table, no bootstrap, etc. But it does show that the tools just do what you tell them. You are ultimately responsible.
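The missing bootstrap is what would make .data and .bss correct at run time. The following sketch shows its usual job, modeled with plain arrays instead of real memory regions. All names here are hypothetical. Real startup code gets these boundaries from symbols defined in the linker script. In a flash-based design, the linker script would also give .data a load address in flash (for example with `AT>`), which the script above does not do.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for memory regions. */
static const unsigned int data_load_image[] = { 5 }; /* y's initial value, kept in flash */
unsigned int data_ram[1];                             /* .data in RAM, where y lives */
unsigned int bss_ram[1] = { 0xDEADBEEFu };            /* .bss in RAM, simulated garbage after reset */

/* What a typical bootstrap does before calling main(). */
void startup_init(void)
{
    assert(sizeof data_ram == sizeof data_load_image);
    /* Copy initialized globals (.data) from their load address into RAM. */
    memcpy(data_ram, data_load_image, sizeof data_ram);
    /* Zero uninitialized globals (.bss), which C requires to start as 0. */
    memset(bss_ram, 0, sizeof bss_ram);
}
```

After `startup_init()`, `data_ram[0]` is 5 (y) and `bss_ram[0]` is 0 (x), which is what the C code in `fun` assumes.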

This is the pool:

```
  20:   20002000    andcs   r2, r0, r0
  24:   20000000    andcs   r0, r0, r0
```

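I used the disassembler, so it tries to disassemble these addresses as instructions; ignore the andcs stuff. In the object file, before linking, those two words are still zero, which is why objdump only printed `...` there. You can see how the code is generated so that, later during linking, the linker can connect the code to the addresses where these data items live.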
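The following simplified model shows that linking step. Each relocation record says "put the final address of this symbol into this pool word." The record layout and function names are hypothetical. A real ELF object stores this information as relocation entries, which `objdump -r` can list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified relocation record: "store the absolute address of
   symbol `sym` in the 32-bit word at byte offset `offset` of .text". */
struct reloc {
    uint32_t offset;
    const char *sym;
};

/* Final addresses chosen by the linker from the linker script. */
struct symbol {
    const char *name;
    uint32_t addr;
};

static uint32_t lookup(const struct symbol *syms, size_t nsym, const char *name)
{
    for (size_t i = 0; i < nsym; i++)
        if (strcmp(syms[i].name, name) == 0)
            return syms[i].addr;
    assert(!"undefined symbol");
    return 0;
}

/* Patch each pool word: final value = symbol address + addend already stored
   in the word (REL style; the addend is 0 in this example). */
void apply_relocs(uint32_t *text, const struct reloc *relocs, size_t nrel,
                  const struct symbol *syms, size_t nsym)
{
    for (size_t i = 0; i < nrel; i++) {
        uint32_t *word = &text[relocs[i].offset / 4];
        *word = lookup(syms, nsym, relocs[i].sym) + *word;
    }
}
```

Suppose this is fed the object file's ten .text words, with the two pool words still 0. Add relocations for offset 0x20 (y) and 0x24 (x), plus the addresses the linker script assigned (x at 0x20000000, y at 0x20002000). The pool then ends up holding 0x20002000 and 0x20000000, matching the linked disassembly.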

So, as covered in the comments under the question, the variable lives in the global data area and the pool holds its address. The pool is also used for other things: any time the code needs an "immediate" value that does not fit in the instruction. On x86 or another instruction set with variable-length instructions, these kinds of values can be part of the instruction itself; you could view each such instruction as carrying its own little pool.

```
unsigned int fun ( void )
{
    return(0x12345678);
}
```

```
00000000 <fun>:
   0:   e59f0000    ldr r0, [pc]    ; 8 <fun+0x8>
   4:   e12fff1e    bx  lr
   8:   12345678    eorsne  r5, r4, #120, 12    ; 0x7800000
```

This instruction set cannot encode a 32-bit immediate in a single instruction. The compiler could build the value up with four instructions (or two with the movw/movt pair in newer ARM architectures), or load it from the pool, which is what it did here.
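Concretely, an ARM-state data-processing immediate is an 8-bit value rotated right by an even amount. That is why `mov r0, #3` works in the first example, and why the disassembler shows the pool word as `#120, 12` (0x78 rotated right by 12). The following checker is a sketch; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t rol32(uint32_t v, unsigned n)
{
    n &= 31u;
    return n ? (v << n) | (v >> (32u - n)) : v;
}

/* True if v can be encoded as an A32 "modified immediate":
   an 8-bit value rotated right by 0, 2, 4, ..., 30 bits. */
bool arm_imm_encodable(uint32_t v)
{
    for (unsigned rot = 0; rot < 32u; rot += 2u) {
        /* Undo a right rotation by `rot`; if 8 bits remain, it encodes. */
        if (rol32(v, rot) <= 0xFFu)
            return true;
    }
    return false;
}

/* movw loads the low 16 bits and movt the high 16 bits (ARMv6T2 and later). */
void movw_movt_halves(uint32_t v, uint16_t *lo, uint16_t *hi)
{
    *lo = (uint16_t)(v & 0xFFFFu);
    *hi = (uint16_t)(v >> 16);
}
```

For 0x12345678, `arm_imm_encodable` returns false, so a single `mov` cannot do it. The movw/movt split is 0x5678 and 0x1234.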

To demonstrate the x86 comment (not that x86 matters here), here are the same two functions compiled for x86-64, at the object level:

```
0000000000000000 <fun>:
   0:   c7 05 00 00 00 00 03    movl   $0x3,0x0(%rip)        # a <fun+0xa>
   7:   00 00 00 
   a:   83 05 00 00 00 00 01    addl   $0x1,0x0(%rip)        # 11 <fun+0x11>
  11:   c3                      retq
```

```
0000000000000000 <fun>:
   0:   b8 78 56 34 12          mov    $0x12345678,%eax
   5:   c3                      retq
```
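On x86, the 32-bit constant simply sits inside the instruction bytes, stored little-endian. The following hand-assembler reproduces the `b8 78 56 34 12 c3` bytes of the second function. It is a sketch that handles only this one instruction form, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Emits "mov $imm32, %eax" (opcode B8, i.e. B8 + register number 0 for eax)
   followed by "ret" (C3). Returns the number of bytes written. */
size_t emit_mov_eax_ret(uint32_t imm, uint8_t out[6])
{
    out[0] = 0xB8;
    for (int i = 0; i < 4; i++)
        out[1 + i] = (uint8_t)(imm >> (8 * i)); /* little-endian immediate */
    out[5] = 0xC3;
    return 6;
}
```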

After linking the first one with the same linker script:

```
Disassembly of section .text:

0000000000000000 <fun>:
   0:   c7 05 f6 ff ff 1f 03    movl   $0x3,0x1ffffff6(%rip)        # 20000000 <x>
   7:   00 00 00 
   a:   83 05 ef 1f 00 20 01    addl   $0x1,0x20001fef(%rip)        # 20002000 <y>
  11:   c3                      retq   

Disassembly of section .bss:

0000000020000000 <x>:
    20000000:   00 00                   add    %al,(%rax)
    ...

Disassembly of section .data:

0000000020002000 <y>:
    20002000:   05                      .byte 0x5
    20002001:   00 00                   add    %al,(%rax)
    ...
```

Here the linker modifies the instruction itself, not a pool area. I am sure there are cases where x86 uses pool space too, and I am sure I could generate some.
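The value patched into each instruction is a displacement from the end of that instruction, because the CPU computes the RIP-relative address as "address of next instruction + disp32". The following sketch checks that arithmetic against the dump. The helper name is hypothetical. The instruction lengths come from the bytes shown: 10 for `c7 05` + disp32 + imm32, and 7 for `83 05` + disp32 + imm8.

```c
#include <assert.h>
#include <stdint.h>

/* disp32 that makes a RIP-relative operand of the instruction at insn_addr
   (insn_len bytes long) refer to `target`. */
int32_t rip_disp32(uint64_t insn_addr, uint32_t insn_len, uint64_t target)
{
    int64_t next = (int64_t)(insn_addr + insn_len);
    int64_t d = (int64_t)target - next;
    assert(d >= INT32_MIN && d <= INT32_MAX); /* must fit in 32 bits */
    return (int32_t)d;
}
```

For the `movl` at 0 targeting x at 0x20000000, this gives 0x1ffffff6 (bytes `f6 ff ff 1f`). For the `addl` at 0xa targeting y at 0x20002000, it gives 0x20001fef (bytes `ef 1f 00 20`).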

Also notice that the same linker script, used for x86, puts everything exactly where I told it to.

At the end of the day, you are in control of how memory is consumed at this level. There are some hardware rules, like where the chip comes out of reset and how. In some cases, like Cortex-M, some chunks of the address space are for flash/code, some are for data, and some are for peripherals. You ultimately should know this and design your memory map accordingly. Most of the time folks just take tools that someone else set up for the target in question, and it all just magically happens.

It is the same when you use the compiler for your host Windows/Linux/etc. computer: that toolchain is configured to make binaries for that target processor and operating system. The same source code compiled for Linux gives a different result than for Windows, even with the same toolchain version, because it is built for a different target.

