39

I'm looking through this tutorial: http://www.cl.cam.ac.uk/freshers/raspberrypi/tutorials/os/ok01.html

The first line of assembly is:

ldr r0,=0x20200000

the second is:

mov r1,#1

I thought ldr was for loading values from memory into registers, but it seems the = means that 0x20200000 is a value, not a memory address. Both lines seem to be loading absolute (immediate) values.

Peter Cordes
Jonathan.
  • 2
    A relevant [ARM blog post](http://community.arm.com/groups/processors/blog/2010/07/27/how-to-load-constants-in-assembly-for-arm-architecture). – artless noise Jan 29 '14 at 21:21
  • 2
    A [forum thread](https://www.raspberrypi.org/forums/viewtopic.php?&t=16528) asking exactly the same question. – Peter Cordes Jun 26 '16 at 13:04
  • 1
    Minimal runnable examples on QEMU user mode with assertions: https://github.com/cirosantilli/arm-assembly-cheat/blob/e4acee1d9ce86319142e50c0a407ca4db815536d/v7/ldr_magic.S – Ciro Santilli OurBigBook.com Jul 21 '18 at 09:38

3 Answers

25

It is a trick/shortcut. Say, for example, you write:

ldr r0,=main

What happens is that the assembler allocates a data word near the instruction, but outside the execution path, and loads from it:

ldr r0,main_addr
...
b somewhere
main_addr: .word main

Now extend that trick to constants/immediates, especially those that cannot fit into a move-immediate instruction:

top:
add r1,r2,r3
ldr r0,=0x12345678
eor r1,r2,r3
eor r1,r2,r3
b top

Assemble, then disassemble:

00000000 <top>:
   0:   e0821003    add r1, r2, r3
   4:   e59f0008    ldr r0, [pc, #8]    ; 14 <top+0x14>
   8:   e0221003    eor r1, r2, r3
   c:   e0221003    eor r1, r2, r3
  10:   eafffffa    b   0 <top>
  14:   12345678    eorsne  r5, r4, #125829120  ; 0x7800000

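You can see that the assembler has added the data word for you and turned the ldr into a PC-relative load. (The disassembler shows the data word 0x12345678 at offset 0x14 decoded as if it were an instruction; it is never executed.)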
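
To see where the `#8` in `ldr r0, [pc, #8]` comes from, here is a small Python sketch that decodes such an encoding and computes the address it reads. The function name `ldr_literal_target` is made up for illustration, and it only handles the ARM-state (32-bit) immediate-offset form shown in the disassembly. The key detail is that in ARM state, reading pc gives the address of the current instruction plus 8.

```python
def ldr_literal_target(instr_addr: int, encoding: int) -> int:
    """Return the address read by an ARM-state `ldr rd, [pc, #imm]`.

    Hypothetical helper for illustration; it rejects anything that is
    not a word load with an immediate offset and pc as the base register.
    """
    is_load_store = (encoding >> 26) & 0b11 == 0b01   # bits 27:26 = 01
    is_imm_offset = (encoding >> 25) & 1 == 0          # I bit clear
    is_load = (encoding >> 20) & 1 == 1                # L bit set
    base_is_pc = (encoding >> 16) & 0xF == 15          # Rn = r15
    if not (is_load_store and is_imm_offset and is_load and base_is_pc):
        raise ValueError("not a pc-relative ldr with immediate offset")

    offset = encoding & 0xFFF                          # 12-bit offset field
    if (encoding >> 23) & 1 == 0:                      # U bit clear: subtract
        offset = -offset

    # In ARM state, pc reads as the instruction's own address + 8.
    return instr_addr + 8 + offset


# First listing: instruction at 0x4, literal word at 0x14.
print(hex(ldr_literal_target(0x4, 0xE59F0008)))  # 0x14
# Second listing: instruction at 0x4, literal word at 0x1c.
print(hex(ldr_literal_target(0x4, 0xE59F0010)))  # 0x1c
```

So 4 + 8 + 8 = 0x14 in the first listing, and 4 + 8 + 16 = 0x1c in the second, matching the addresses of the 0x12345678 data word.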

Now, if you use an immediate that does fit in a mov instruction, then, perhaps depending on the assembler (certainly with the GNU as I am using), the ldr is turned into a mov for you:

top:
add r1,r2,r3
ldr r0,=0x12345678
ldr r5,=1
mov r6,#1
eor r1,r2,r3
eor r1,r2,r3
b top


00000000 <top>:
   0:   e0821003    add r1, r2, r3
   4:   e59f0010    ldr r0, [pc, #16]   ; 1c <top+0x1c>
   8:   e3a05001    mov r5, #1
   c:   e3a06001    mov r6, #1
  10:   e0221003    eor r1, r2, r3
  14:   e0221003    eor r1, r2, r3
  18:   eafffff8    b   0 <top>
  1c:   12345678    eorsne  r5, r4, #125829120  ; 0x7800000

So it is basically a typing shortcut. Understand that you are giving the assembler the power to find a place to put the constant. It usually does a good job and sometimes complains; I am not sure I have ever seen it fail to do so safely. Sometimes you need a .ltorg or .pool directive in the code to tell the assembler where it may place the constants.
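
One reason a .ltorg or .pool is sometimes needed: the ARM-state `ldr rd, [pc, #imm]` instruction has only a 12-bit offset field, so the literal must sit within 4095 bytes of pc (the instruction address plus 8). A rough Python sketch of that range check, with a made-up function name and example addresses:

```python
def literal_in_range(instr_addr: int, literal_addr: int) -> bool:
    """True if an ARM-state pc-relative ldr at instr_addr can reach literal_addr.

    Hypothetical helper: models only the 12-bit offset limit, nothing else
    the assembler may check.
    """
    pc = instr_addr + 8                     # pc reads as instruction + 8
    return abs(literal_addr - pc) <= 0xFFF  # offset field is 12 bits, +/- via U bit


print(literal_in_range(0x4, 0x14))    # True: the listing above
print(literal_in_range(0x0, 0x2000))  # False: pool is too far away
```

If the code between the ldr and the end of the section is longer than that, placing a .ltorg after an unconditional branch gives the assembler a nearer spot for the pool.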

old_timer
  • 1
    Thanks for your answer. I'm very new to assembly, so your answer is a little above me. For `ldr`, do you mean the value would be "put" in memory "by the assembler" as data and then loaded from memory when executed, whereas for `mov` the value is actually part of the instruction? And so if the value is too big to fit in the instruction you must use `ldr`? – Jonathan. Dec 26 '12 at 23:40
  • 3
    mov means move a value into a register. ldr means load a value into a register. str (store) goes from register to memory, and for it the =address shortcut makes no sense. (It does make sense to do an ldr =address to put the address in a register, then a store to put the contents of some register into memory at that address.) – old_timer Dec 26 '12 at 23:50
  • 3
    Also understand what "too big" means in ARM: it means there are more than 8 bits separating the ones. So mov rd,#0x21000000 is perfectly valid, but mov rd,#0x201 is not. – old_timer Dec 26 '12 at 23:51
  • 1
    Again, read the ARM ARM (Architecture Reference Manual) and this should all become obvious. mov rd,#0x21000000 might get encoded as mov rd,#0x21<<24. – old_timer Dec 26 '12 at 23:52
  • 1
    so is it safe to mix SP-relative LDRs of my own with this shortcut? does the assembler know not to put data between an instruction that uses sp-relative addressing and the address it refers to? – Dmitri Aug 04 '15 at 15:33
  • 1
    Not sure I understand the question, the trick has to do with things that are fixed in text address space relative to the instruction itself. using sp instead of pc means something relative to the sp on the stack not something relative to the instruction, how would you even describe such a thing? for it to work something would have to generate a push to put the thing on the stack then generate access to it. – old_timer Aug 04 '15 at 17:20
  • 1
    A value that doesn't fit nicely into a mov instruction will have to use this PC-relative trick, or simply a PC-relative load; you would then push that register on the stack and remember where it is relative to the sp at all times. – old_timer Aug 04 '15 at 17:20
  • 1
    this shortcut/trick is strictly for .text based loading of immediates into registers and the assembler will properly generate the right offset to the thing and place the thing at that offset (if it can find one or it errors out). stack based instructions are not related to this, and any stack based instructions are just instructions in the .text space that the assembler works around just like all other instructions in the .text space. not sure I understand the question – old_timer Aug 04 '15 at 17:22
16

A shorter response, from someone who is closer to your level; I hope it helps. In ARM, instructions are 32 bits wide. Some bits are used to identify the operation, some for the operands, and, in the case of the MOV instruction, some are available for an immediate value (#1, for example).

As you can see here (page 33), there are only 12 bits available for the immediate value. Instead of using those bits as a plain number (which would range from 0 to 2^12 - 1 = 4095), the instruction computes the immediate by rotating right (ROR) the first 8 bits by two times the amount specified in the last 4 bits. That is, immediate = first 8 bits ROR 2*(last four bits). (In the encoding, the 8-bit value is bits 7:0 and the 4-bit rotation is bits 11:8.)

This way, we can reach a much wider range of numbers than just 0 to 4095 (see page 34 for a brief summary of possible immediates). Keep in mind, though, that with 12 bits there are still at most 4096 possible values that can be specified (fewer in practice, because some values have more than one encoding).

If our number cannot be encoded that way (257, for example, cannot be expressed as 8 bits rotated right by two times any 4-bit amount), then we have to use LDR r0, =257
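
The rotation rule can be checked mechanically. Below is a minimal Python sketch (the names `rol32` and `encode_arm_immediate` are my own, not from any toolchain) that tries all 16 rotation amounts and reports whether a 32-bit value fits the MOV immediate format, assuming the classic ARM-state encoding described above.

```python
def rol32(value: int, amount: int) -> int:
    """Rotate a 32-bit value left by `amount` bits."""
    amount %= 32
    return ((value << amount) | (value >> (32 - amount))) & 0xFFFFFFFF


def encode_arm_immediate(value: int):
    """Return (rotation_field, imm8) if `value` fits, else None.

    The hardware computes imm8 ROR (2 * rotation_field), so we undo that
    by rotating the target value LEFT and checking that 8 bits remain.
    """
    value &= 0xFFFFFFFF
    for rotation_field in range(16):
        imm8 = rol32(value, 2 * rotation_field)
        if imm8 <= 0xFF:
            return rotation_field, imm8
    return None


print(encode_arm_immediate(1))           # (0, 1)
print(encode_arm_immediate(0x21000000))  # (4, 33): 0x21 ROR 8
print(encode_arm_immediate(257))         # None: 0x101 spans 9 bits
print(encode_arm_immediate(0x20200000))  # None: the tutorial's constant
```

The last line shows why the tutorial writes `ldr r0,=0x20200000` rather than a mov: the two set bits are 9 bits apart, one more than the 8-bit field can hold.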

In this case, the assembler saves the number 257 in memory, close to the program code, so that it can be addressed relative to the PC, and loads it from memory, just as dwelch explained in detail.

Note: If you follow that tutorial, then when you try to 'make' with mov r0, #257 you will get an error, and you have to change it to ldr r0, =257 by hand.

sifferman
Fernando
10

As good as the other answers are, I think I might be able to simplify the answer.

ldr = LoaD Register

mov = MOVe

Both effectively do the same thing but in different ways.

The difference is a lot like the difference between

#define CONST 5

and

int CONST = 5;

in C language.

mov is really fast because it has the accompanying value directly stored as a part of the instruction (in the 12 bit format described in the answer above). It has some limitations due to the way it stores the value. Why? Because

  • 12 bits is not sufficient for storing huge numbers like the 32-bit memory addresses.
  • First 8 bits ROR 2 * (Last 4 bits) cannot represent just any number, even in the 12 bit range.

ldr with the = form, on the other hand, is versatile (mainly because the assembler picks the best encoding for you). It works like this (as shown in the disassembled routine):

  • If the value can be represented in the 12-bit "First 8 bits ROR 2 * (Last 4 bits)" format, then the assembler changes it to a mov instruction carrying the value.

  • Otherwise, the value is stored as a data word next to the code (in a literal pool), and it is loaded into the required register from memory, using an offset from the program counter.
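
As a rough illustration of that decision, here is a Python sketch of how an assembler might expand `ldr rd, =value`. The names `fits_imm` and `expand_ldr_equals` are hypothetical, and the output strings are only descriptive. The extra mvn case reflects what GNU as is documented to do when the bitwise inverse of the constant fits; other assemblers may behave differently.

```python
def fits_imm(value: int) -> bool:
    """True if `value` is an 8-bit number rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rot in range(0, 32, 2):
        # Rotating left by `rot` undoes a right rotation by `rot`.
        rotated = ((value << rot) | (value >> (32 - rot))) & 0xFFFFFFFF
        if rotated <= 0xFF:
            return True
    return False


def expand_ldr_equals(rd: str, value: int) -> str:
    """Describe the instruction chosen for `ldr rd, =value` (illustrative only)."""
    value &= 0xFFFFFFFF
    if fits_imm(value):
        return f"mov {rd}, #{value:#x}"
    inverted = ~value & 0xFFFFFFFF
    if fits_imm(inverted):
        # Assumption: GNU as behaviour; mvn writes the bitwise NOT of its operand.
        return f"mvn {rd}, #{inverted:#x}"
    return f"ldr {rd}, [pc, #<offset>]  ; {value:#010x} placed in a literal pool"


print(expand_ldr_equals("r5", 1))           # mov r5, #0x1
print(expand_ldr_equals("r0", 0xFFFFFFFF))  # mvn r0, #0x0
print(expand_ldr_equals("r0", 0x12345678))  # ldr r0, [pc, #<offset>]  ; 0x12345678 placed in a literal pool
```

The `<offset>` placeholder is left unresolved on purpose: the real offset depends on where the assembler ends up placing the pool.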

I hope it helped.