0

I'm investigating how Objective-C is mapped to assembly. I started from the iOS Assembly Tutorial.

The code snippet under analysis is the following.

void fooFunction() {
    int add = addFunction(12, 34);
    printf("add = %i", add);
}

It is translated into

 _fooFunction:
@ 1:
    push    {r7, lr}
@ 2:
    movs    r0, #12
    movs    r1, #34
@ 3:
    mov r7, sp
@ 4:
    bl  _addFunction
@ 5:
    mov r1, r0
@ 6:
    movw    r0, :lower16:(L_.str-(LPC1_0+4))
    movt    r0, :upper16:(L_.str-(LPC1_0+4))
LPC1_0:
    add r0, pc
@ 7:
    blx _printf
@ 8:
    pop {r7, pc}

There are two points about the assembly code that I cannot understand:

-> Comment @1

The author says that push decrements the stack pointer by 8 bytes, since r7 and lr are 4 bytes each. Ok. But he also says that the two values are stored with the one instruction. What does it mean?

-> Comment @6

movw    r0, :lower16:(L_.str-(LPC1_0+4))
movt    r0, :upper16:(L_.str-(LPC1_0+4))

The author says that r0 will hold the address of the string "add = %i" (which can be found in the data segment), but I don't really get what the memory layout looks like. Why does he represent the difference L_.str-(LPC1_0+4) with the dotted black line and not with the red one (drawn by me)?

[Image: memory layout diagram]

Any clarifications will be appreciated.

Edit

I'm missing the concept of pushing r7 onto the stack. What does it mean to push that value, and what does it contain?

Lorenzo B

2 Answers

1

But he also says that the two values are stored with the one instruction. What does it mean?

That the single push instruction will put both values onto the stack.
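To make the effect of that single instruction concrete, here is a minimal C sketch of what `push {r7, lr}` and the matching `pop {r7, pc}` do to the stack. The `Cpu` struct, the 64-byte stack, and the function names are hypothetical and exist only for illustration; the sketch assumes the usual ARM full-descending stack, where the lower-numbered register ends up at the lower address.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BYTES 64u  /* hypothetical stack size, for illustration only */

/* Hypothetical model of the few registers involved. */
typedef struct {
    uint32_t mem[STACK_BYTES / 4]; /* stack memory, one 4-byte word per slot */
    uint32_t sp;                   /* byte offset of the last pushed word */
    uint32_t r7, lr, pc;
} Cpu;

/* push {r7, lr}: one instruction, two stores, sp drops by 8 bytes. */
static void push_r7_lr(Cpu *c) {
    assert(c->sp >= 8);
    c->sp -= 8;
    c->mem[c->sp / 4] = c->r7;     /* lower-numbered register, lower address */
    c->mem[c->sp / 4 + 1] = c->lr;
}

/* pop {r7, pc}: restores r7 and loads the saved lr value into pc,
 * which is what makes the function return. */
static void pop_r7_pc(Cpu *c) {
    assert(c->sp + 8 <= STACK_BYTES);
    c->r7 = c->mem[c->sp / 4];
    c->pc = c->mem[c->sp / 4 + 1];
    c->sp += 8;
}
```

Calling `push_r7_lr` and then `pop_r7_pc` on the same `Cpu` leaves `sp` where it started, with `r7` restored and `pc` holding the old `lr`.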

Why does he represent the difference L_.str-(LPC1_0+4)

Because the add r0, pc implicitly adds 4 more bytes: when pc is read as an operand, its value is the address of the current instruction plus 4. To quote the instruction set reference:

Add an immediate constant to the value from sp or pc, and place the result into a low register.
Syntax: ADD Rd, Rp, #expr
where:
Rd   is the destination register. Rd must be in the range r0-r7.
Rp   is either sp or pc.
expr is an expression that evaluates (at assembly time) to a multiple of 4 in the range 0-1020.

If Rp is the pc, the value used is: (the address of the current instruction + 4) AND &FFFFFFFC.
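The arithmetic behind the `movw`/`movt`/`add r0, pc` sequence can be sketched in C. This is only an illustration under stated assumptions: the function names are hypothetical, any addresses passed in are made up, and the sketch assumes that the register form `add r0, pc` used in the listing reads pc as the instruction's own address plus 4, ignoring the word-alignment mask mentioned in the quote.

```c
#include <stdint.h>

/* Assumed Thumb behaviour: pc reads as the instruction address + 4. */
#define THUMB_PC_READ_OFFSET 4u

/* What the assembler evaluates for L_.str-(LPC1_0+4). */
static uint32_t link_time_offset(uint32_t str_addr, uint32_t lpc_addr) {
    return str_addr - (lpc_addr + THUMB_PC_READ_OFFSET);
}

/* movw writes the low 16 bits (clearing the rest),
 * movt then fills in the high 16 bits. */
static uint32_t movw_movt(uint32_t value) {
    uint32_t r0 = value & 0xFFFFu;                /* movw r0, :lower16: */
    r0 = (r0 & 0xFFFFu) | (value & 0xFFFF0000u);  /* movt r0, :upper16: */
    return r0;
}

/* add r0, pc executed at address lpc_addr. */
static uint32_t add_r0_pc(uint32_t r0, uint32_t lpc_addr) {
    uint32_t pc_as_read = lpc_addr + THUMB_PC_READ_OFFSET;
    return r0 + pc_as_read;
}

/* The whole sequence: the +4 read by the CPU cancels the -4
 * baked into the offset, leaving the string's address in r0. */
static uint32_t string_address_at_runtime(uint32_t str_addr, uint32_t lpc_addr) {
    uint32_t r0 = movw_movt(link_time_offset(str_addr, lpc_addr));
    return add_r0_pc(r0, lpc_addr);
}
```

For any pair of addresses, `string_address_at_runtime(str, lpc)` returns `str` again. The offset is stored relative to `LPC1_0+4` rather than `LPC1_0` precisely so that the addition performed by the CPU lands on the string.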
Jester
  • Hi Jester. Thank you for your quick response. About the first question: `push` means that the values contained in `r7` and `lr` are pushed onto the stack. And what about `pop`? What does it mean to pop back into `r7` and `pc`? About the second, are you able to expand on it? What does *if Rp is the pc* mean? Thanks. – Lorenzo B Feb 20 '15 at 16:43
  • `pop` is the opposite of `push`, of course: the top two values will be removed from the stack and placed into `r7` and `pc`. Not sure what clarification you need? As for the second, the actual [reference page](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0068b/CIHEDHIF.html) talks about the instruction in terms of `ADD Rd, Rp, #expr` where `Rp` can be `sp` or `pc`. I just quoted the relevant `pc` case for you. – Jester Feb 20 '15 at 17:00
  • Jester, sorry for the delay. I added an edit about the clarification I need. But maybe I'm just missing the concept of pushing a value onto the stack. – Lorenzo B Feb 20 '15 at 23:20
  • Are you able to suggest some tutorials on assembly language? Thank you for your support. – Lorenzo B Feb 20 '15 at 23:21
  • The reason for pushing `r7` on the stack is so that its value is saved and can be restored before returning. The reason for saving `lr` is because that contains the address to return to, and that is why it is popped into `pc` at the end. – Jester Feb 20 '15 at 23:27
  • Ok. The reason for saving `lr` is clear to me. You temporarily store the value on the stack in order to retrieve it later and continue with the next instruction. But why `r7`? Does it contain something in particular at the moment the `push` is executed? – Lorenzo B Feb 20 '15 at 23:32
  • We don't know what it contains. The calling convention might mandate its value be preserved for the caller, aka. _callee-saved register_. See also [this question](http://stackoverflow.com/questions/261419/arm-to-c-calling-convention-registers-to-save). – Jester Feb 20 '15 at 23:47
  • r7 is typically used as the frame pointer in Thumb code, where it will be preserved along with the return address as part of the function prologue. That appears to be the case here - see `mov r7, sp` near the start of the function (although this particular function is simple enough to make no use of it) – Notlikethat Feb 21 '15 at 13:30
  • @Notlikethat Thanks for your comment. With *frame pointer*, do you mean stack pointer? – Lorenzo B Feb 22 '15 at 14:48
  • @Jester sorry to ask a new question. The picture I drew represents the memory layout for the executable code (i.e. the text segment). First question: is this a valid assumption? Based on your reply, this memory representation is still not very clear to me. In particular, `LPC1_0` represents the base address for the `add` instruction. Is this true? – Lorenzo B Feb 22 '15 at 15:00
  • @Jester `LPC1_0+4` represents the base address for the box with dots. Is that right? Finally, `L_.str` is the base address for the string. Is that right? Based on that, why is the difference represented by the black dotted line, and why does it not arrive directly at `movt r0, :upper16:(L_.str-(LPC1_0+4))`? Thank you – Lorenzo B Feb 22 '15 at 15:02
  • Yes, `LPC1_0` is the address of the `add r0, pc` since it's a label on that instruction. `LPC1_0+4` is not the next instruction, because in Thumb mode instructions are 2 bytes, so it's 2 instructions away, not 1. In ARM mode, the offset would be `+8` and that would still be 2 instructions away. This is probably due to pipelining. – Jester Feb 22 '15 at 15:06
  • @flexaddicted A [frame pointer](https://en.wikipedia.org/w/index.php?title=Frame_pointer) isn't quite the same thing as the stack pointer. The PC offset is [just a historical quirk of the architecture](http://stackoverflow.com/q/24091566/3156750). – Notlikethat Feb 22 '15 at 15:21
  • @Jester In other words, the part I don't really understand is the `+4` in the difference `L_.str-(LPC1_0+4)`. As you said, `add r0, pc` implicitly adds 4 more bytes. Even if there are four more bytes, why should I take them into account if I need to find the number of bytes needed to reach the string? Shouldn't the base address of `LPC1_0` be sufficient to retrieve it? I'm missing something crucial. Thanks for all. – Lorenzo B Feb 22 '15 at 15:31
  • Because it adds `+4` implicitly you need to cancel that out by subtracting. Note that the `+4` in the expression is inside the parentheses so that's actually a `-4`: `L_.str-(LPC1_0+4) = L_.str-LPC1_0-4` which cancels the implicit `+4`. – Jester Feb 22 '15 at 15:33
  • @Jester Maybe I got it. So, when you say it *implicitly adds 4 bytes*, you mean that if the string (`L_.str` in this case) has an address of 44 (I'm using decimal to represent an address), it has been shifted to 40. Is this correct? About comment @6, the `add r0, pc` adds the program counter to the offset of the string (previously retrieved). What should `pc` contain at this point? The address of what? – Lorenzo B Feb 22 '15 at 16:32
  • Logically `pc` contains `LPC1_0` there, but due to how the cpu is made it will actually use `LPC1_0 + 4`. That's where the `+4` comes from. The address of the string is not shifted anywhere. – Jester Feb 22 '15 at 16:47
0

For comment 1: The two values pushed to the stack are the values stored in r7 and lr.

Two 4-byte values equal 8 bytes.

For comment 6: The label LPC1_0 is followed by the instruction

add r0, pc

which reads pc as LPC1_0 + 4 and so adds another 4 bytes to the difference between the two addresses. That is why the 4 is subtracted in the expression L_.str-(LPC1_0+4).

stonesam92