Understanding Cortex-M assembly LDR with pc offset

Question

I'm looking at the disassembly code for this piece of C code:

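#define GPIO_PORTF_DATA_R       (*((volatile unsigned long *)0x400253FC))
unsigned long SW1, SW2;  // switch readings (declarations assumed; not shown in the original)
int main(void){
    // Initialization code
    while(1) {
        SW1 = GPIO_PORTF_DATA_R&0x10;  // Read PF4 into SW1
        // Other code
        SW2 = GPIO_PORTF_DATA_R&0x01;  // Read PF0 into SW2
    }
}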

The assembly for the SW1 = line is in this screenshot (sorry, I can't copy the code):

https://i.stack.imgur.com/knsUg.jpg

Here are my questions:

At the first line (LDR r0,[pc,#92]), PC = 0x00000A56, and PC + 92 = 0x00000AB2, which is not equal to 0x00000AB4, the address shown. Why?

I did a bit of research on SO and found out that PC actually points to the instruction after the next one to be executed:

When pc is used for reading there is an 8-byte offset in ARM mode and 4-byte offset in Thumb mode.

However, 0x00000AB4 - 0x00000A56 = 0x5E = 94, which matches neither 92+8 nor 92+4. Where did I go wrong?

Reference:

Strange behaviour of ldr [pc, #value]

Why does the ARM PC register point to the instruction after the next one to be executed?

LDR Rd,-Label vs LDR Rd,[PC+Offset]

Hi @500-InternalServerError thanks but Keil shows me that PC is actually pointing to *this* address, not the next instruction: https://imgur.com/WTyVdvc — Nicholas Humphrey, Dec 27 '21 at 04:25
Please do not post pictures of code. If you cannot copy/paste the code, manually transcribe it from the pictures. — fuz, Dec 27 '21 at 10:28
((0xA56+4)&0xFFFFFFFC)+(0x17<<2) = 0xAB4 straight from the arm documentation. No guessing or assumptions. — old_timer, Dec 27 '21 at 11:42

score 2 · Answer 1 · edited Jan 21 '22 at 21:41

From ARM documentation:

Operation 
  address = (PC[31:2] << 2) + (immed_8 * 4) 
  Rd = Memory[address, 4]

The PC is 0xA56 + 4: it reads as two instructions ahead, and since this is Thumb (2-byte instructions), that is 4 bytes.

((0xA5A >> 2) << 2) + (0x17 * 4)
or
(0x00000A5A & 0xFFFFFFFC) + (0x17 << 2)
0xA58 + 92 = 0xAB4
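To make the operation above concrete, here is a minimal C sketch of the address calculation for the 16-bit Thumb LDR Rd, [PC, #imm] encoding. The function names are hypothetical, chosen for illustration; it assumes the instruction address is the one the debugger shows (0xA56 here) and that immed_8 is the encoded word offset (0x17 = 92 / 4).

```c
#include <assert.h>
#include <stdint.h>

/* PC value seen during a Thumb instruction: its own address + 4. */
uint32_t thumb_pc(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0);  /* Thumb instructions are halfword aligned */
    return insn_addr + 4u;
}

/* 16-bit LDR Rd, [PC, #immed_8 * 4]:
 *   address = (PC[31:2] << 2) + (immed_8 * 4)  */
uint32_t thumb_ldr_literal_addr(uint32_t insn_addr, uint8_t immed_8)
{
    uint32_t pc = thumb_pc(insn_addr);
    uint32_t aligned_pc = (pc >> 2) << 2;  /* PC[31:2] << 2, i.e. pc & ~3u */
    return aligned_pc + (uint32_t)immed_8 * 4u;
}
```

Called as thumb_ldr_literal_addr(0xA56, 0x17), it returns 0xAB4, the address the disassembler printed.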

This is an LDR, so the address should ideally be word aligned. Because a Thumb instruction can sit at an address that is not word aligned, you start by adding two instructions' worth to its address (Thumb-2 complicates this, but add four for Thumb). Then zero the lower two bits (for LDR). The offset is in words, so it must be converted to bytes by multiplying by four. The encoding makes more sense if you think about each part of it. In ARM mode the PC is already word aligned, so the masking step is not required; ARM mode also has more bits for the immediate, so its offset is in bytes rather than words. That difference can make the offset encodings of ARM and Thumb confusing to compare.
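For contrast with the Thumb calculation, here is a sketch of the ARM-state version under the assumptions described in that paragraph: 4-byte-aligned instructions, PC reading as the instruction address plus 8, and a byte-granular 12-bit immediate with no masking step. The add flag models the U (up/down) bit of the ARM encoding; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM state (sketch): instructions are word aligned, PC reads as the
 * instruction address + 8, and imm12 is a byte offset whose direction
 * comes from the U bit (modelled here as the `add` flag). */
uint32_t arm_ldr_literal_addr(uint32_t insn_addr, uint16_t imm12, bool add)
{
    assert((insn_addr & 3u) == 0);  /* ARM instructions are 4-byte aligned */
    assert(imm12 < 4096u);          /* 12-bit immediate */
    uint32_t pc = insn_addr + 8u;   /* already word aligned: no masking step */
    return add ? pc + imm12 : pc - imm12;
}
```

With the ARM-mode numbers from the comments further down (instruction at 0x83a0, offset 28), it gives 0x83c4.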

The various documents show the math in different ways, but it is the same math nevertheless. The PC is the only confusing part, especially for Thumb. For ARM you add 8 (two instructions ahead); for Thumb it is basically 4, because execution cannot tell whether a Thumb-2 instruction is coming, and it would have broken a great many things if they had attempted that. So for Thumb, add 4 for the two-ahead. Because Thumb is compressed, this instruction does not use a byte offset but a word offset, giving four times the range. Likewise, this and some other instructions can only look forward, not back, so the offset is unsigned. This is why you get alignment errors when assembling things in Thumb that in ARM would simply be unaligned (and what you get there depends on the architecture and settings): Thumb cannot encode an arbitrary address for an instruction like this.
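The constraints in that paragraph (word-aligned base, word-scaled offset, forward-only reach) are what an assembler has to satisfy when it picks the immediate. A hypothetical helper, not taken from any real assembler, might look like this; it only models the 16-bit encoding, not the wider 32-bit Thumb-2 form.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical assembler helper: find imm8 for a 16-bit Thumb
 * LDR Rd, [PC, #imm8*4] that loads from literal_addr.
 * Returns -1 when the literal cannot be reached by that encoding. */
int thumb16_ldr_literal_imm8(uint32_t insn_addr, uint32_t literal_addr)
{
    assert((insn_addr & 1u) == 0);
    uint32_t base = (insn_addr + 4u) & ~3u;   /* Align(address + 4, 4) */
    if (literal_addr & 3u)
        return -1;          /* a word offset cannot express an unaligned target */
    if (literal_addr < base)
        return -1;          /* offset is unsigned: forward references only */
    uint32_t words = (literal_addr - base) / 4u;
    if (words > 255u)
        return -1;          /* beyond 255 words (1020 bytes) */
    return (int)words;
}
```

For the question's instruction at 0xA56, a literal at 0xAB4 encodes as 23 (0x17), while 0xAB6 (unaligned) and 0xA54 (behind the base) are rejected.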

For understanding instruction encoding, in particular PC-based addressing, it is best to go back to the early ARM ARM (from before ARMv5, or failing that the ARMv5 one) as well as the ARMv6-M, ARMv7-M and full-sized ARMv7-AR manuals, and look at the pseudocode in each. The older one generally has the best pseudocode, but sometimes it leaves out the masking of the lower address bits. No document is perfect; they have bugs just like everything else. Naturally, the official document is the one for the architecture of the core you are using, i.e. the IP the chip vendor used (even down to the specific version of the TRM, as these can vary in incompatible ways from one to the next). But if that document is not perfectly clear, you can sometimes get an idea from others that, upon inspection, have compatible instructions and architectural features.

edited Jan 21 '22 at 21:41 by halfer · answered Dec 27 '21 at 11:31 by old_timer

For ARM, the instruction address is already word aligned, so you would not see the alignment step you see in the Thumb PC-relative LDR. If you are used to the ARM version, this one is confusing until you see this alignment. The offset being word based is a clue that the PC-derived address must also be word aligned. – old_timer Dec 27 '21 at 11:32

Note that I may even have stated it wrong: you are not adding two instructions ahead to the PC to get the PC; you are adding two instructions ahead to the ADDRESS OF THE INSTRUCTION to GET THE PC. Thinking in those terms also helps in understanding the encoding and execution (even though most of us will say add four or eight to the PC to get the ... umm PC). – old_timer Dec 27 '21 at 11:50
I was surprised to read that even with address/offset from registers (`ldr r0, [r1, r2]`), it's still not well-defined to do an unaligned word load in Thumb mode. The manual google found (https://developer.arm.com/documentation/dui0068/b/Thumb-Instruction-Reference/Thumb-memory-access-instructions/LDR-and-STR--register-offset) says an unaligned load will "corrupt" the destination register, and an unaligned word store will "corrupt" the aligned word containing the byte address. I guess that means pure-thumb cores don't need misalignment handling, unlike ARM-mode where it's defined, IIRC? – Peter Cordes Dec 27 '21 at 14:22

You have to be careful with generic descriptions like the one you posted; they kind of average the history of the instruction across all time. If one core (say, the ARM7TDMI) had an issue, they threw in a comment that might be interpreted as global. Unaligned loads worked even back in the ARM7TDMI days (ARMv4T), but strangely: the bytes rotate rather than coming from the next word. An LDR from 0x1001, where the aligned word at 0x1000 holds 0x11223344, would give you 0x44112233, not 0xZZ112233 with ZZ coming from the next word over. – old_timer Dec 27 '21 at 16:44
but it was not officially documented – old_timer Dec 27 '21 at 16:44
ldr <Rd>, [<Rn>, <Rm>]. Operation: address = Rn + Rm; if address[1:0] == 0b00 then data = Memory[address, 4] else data = UNPREDICTABLE; Rd = data. – old_timer Dec 27 '21 at 16:49
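A small model of the comments above, under loudly stated assumptions: memory is a hypothetical little-endian byte array (index 0 stands in for address 0x1000), the documented rule is the UNPREDICTABLE check just quoted, and the rotation reproduces the undocumented ARMv4T-era behaviour old_timer describes, not Cortex-M behaviour. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical memory image: index 0 stands in for address 0x1000,
 * holding the word 0x11223344 in little-endian byte order. */
const uint8_t example_mem[4] = {0x44, 0x33, 0x22, 0x11};

/* Documented rule quoted above: the load is only defined if
 * (Rn + Rm)[1:0] == 0b00; otherwise the result is UNPREDICTABLE. */
bool ldr_reg_offset_defined(uint32_t rn, uint32_t rm)
{
    return ((rn + rm) & 3u) == 0;
}

/* Little-endian 32-bit read from the byte-array model. */
uint32_t read_word_le(const uint8_t *mem, uint32_t addr)
{
    return (uint32_t)mem[addr] | ((uint32_t)mem[addr + 1] << 8) |
           ((uint32_t)mem[addr + 2] << 16) | ((uint32_t)mem[addr + 3] << 24);
}

/* Undocumented ARMv4T-era behaviour described above: load the aligned
 * word, then rotate it right by 8 bits per byte of misalignment. */
uint32_t armv4t_rotated_load(const uint8_t *mem, uint32_t addr)
{
    uint32_t word = read_word_le(mem, addr & ~3u);
    uint32_t rot = 8u * (addr & 3u);
    return rot ? (word >> rot) | (word << (32u - rot)) : word;
}
```

armv4t_rotated_load(example_mem, 1) gives 0x44112233, matching the example in the comment.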
Also remember that they are an IP company, especially in the early days but even now, and they had to protect that IP. A number of the UNPREDICTABLEs in the ARMv4 days were actually predictable and a way to test whether you had stolen IP. No doubt some implementations had bugs too, and they had to write it that way to cover those implementations, like ldr rx,[rx] with the same register, and the multiply using the same register as source and destination (both of which have always just worked for me). – old_timer Dec 27 '21 at 16:50
the ldr rd,[rn,rm] I would expect to work on a cortex-m in a predictable way. I would expect the issues were with the older full sized cores. – old_timer Dec 27 '21 at 16:53
the ARMv6m ARM does not mention an unpredictable for the ldr rd,[rn,rm] – old_timer Dec 27 '21 at 16:56

Peter Cordes · Accepted Answer · 2021-12-27T14:17:39.603

You missed a key part of the rules for Thumb mode, quoted in one of the questions you linked (Why does the ARM PC register point to the instruction after the next one to be executed?):

For all other instructions that use labels, the value of the PC is the address of the current instruction plus 4 bytes, with bit[1] of the result cleared to 0 to make it word-aligned.

(0xA56 + 4) & -4 = 0xA58 is the location that PC-relative things are relative to during execution of that ldr r0, [PC, #92]
((0xA56 + 4) & -4) + 92 = 0xab4, the location the disassembler calculated.
It's equivalent to do 0xA56 & -4 = 0xa54 then +4 + 92, because +4 doesn't modify bit #1; you can think of clearing it before or after adding that +4. But you can't clear the bit after adding the PC-relative offset; that can be unaligned for other instructions like ldrb. (Thumb-mode ldr encodes an offset in words to make better use of the limited number of bits, so the scaled offset and thus the final load address always have bits[1:0] clear.)
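A quick way to convince yourself of the ordering claim above is to check it exhaustively over a range of halfword-aligned addresses. This sketch (hypothetical function names) also shows why masking after adding a byte offset, as a byte-granular PC-relative load such as LDRB could use, gives a different and wrong address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clear bits [1:0] first, then add 4. */
uint32_t base_mask_then_add(uint32_t insn_addr) { return (insn_addr & ~3u) + 4u; }

/* Add 4 first, then clear bits [1:0]. */
uint32_t base_add_then_mask(uint32_t insn_addr) { return (insn_addr + 4u) & ~3u; }

/* Check that both orders agree for every halfword-aligned address in [start, end). */
bool mask_order_irrelevant(uint32_t start, uint32_t end)
{
    for (uint32_t a = start & ~1u; a < end; a += 2u)
        if (base_mask_then_add(a) != base_add_then_mask(a))
            return false;
    return true;
}

/* Right: align the base, then add a byte offset. */
uint32_t byte_literal_addr(uint32_t insn_addr, uint32_t byte_offset)
{
    assert((insn_addr & 1u) == 0);  /* Thumb instructions are halfword aligned */
    return base_add_then_mask(insn_addr) + byte_offset;
}

/* Wrong: masking after the offset is added also clears the offset's low bits. */
uint32_t wrong_mask_after_offset(uint32_t insn_addr, uint32_t byte_offset)
{
    return (insn_addr + 4u + byte_offset) & ~3u;
}
```

For the instruction at 0xA56 and a byte offset of 5, the correct address is 0xA5D, while masking last yields 0xA5C.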

(Thanks to Raymond Chen for spotting this; I had also missed it initially!)

Also note that your debugger shows you a PC value when stopped at a breakpoint, but that's the address of the instruction you're stopped at. (Because that's how ARM exceptions work, I assume, saving the actual instruction to return to, not some offset.) During execution of the instruction, PC-relative stuff follows different rules. And the debugger doesn't "cook" this value to show what PC will be during its execution.

The rule is not "relative to the end of this / start of next instruction". Answers and comments stating that rule happen to get the right answer in this case, but would get the wrong answer in other Thumb cases like in LDR Rd,-Label vs LDR Rd,[PC+Offset] where the PC-relative load instruction happens to start at a 4-byte aligned address so bit #1 of PC is already cleared.

Your LDR is at address 0xA56, where bit #1 is set, so the rounding down has an effect. And your ldr instruction used a 2-byte encoding, not a 32-bit Thumb-2 instruction like you might need for a larger offset. Both of these things mean that round-down + 4 happens to be the address of the next instruction, rather than 2 instructions later or the middle of this instruction.
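The two rules can be compared directly. In this sketch, next_insn_base is the "start of the next instruction" rule and thumb_literal_base is the documented Align(address + 4, 4) rule; both names are hypothetical. The 0x6C address is the case from the linked LDR Rd,-Label question, and the 4-byte case at 0xA56 is a hypothetical 32-bit instruction placed at the question's address.

```c
#include <assert.h>
#include <stdint.h>

/* Documented rule for Thumb PC-relative loads: Align(address + 4, 4). */
uint32_t thumb_literal_base(uint32_t insn_addr)
{
    assert((insn_addr & 1u) == 0);  /* Thumb instructions are halfword aligned */
    return (insn_addr + 4u) & ~3u;
}

/* The "address of the next instruction" rule, for an instruction of
 * insn_size bytes (2 or 4). It only coincides with the rule above
 * for some addresses and sizes. */
uint32_t next_insn_base(uint32_t insn_addr, uint32_t insn_size)
{
    return insn_addr + insn_size;
}
```

Both rules give 0xA58 for the question's 2-byte LDR at 0xA56, but at 0x6C they give 0x70 and 0x6E respectively, and only 0x70 reproduces the 0x90 seen in that linked disassembly (0x70 + 0x20).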

score -1 · Answer 3 · answered Dec 27 '21 at 03:56

Since the program counter points to the next instruction, when it executes the LDR at address 0x00000A56, the program counter will be holding the address of the next instruction, which is 0x00000A58.

0x0A58 + 0x5C (decimal 92) == 0x00000AB4

answered Dec 27 '21 at 03:56 by bennyE31

Thanks, but I'm not sure if it's the case though. On Keil, when it's preparing to execute this instruction, PC says 0x00000a56 instead of 0x00000a58. https://imgur.com/WTyVdvc – Nicholas Humphrey Dec 27 '21 at 04:24

The address that Keil is pointing to is what is in the program counter BEFORE the instruction is executed. The very first step of a CPU's instruction cycle is to fetch the instruction at the address in the program counter AND THEN increment the program counter. So, if you are stepping through the lines of assembly, when you execute the instruction at 0xA56 the program counter is incremented BEFORE the LDR operation occurs. https://en.wikipedia.org/wiki/Instruction_cycle – bennyE31 Dec 27 '21 at 04:33

@NicholasHumphrey: ARM's PC points 2 instructions later during the execution of an instruction, not the next. [Why does the ARM PC register point to the instruction after the next one to be executed?](https://stackoverflow.com/q/24091566) - actually, it's address of current instruction plus 4 bytes in thumb mode, plus 8 in ARM mode. So in thumb2 code, it doesn't depend on use of variable-length instructions. – Peter Cordes Dec 27 '21 at 04:38

@PeterCordes thanks, I actually read that post. The problem is it seems to be +2 bytes instead of 4 or 8. The address of the LDR is 0xA56; since it's Thumb (each instruction is 2 bytes), we increment PC by 4 bytes, which gives 0xA5A. We then add 92 on top of it, which gives 0xAB6, not 0xAB4. I hope I didn't make any mistake. – Nicholas Humphrey Dec 27 '21 at 04:46
@PeterCordes BTW, I might be wrong and it could be ARM rather than Thumb code (I don't know exactly how to tell one from the other just from the asm). However, neither case really works. For PC = 0xA56, from what you said, I should add either 4 or 8 bytes and then add 92; both results are bigger than 0x00000AB4. I also checked that the value is indeed fetched from 0x00000AB4 (at 0xAB4 there is 0x53FC and at 0xAB6 there is 0x4002), so the 32-bit word 0x400253FC is correctly loaded. I just don't get how the address is calculated. – Nicholas Humphrey Dec 27 '21 at 04:50
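The two halfwords observed in memory do combine into the register address, assuming little-endian data, which is what the observed values imply: the halfword at the lower address supplies the low 16 bits. A hypothetical helper to show the combination:

```c
#include <assert.h>
#include <stdint.h>

/* Little-endian (assumed): the halfword at the lower address holds the
 * low 16 bits of the word, the one 2 bytes higher holds the top 16 bits. */
uint32_t word_from_halfwords_le(uint16_t at_addr, uint16_t at_addr_plus_2)
{
    return (uint32_t)at_addr | ((uint32_t)at_addr_plus_2 << 16);
}
```

word_from_halfwords_le(0x53FC, 0x4002) returns 0x400253FC, the GPIO_PORTF_DATA_R address from the #define.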

@NicholasHumphrey: ARM mode has a fixed 4-byte instruction width, so your 2-byte LDR is definitely Thumb. And yeah, I'd expect `0xA56 + 4 + 92` = `0xab6` based on the rules in the top answer on that other question. The ARM-mode example on the other answer there does work out: `0x83a0 + 8 + 28` = `0x83c4` as shown in the `objdump` disassembly. – Peter Cordes Dec 27 '21 at 05:03

@NicholasHumphrey: Maybe that linked Q&A is wrong about the rules for Thumb mode? You can't trust that the value of PC during execution of the instruction is the same as the PC value shown *before* you single-step it. But you should be able to trust that the address of the start of the instruction shown in your debugger is right, and what's meant by "address of the instruction". (I've read your question in more detail now, and yeah you had already found and linked that answer. Your results seem to be inconsistent with it, so it seems to be wrong about Thumb mode.) – Peter Cordes Dec 27 '21 at 05:10

Certainly Benny's answer here is disagreeing with it, saying that PC during the execution of a Thumb instruction (for the purposes of PC-relative addressing) is the address of the start of the next instruction, i.e. the end of this one. (That does explain the math.) – Peter Cordes Dec 27 '21 at 05:11
@PeterCordes One thing that I found in a book says that how many bytes of offsets we need to add on top of PC depends on which instruction is in the execute stage of the pipeline. I think this might be related (i.e. maybe the NEXT instruction is in that stage). I'll modify the original post to include more C code. – Nicholas Humphrey Dec 27 '21 at 05:24

OTOH, the disassembly in [LDR Rd,-Label vs LDR Rd,\[PC+Offset\]](https://stackoverflow.com/q/69778585) shows `0x6c + 4 + 0x20` = `0x90` to reach the literal pool where the `&counter2` pointer is stored. So that does match the +4 quoted behaviour. (The next several instructions there are all 2 bytes long, vs. yours has an AND that's 4 bytes. So maybe that linked Q&A is only about Thumb1, not Thumb2 where there can be 32-bit instructions in Thumb mode? Either that or tools disagree about the symbolic disassembly like `[PC, #20]`, i.e. what asm-source offsets mean) – Peter Cordes Dec 27 '21 at 05:25
@PeterCordes I'm not particularly familiar with the op code. I'll see what I can find. Thanks for the help along the way. – Nicholas Humphrey Dec 27 '21 at 05:28

@PeterCordes in Thumb mode, PC is rounded down to the nearest multiple of 4 before adding 4 and the encoded offset. It looks like this particular disassembler includes the +4 with the offset, but does not include the additional offset implied by the rounding. – Raymond Chen Dec 27 '21 at 05:57

@RaymondChen: oh derp, I misread `bit[1]` as being the low bit, and somehow was thinking of x86 2-byte words when talking about word-aligned. I was thinking it was talking about something related to instructions after a `bx` to an odd address so didn't actually read or think through the details. Anyway, total brain fart on my part, all makes sense now that you draw my attention again to what it actually says, and this answer is right for this case only by coincidence. – Peter Cordes Dec 27 '21 at 06:11

@NicholasHumphrey: See Raymond's comment. And note that your `0x00000A56` rounds down to `0x00000A54`, so PC-relative stuff during execution of that LDR is relative to `A54 + 4 = A58`. – Peter Cordes Dec 27 '21 at 06:12

Even with Thumb-2 you add 4 bytes (two traditional Thumb instructions ahead) to the address of the instruction. For an LDR with a word-based offset the base must be word aligned, so the ARM documentation takes the pc+4 just described as PC (in the instruction-operation part of the document you are supposed to already know this), then shows the lower two bits being stripped. It is definitely not sometimes 2 and sometimes 4; it is always 4 with the lower two bits zeroed, written as either PC[31:2]*4 or PC & 0xFFFFFFFC. – old_timer Dec 27 '21 at 11:46
Thanks @PeterCordes RaymondChen old_timer I think I understand now, it's the alignment rule. Jeez that's some revelation :D – Nicholas Humphrey Dec 27 '21 at 13:34
@NicholasHumphrey: You should accept one of the answers, then, if either of them fully answers your question. – Peter Cordes Dec 27 '21 at 14:11
