
As an introduction to ARM assembly, I'm trying to recreate disassembled functions in a higher-level language. However, I'm confused by the following bit of assembly:

0000315e        2101    movs    r1, #1
00003160    e8dff000    tbb [pc, r0]
00003164        030e    lsls    r6, r1, #12
00003166        0907    lsrs    r7, r0, #4
00003168        050b    lsls    r3, r1, #20
0000316a        2106    movs    r1, #6
0000316c        e008    b.n 0x3180
0000316e        2102    movs    r1, #2
00003170        e006    b.n 0x3180
00003172        2103    movs    r1, #3
00003174        e004    b.n 0x3180
00003176        2104    movs    r1, #4
00003178        e002    b.n 0x3180
0000317a        2105    movs    r1, #5
0000317c        e000    b.n 0x3180
0000317e        2100    movs    r1, #0
00003180        4608    mov r0, r1
00003182        4770    bx  lr

I believe it may be some kind of switch statement, but I'm unsure what exactly it's doing.

Peter Cordes
Johnathon
  • See also: http://stackoverflow.com/questions/6538342/help-with-68k-assembly-jump-tables/6540369#6540369 for the 68k version of a switch statement. – ninjalj Aug 12 '11 at 00:46

1 Answer


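Yes, that is a switch. tbb stands for Table Branch Byte: it reads the byte at index r0 from a table of byte-sized offsets whose base is pc (here 0x3164, the address of the tbb instruction plus 4), doubles that byte, and branches forward by that amount from the table base.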
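As a rough sketch of that computation (Python is used only for illustration; `tbb_target` and the constant names are hypothetical, and the table bytes are given in memory order, as IDA reports them in the comments below):

```python
# Minimal model of Thumb-2 `tbb [pc, rN]`; all names here are illustrative only.
TBB_ADDR = 0x3160                 # address of the tbb instruction in the listing
TABLE_BASE = TBB_ADDR + 4         # in Thumb state, pc reads as instruction address + 4
TABLE = bytes([0x0E, 0x03, 0x07, 0x09, 0x0B, 0x05])  # byte offsets in memory order


def tbb_target(index: int) -> int:
    """Return the branch target for an index: table base + 2 * table[index]."""
    return TABLE_BASE + 2 * TABLE[index]


for i in range(len(TABLE)):
    print(f"r0 = {i} -> branch to {tbb_target(i):#06x}")
```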

So:

0000315e        2101    movs    r1, #1           ; ret = 1 (kept by case 0)
00003160    e8dff000    tbb [pc, r0]             ; switch (r0)

; jump table, byte-sized offsets, in memory order
; (the disassembler shows them as little-endian halfwords 030e 0907 050b)
00003164        0e 03 07 09 0b 05

; case 1: (0x3164 + 0x3 * 2)
0000316a        2106    movs    r1, #6           ; ret = 6
0000316c        e008    b.n 0x3180               ; break

; case 5: (0x3164 + 0x5 * 2)
0000316e        2102    movs    r1, #2           ; ret = 2
00003170        e006    b.n 0x3180               ; break

; case 2: (0x3164 + 0x7 * 2)
00003172        2103    movs    r1, #3           ; ret = 3
00003174        e004    b.n 0x3180               ; break

; case 3: (0x3164 + 0x9 * 2)
00003176        2104    movs    r1, #4           ; ret = 4
00003178        e002    b.n 0x3180               ; break

; case 4: (0x3164 + 0xb * 2)
0000317a        2105    movs    r1, #5           ; ret = 5
0000317c        e000    b.n 0x3180               ; break

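; default: (reached from a range check before the posted code, not via the table)
0000317e        2100    movs    r1, #0           ; ret = 0

; case 0: (0x3164 + 0xe * 2), ret keeps the 1 set at 0x315e
; end switch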
00003180        4608    mov r0, r1        ; mov ret to r0 (return value)
00003182        4770    bx  lr            ; return

The basic idea should be clear.
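Putting the annotated listing together, the function can be approximated as follows. This is a sketch rather than the original source: the name `lookup` is made up, a dictionary stands in for the switch, and the default of 0 assumes the range check mentioned in the comments (a branch to 0x317e when r0 > 5) sits just before the posted code.

```python
# Hypothetical reconstruction of the routine; `lookup` is an invented name.
CASES = {
    0: 1,  # tbb jumps straight to 0x3180, so r1 keeps the value set at 0x315e
    1: 6,  # handler at 0x316a
    2: 3,  # handler at 0x3172
    3: 4,  # handler at 0x3176
    4: 5,  # handler at 0x317a
    5: 2,  # handler at 0x316e
}


def lookup(x: int) -> int:
    """Return the value the assembly leaves in r0 for input x."""
    # Assumed: an earlier range check sends any other input to 0x317e (r1 = 0).
    return CASES.get(x, 0)


print([lookup(x) for x in range(7)])  # [1, 6, 3, 4, 5, 2, 0]
```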

Matthew Slattery
ninjalj
  • My guess would be that there's a range check above the code that was posted, which branches to 0x317e if r0 >= 6... – Matthew Slattery Aug 12 '11 at 00:17
  • Thanks, that makes sense, but I don't understand where you got all that from. I've never seen "tbb" before and don't understand how you got from what I posted to what you have. And there is a range check that branches to 0x317e if r0 > 5 – Johnathon Aug 12 '11 at 00:23
  • Very well done. Nice job on the formatting / commenting / spacing & showing the relationship between the table of offsets and each case. – Dan Aug 12 '11 at 00:28
  • @Johnathon - [the TBB instruction](http://infocenter.arm.com/help/topic/com.arm.doc.dui0204f/Cjafifbd.html) was added in the Thumb-2 instruction set. I think ninjalj's explanation is pretty deliberate & well-commented. Can you be more explicit & tell us what you don't understand? – Dan Aug 12 '11 at 00:32
  • Firstly, what the lsls command does. It seems pretty much omitted in what was posted. – Johnathon Aug 12 '11 at 00:35
  • @Johnathon: that `lsls` is not really there. That part is data (the jump table), not code, so you should read it as data, not as code. – ninjalj Aug 12 '11 at 00:37
  • Ok, so the bit I don't get is the three lines with lsls, lsrs and lsls. What do they stand for and how should I interpret it? And I know I should know this but what do the values in the second column of the disassembly represent? – Johnathon Aug 12 '11 at 00:39
  • And regarding the jump table: the PC holds the base address of the table [0x3164], R0 is used to index into it, the value is fetched, multiplied by 2, added to the table base [0x3164], and loaded into the PC. Example: if R0 holds 2, we index 2 bytes into the table, fetch 7, multiply by 2, add the 14 to 0x3164, and branch to 0x3172. Just as ninjalj described. – Dan Aug 12 '11 at 00:41
  • @Johnathon: The values in column 2 are the actual bytes in memory. The problem as ninjalj stated is that they're not instructions, they're data (ARM does that sometimes), but the disassembler is trying to decode them as instructions. Imagine if the disassembler tried to disassemble the bytes in "Hello, sir, how are you today?" - it would output *something*, but it wouldn't be meaningful, because those characters aren't ARM assembly instructions. – Dan Aug 12 '11 at 00:43
  • @Johnathon: the disassembler treats everything in the code segment as code, even when it isn't. The three columns in the disassembly are: offset, machine code, disassembled instructions. So at 0x3164 you have the 6 bytes 03 0e 09 07 05 0b, which the disassembler treats as code, and gets a bogus disassembly for that. – ninjalj Aug 12 '11 at 00:45
  • Ok, I think I get it now. A few more questions though. How are you meant to know when the disassembler is trying to decode data as instructions (I thought lsrs is logical shift right)? And running it through IDA shows different offsets for the jump table (0E 03 07 09 0B 05). What you've posted makes more sense, but what IDA has output is the result I'm expecting – Johnathon Aug 12 '11 at 00:52
  • 1) by being smarter than the disassembler, 2) ah, it's little endian, it makes more sense. – ninjalj Aug 12 '11 at 05:47
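
To make the little-endian point concrete, here is a small sketch of how the halfwords printed by the disassembler (030e, 0907, 050b) map back to bytes in memory on a little-endian target. It uses Python's standard `struct` module; the variable names are illustrative.

```python
import struct

# Halfwords exactly as the disassembler printed them in column 2.
halfwords = [0x030E, 0x0907, 0x050B]

# "<H" packs each value as a little-endian unsigned 16-bit integer,
# i.e. low byte first, which is the order the bytes occupy in memory.
memory = b"".join(struct.pack("<H", h) for h in halfwords)

print(" ".join(f"{b:02x}" for b in memory))  # 0e 03 07 09 0b 05
```

The result matches the table IDA shows, which is the order tbb indexes.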