ARM Assembly TBB Instruction - how does jumping work?

Question

I'm trying to understand how TBB works for switch statements in ARM assembly. I see how it's written in the textbook/online manual, but I don't understand how the offsets in the branch table work. How does execution get from the branch table to the target instruction? How are labels subtracted to get the correct offset, and why divide by 2?

In my textbook it says

The memory address of the instruction to which the program should branch is calculated as follows:
target = PC + 4 + ( 2*BranchTable[r0] )

Here r0 holds the index into the branch table. When TBB executes, PC already points to the next location (the branch table, at the TBB address + 4). The branch table entry then supplies a second offset to the target instruction (2*BranchTable[r0]). From what I've seen in my textbook and online, each entry is computed by subtracting the branch table label from the instruction label. This should give an offset of something like 4n. Why divide by 2? Thank you!

Edit: Did some math

So I did some math and it turns out that the offset stored in the branch table is [ (label2 - label1)/2 ]. Plugged into the earlier equation, that gives target = PC + 4 + (label2 - label1). This makes sense, but I'd still like to know the reason why, or whether my logic is wrong: why is TBB set up so that the difference is divided by 2 and then multiplied by 2?

Because instruction addresses are always even, it makes sense to scale by two so you get double the range. Another way to look at it is that you store the bits #1-8 of the offset, and bit #0 is implicitly zero, i.e. `XXXXXXXX0` in binary. — Jester, Nov 10 '17 at 22:43
That makes sense. I remember reading that PC always has bits [1:0] set to 0 because of alignment (can't remember if that's for 16-bit or 32-bit instructions...). Is this also why DCB instructions are usually followed by ALIGN? Sorry, could you clarify a little bit about the range part? — Fua, Nov 10 '17 at 22:54
That's in 32 bit. As for `DCB`, yes. The manual says: _"If DCB is followed by an instruction, use an ALIGN directive to ensure that the instruction is aligned."_ (`DCB` is a directive, not an instruction). The range that you can use is 512 bytes not 256 as you would have if you naively used a byte offset. — Jester, Nov 10 '17 at 22:57
And by range you mean just how far the TBB instruction can jump to from the current PC? Why would the range need to be doubled? From what I can see, the difference in addresses is added directly to PC. I think there's something I'm missing that's not letting me grasp this idea. — Fua, Nov 10 '17 at 23:39
It doesn't need to be doubled, it's a benefit you get by not storing the least significant bit. — Jester, Nov 10 '17 at 23:39
The processor can jump further assuming even addresses because it uses all 8 bits shifted left once. In other words, because it knows that it cannot jump to an odd address, it just places an extra "0" at the least significant bit and forms a 9-bit jump address instead. If the addresses were 32-bit aligned, it could get to 10-bits addresses with 2 bits inserted low instead. Why do this? Because relative jumps are much cheaper than full ones. — Michael Dorgan, Nov 11 '17 at 01:22
As mentioned, you only need to specify the number of instructions, not bytes, so you can double your range. The plus 4 is two instructions ahead, so we assume that the program counter when we execute the instruction is pointing two ahead: pc of instruction + 4 + (immediate*2). Without looking it up I don't know if it is unsigned (can only jump forward) or signed (can jump forward or back). If only forward, that again doubles the range. — old_timer, Nov 11 '17 at 16:27
If the immediate space in the instruction were, let's say, 5 bits and it was a signed byte offset, then you could jump +15 or -16 bytes (actually +14, because it would have to be even), so only 7 instructions ahead or 8 back from where the pc is at the time. If we choose by design to define those 5 bits as instructions, but still signed, then that makes it 15 instructions forward or 16 back: we can jump further. If we make it unsigned and in instructions, then up to 31 instructions forward, doubling the distance we can jump. There is a balance here: if the distance is not very far, then... — old_timer, Nov 11 '17 at 16:30
...you have to put a trampoline somewhere within range: a jump to some other jump that can reach further. Getting to your final destination that way puts more work on the asm programmer or compiler, but at the same time, with too many offset bits the instruction space, number of opcodes, etc. can get squeezed... I'm talking in general; you will see some other instruction sets provide a smaller local branch flavor and an absolute far flavor. With fixed-ish length instruction sets like ARM you have to work harder on the balance. — old_timer, Nov 11 '17 at 16:32
Thank you for all of your explanations! I was studying NVIC, and to access the proper enable register I had to do something similar: find the correct register number and multiply by 4 to get the proper memory address. It makes sense to do this if the difference were the number of instructions from the BranchTable. But if addresses are what's being subtracted and those are already aligned, then isn't it pointless to divide by 2 and multiply by 2? Say TBB were at 0x0~0, the BranchTable at 0x0~4, and the first CaseLabel at 0x0~8. Wouldn't (0x0~8 - 0x0~4) = 0x0~4 already be aligned? — Fua, Nov 12 '17 at 17:24
I get the feeling that if I just place values 1, 2, 3, ... the number of instructions ahead to the Labels in the BranchTable without dividing, that I would get the same result? — Fua, Nov 12 '17 at 17:25

Peter Cordes · Answer 1 · 2017-11-16T02:58:08.527

TBB / TBH are only available in Thumb2 mode.

ARM Thumb2 instructions always start on an even address. Being able to branch to PC+4 + {0, 2, 4, 6, 8, ..., 508, 510} is more useful than PC+4 + {0, 1, 2, 3, 4, ..., 254, 255}, because all the odd offsets are useless. As @Jester explained in the first comment, multiplying by 2 gives you twice the range from one-byte offsets with no loss in flexibility.
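
To make the arithmetic concrete, here is a minimal Python sketch (a model, not ARM code) of how a byte table could be filled in and how the branch target is recovered from it. The addresses and the helper names `tbb_entry` and `tbb_target` are made up for illustration. It assumes the table immediately follows a 4-byte TBB that uses PC as the base register, and that table bytes are zero-extended, so branches only go forward.

```python
# Model of TBB table construction and decoding.
# All addresses are hypothetical illustration values, not from a real binary.

def tbb_entry(table_base: int, target: int) -> int:
    """Byte stored in the table: half the distance from the table base."""
    delta = target - table_base
    if delta < 0 or delta % 2 != 0:
        raise ValueError("target must be at an even, forward offset")
    entry = delta // 2
    if entry > 0xFF:
        raise ValueError("target out of TBB range (max 510 bytes)")
    return entry

def tbb_target(table_base: int, table: bytes, index: int) -> int:
    """Address branched to: table base + 2 * zero-extended table byte."""
    return table_base + 2 * table[index]

TBB_ADDR = 0x1000           # hypothetical address of the TBB instruction
TABLE_BASE = TBB_ADDR + 4   # value PC reads as; the table starts right here
CASES = [0x1008, 0x100E, 0x1012, 0x1202]  # hypothetical case label addresses

# What the assembler stores: (case_label - table_label) / 2 for each case.
table = bytes(tbb_entry(TABLE_BASE, t) for t in CASES)

# What the core computes at run time for index 0..3.
decoded = [tbb_target(TABLE_BASE, table, i) for i in range(len(CASES))]

print([hex(b) for b in table])    # ['0x2', '0x5', '0x7', '0xff']
print([hex(a) for a in decoded])  # same addresses as CASES
```

The division by 2 happens once, at assembly time, and the doubling happens in hardware at run time. Storing the raw byte difference instead would make the core jump twice as far as intended, because the entry is always doubled.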

ARM's designers could have designed it to always multiply by 4, but 512B is generally enough range, and it would sometimes require padding the code blocks to be multiples of 4 bytes long. If larger offsets are needed, TBH uses half-word 16-bit offsets (still multiplied by 2).
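
As a rough illustration of that trade-off, the following Python sketch compares the forward reach of unsigned table entries under different widths and scale factors, and the padding a code block would need so the next block starts on a valid target offset. The helper names `max_reach` and `padding_for` are hypothetical; only the x2 rows correspond to what TBB and TBH actually do, and the x1 and x4 rows are "what if" designs.

```python
def max_reach(entry_bits: int, scale: int) -> int:
    """Largest forward byte offset for an unsigned entry of this width."""
    return ((1 << entry_bits) - 1) * scale

def padding_for(block_len: int, scale: int) -> int:
    """Bytes of padding so the following block starts on a multiple of scale."""
    return (-block_len) % scale

print(max_reach(8, 1))    # 255     hypothetical unscaled byte offsets
print(max_reach(8, 2))    # 510     TBB: byte entries, doubled
print(max_reach(8, 4))    # 1020    hypothetical x4 scaling
print(max_reach(16, 2))   # 131070  TBH: halfword entries, doubled

# A 6-byte case block (e.g. three 16-bit instructions) needs no padding
# with x2 scaling, but would need 2 bytes with x4 scaling.
print(padding_for(6, 2))  # 0
print(padding_for(6, 4))  # 2
```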

Here's a real example of a jump table using TBB, with the raw hex and commented disassembly working out the math of how each branch target was reached (like case 3: (0x3164 + 0x9 * 2)): "Confused by TBB in a section of ARM disassembly".
