Variable length instruction sets often provide multiple jump instructions with differently sized displacements to optimize the size of the code. For example, the PDP-11 has both
0004XX BR FOO # branch always, 8 bit displacement
and
000167 XXXXX JMP FOO # jump relative, 16 bit displacement
for relative jumps. For a more recent example, the 8086 has both
EB XX JMP FOO # jump short, 8 bit displacement
and
E8 XX XX JMP FOO # jmp near, 16 bit displacement
available.
An assembler should not burden the programmer with guessing the right kind of jump to use; it should be able to infer the shortest possible encoding such that all jumps reach their targets. This optimisation is even more important when you consider cases where a jump with a big displacement has to be emulated. For example, the 8086 lacks conditional jumps with 16 bit displacements. For code like this:
74 XX JE FOO # jump if equal, 8 bit displacement
the assembler should emit code like this if FOO
is too far away:
75 03 JNE AROUND # jump if not equal
E8 XX XX JMP FOO # jump near, 16 bit displacement
AROUND: ...
Since this code is and takes more than twice as long as a JE
, it should only be emitted if absolutely necessary.
An easy way to do this is to first try to assembly all jumps with the smallest possible displacements. If any jump doesn't fit, the assembler performs another pass where all jumps that didn't fit previously are replaced with longer jumps. This is iterated until all jumps fit. While easy to implement, this algorithm runs in O(n²) time and is a bit hackish.
Is there a better, preferably linear time algorithm to determine the ideal jump instruction?
As outlined in another question, this start small algorithm is not necessarily optimal given suitably advanced constructs. I would like to assume the following niceness constraints:
- conditional inclusion of code and macro expansion has already occurred by the time instructions are selected and cannot be influenced by these decisions
if the size of a data object is affected by the distance between two explicit or implicit labels, the size grows monotonously with distance. For example, it is legal to write
.space foo-bar
if
bar
does not occur beforefoo
.- in violation of the previous constraint, alignment directives may exist