
XLAT doesn't work in MASM.

What can I use instead to get the same behaviour:

XLAT : Set AL to memory byte DS:[(E)BX + unsigned AL]

Michael Petch
linux91
  • What do you mean "I can't substitute XLAT with XLAT: ..."? You haven't shown any example of what you were trying to do instead. – Peter Cordes Jun 22 '16 at 11:44
  • What I need is code that replaces xlat with its behaviour. So in this case ADD BX, AX and then the MOV into AL. Maybe it's this, but it doesn't work – linux91 Jun 22 '16 at 11:45
  • 3
    I'd be extremely surprised if MASM can't assemble `xlat`. I answered the question only because replacing xlat is typically good for performance anyway. Modern CPUs don't spend transistors on making it fast, so it decodes to 3 uops (on Intel Haswell for example). Since even for code compatible with 8086, we can do better than that inside a loop (by hoisting the zeroing of bx out of a loop), this question is maybe worth answering. – Peter Cordes Jun 22 '16 at 12:00
  • I edited the question to ask more clearly what I think you were trying to ask. I left in the extremely suspicious claim that it doesn't work in MASM. The original version didn't have any more details on how it "didn't work", so IDK if it didn't assemble, or didn't do what the OP wanted because they were using it wrong. – Peter Cordes Jun 22 '16 at 12:11
  • 1
    @linux91 :XLAT doesn't work, or it doesn't work the way you want? It only supports an unsigned byte in `AL` as an offset. It does work as expected. If you want XLAT to take a 16-bit value in AX then you'd have to code it by hand. If you showed us a snippet of code that demonstrates it not working it would be helpful. – Michael Petch Jun 22 '16 at 15:55

1 Answer


xlatb is a valid instruction in 16, 32, and 64bit modes. Maybe you need to use the xlatb mnemonic for MASM? The Intel manual suggests that xlatb is the right mnemonic when used with implicit operands, or xlat byte ptr [bx] for the explicit form (where, like movs, the operand is essentially just documentation or a way to apply a segment override, and implies the operand size). Another idea is to see what syntax your disassembler uses for the instruction.
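
In MASM-style syntax, the forms described above would look something like this (a sketch; not verified against any particular MASM version):

    xlatb                    ; implicit form: AL = DS:[BX + unsigned AL]
    xlat  byte ptr [bx]      ; explicit form; operand is documentation and implies the size
    xlat  byte ptr es:[bx]   ; explicit form applying a segment override (prefix 0x26)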


However, using something else is usually a good idea, since it's only a win for code-size, not speed, on modern CPUs (3 uops on Intel Haswell for example). There are usually better alternatives (especially in 32 or 64bit code), like using movzx to get a zero-extended value into a register you can use as an index.

In normal code, you could do:

; table in rbx
movzx  eax, byte ptr src             ; or any other way of producing a zero-extended result in rax
movzx  eax, byte ptr [rbx + rax]     ; a movzx load avoids false deps and partial-reg slowdowns

In 8086 code, you could do something like:

; pointer to the table in DI or SI
xor  bx, bx            ; can be hoisted out of a loop, if bh will stay zeroed
mov  bl, src           ; src can be any addressing mode, or the result of a computation
mov  bl, [si + bx]     ; this is the same load that xlat does, your choice of dest

bx is the only register that can be used in 16bit addressing modes that has separately-usable low and high halves (bl/bh). You need a REX prefix (64bit mode only) to use sil / dil. If you wanted to keep the table pointer in bx, like xlatb does, you'd have to zero-extend using a different register and then mov to si or di.
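
A sketch of that last point (hypothetical 16-bit code, assuming the value to translate starts in AL and the table pointer must stay in BX, as with xlatb):

    xor  ah, ah            ; zero-extend AL into AX
    mov  si, ax            ; the index has to go in SI or DI for a [bx + si] addressing mode
    mov  al, [bx + si]     ; the same load xlat would do: AL = DS:[BX + unsigned AL]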

If the table is static, you can of course avoid tying up a register, and just use [table + (e/r)bx].
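
For example (a sketch, where `table` and `src` are placeholder names for a static table and the source byte):

    xor  bx, bx            ; hoistable out of a loop, as above
    mov  bl, src
    mov  al, table[bx]     ; MASM syntax for [table + bx]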

Peter Cordes
  • Although not originally tagged _MASM_ he did say _MASM_ in the question. To conform to his environment you would need `byte ptr` instead of `byte` – Michael Petch Jun 22 '16 at 15:33
  • 2
    _XLAT_ or _XLATB_ have been supported as mnemonics for decades on _MASM_, so I venture to guess that there is a problem in the OP's understanding of the instruction. – Michael Petch Jun 22 '16 at 15:39
  • @MichaelPetch: silly me, I wasn't even thinking about MASM syntax while coding. I hadn't originally intended to write code, just suggest movzx, but then it was tagged 8086... – Peter Cordes Jun 22 '16 at 15:39
  • 3
    I don't know if the asker tagged this 8086 because he's *actually* writing code to run on that processor, but... XLAT is actually a *huge* performance win on the 8086. It is a single-byte instruction, and with the slow bus and limited cache, anything you can do to reduce code size amounts to a massive, tangible improvement in speed. In a tight loop, we're talking about 50-100 cycles difference, maybe more. Trivia you learn by reading old optimization books. Of course you're right that it makes almost no sense to use XLAT on modern architectures (or really anything since the 386). – Cody Gray - on strike Jun 22 '16 at 16:34
  • @CodyGray: I knew 8086 bottlenecked on code-fetch, but I didn't realize it was that huge. Optimizing for code-size can make sense in modern code (e.g. in a bootloader), or for codegolf.SE. (I've posted a couple x86 and ARM machine-code code-golf answers. It's kinda fun using one-byte `xchg eax` instructions). – Peter Cordes Jun 22 '16 at 16:58
  • Every 8086 SO question I've ever seen has been from someone using a simulator, who will probably never write anything that runs on actual 8086 hardware, but it can run natively on a modern CPU in 16bit mode. There are still real 8086 cores being made as microcontrollers, so it's not actually out of the question. Still, I assume that learning 8086 asm is just a mis-step along the way to learning how modern computers work. Learning a coding style like what optimizing compilers produce will make it easier to read compiler output. – Peter Cordes Jun 22 '16 at 16:58
  • Agreed. I think the historical stuff is super interesting, but it is not an especially good way to learn how modern computers work. I have an XT and a bunch of other classic hardware in my attic, in case I ever find time to tinker with it again. (I collected vintage hardware many years ago, before I ever learned to program, and I always thought it would be cool to go back and apply my current knowledge.) I wouldn't have ever thought to use `xchg` anymore. Even though it's only 1 byte, there's an implicit lock prefix, which makes it *much* slower than the additional byte or 2 of a `mov`. – Cody Gray - on strike Jun 22 '16 at 17:54
  • 1
    @CodyGray: Not `xchg r32, r/m32`, the one byte `xchg eax, r32` version that gave us `90 NOP` ([which I assume exists because 8086 had to use ax for sign-extension, mul, and other stuff](http://stackoverflow.com/a/37780205/224132).) See my [GCD in 8 bytes of x86-32 machine code](http://codegolf.stackexchange.com/questions/77270/greatest-common-divisor/77364#77364) and [adler32 in 32 bytes of x86-64 machine code](http://codegolf.stackexchange.com/questions/78896/compute-the-adler-32-checksum/78972#78972) code golf answers. I did an ARM version of the latter, as well as x86-16 :) – Peter Cordes Jun 22 '16 at 18:18
  • 1
    Forgot to actually say that `xchg reg,reg` doesn't have an implicit lock prefix. That wouldn't make sense; it's only in the memory-operand form. `xchg reg,reg` (either encoding) is 3 uops on P6/SnB, 2 on Bulldozer. – Peter Cordes Jun 22 '16 at 18:31
  • 3
    Just a side note. _XLAT_ with a memory operand isn't entirely used for documentation (or operand size), it is also how you can override the segment (default is _DS_). The address itself doesn't matter but a different segment will result in the appropriate prefix being added (in 16 and 32-bit code). Same goes for the `movs` instructions. In that case the destination segment can't be overridden, but the segment of the source operand can be. `xlat byte ptr es:[si]` should generate 0x26 0xD7. – Michael Petch Jun 22 '16 at 21:00
  • @MichaelPetch: right, thanks, forgot to mention segment overrides in the answer. Some assemblers (like NASM) also make it possible to put prefixes on instructions even with the implicit-operand syntax. – Peter Cordes Jun 22 '16 at 21:14