
I am experimenting with x86 instruction emulation in Java (just for fun) and ran into a problem with the "override prefixes" an instruction may have.

A prefix can change the behavior of an instruction. For example, with the "operand size override prefix" you can change the size of the operands from 16 bit to 32 bit or vice versa. The problem is this: when the program runs in 16-bit mode, all operations are done with chars (a Java char is 16 bits wide), but when the operand size changes to 32 bit, I would like to do the operations with ints. So I end up with redundant code. My idea is to implement operations on byte arrays instead; for example, I could implement an algorithm that adds two byte arrays. The advantage would be that I could simply switch between different operand sizes, even 128 bit and so on. On the other hand, adding two byte arrays may not be as performant as adding two integers...
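A minimal sketch of the byte-array addition proposed above, assuming little-endian byte order (least significant byte first, as x86 stores values) and equal-length operands. The class and method names (ByteArrayAdd, add) are hypothetical. It shows where the cost comes from: one loop iteration, with carry propagation, per byte of operand width.

```java
import java.util.Arrays;

/** Hypothetical helper: adds two little-endian byte arrays of equal length,
 *  wrapping modulo 2^(8 * length) the way a fixed-width register does. */
final class ByteArrayAdd {
    /** Returns a + b; result[0] is the least significant byte. The final carry is dropped. */
    static byte[] add(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("operand sizes differ");
        }
        byte[] result = new byte[a.length];
        int carry = 0;
        for (int i = 0; i < a.length; i++) {
            int sum = (a[i] & 0xFF) + (b[i] & 0xFF) + carry; // treat bytes as unsigned 0..255
            result[i] = (byte) sum;                        // keep the low 8 bits
            carry = sum >>> 8;                              // 0 or 1, fed into the next byte
        }
        return result;
    }

    public static void main(String[] args) {
        // 0x00FF + 0x0001 as 16-bit little-endian values gives 0x0100
        byte[] r = add(new byte[] {(byte) 0xFF, 0x00}, new byte[] {0x01, 0x00});
        System.out.println(Arrays.toString(r)); // [0, 1]
    }
}
```

For widths up to 64 bits, a single long addition followed by masking the result to the operand width does the same work in one step, which is the main reason to avoid byte arrays for the common operand sizes.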

Do you know a better way to do this? What do you think about it?

neoexpert

1 Answer


I think you need to model memory as an array of bytes, because x86 supports unaligned loads/stores. You should probably decode instructions into load / ALU / store parts, where each part is optional (e.g. add eax, ecx only needs the ALU part, not a load or store).
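A minimal sketch of the load / ALU / store split, assuming 32-bit operands only, a 64 KiB flat memory, and only an "add dst, src" instruction with a register source. All names (LoadAluStore, AddOp, execute) are hypothetical; a real decoder would produce many more operand forms and would also update flags.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical sketch: a decoded "add dst, src" is split into an optional load,
 *  an ALU step, and an optional store (32-bit operand size only). */
final class LoadAluStore {
    final ByteBuffer mem = ByteBuffer.wrap(new byte[64 * 1024])
            .order(ByteOrder.LITTLE_ENDIAN); // x86 is little-endian
    final int[] regs = new int[8];           // index 0 = EAX, 1 = ECX, ...

    /** Decoder output: the destination is either a register index or a memory address. */
    static final class AddOp {
        final boolean dstIsMem;
        final int dst;    // register index, or address if dstIsMem
        final int srcReg; // source register index

        AddOp(boolean dstIsMem, int dst, int srcReg) {
            this.dstIsMem = dstIsMem;
            this.dst = dst;
            this.srcReg = srcReg;
        }
    }

    void execute(AddOp op) {
        // Load stage: only present when the destination operand is in memory.
        int a = op.dstIsMem ? mem.getInt(op.dst) : regs[op.dst];
        // ALU stage: always present (flags omitted for brevity).
        int result = a + regs[op.srcReg];
        // Store stage: write back to memory, or just to the register file.
        if (op.dstIsMem) {
            mem.putInt(op.dst, result);
        } else {
            regs[op.dst] = result;
        }
    }

    public static void main(String[] args) {
        LoadAluStore cpu = new LoadAluStore();
        cpu.regs[0] = 5;                           // EAX
        cpu.regs[1] = 7;                           // ECX
        cpu.execute(new AddOp(false, 0, 1));       // add eax, ecx: ALU only
        System.out.println(cpu.regs[0]);           // 12
        cpu.mem.putInt(0x101, 40);                 // unaligned dword
        cpu.execute(new AddOp(true, 0x101, 0));    // add dword [0x101], eax: load + ALU + store
        System.out.println(cpu.mem.getInt(0x101)); // 52
    }
}
```

Keeping the stages separate means the memory-access code (including unaligned addresses like 0x101 here) is shared by every instruction that has a memory operand, instead of being repeated in each ALU handler.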

You only have to write the code once to make an int32 from 4 bytes, or to store an int32 as 4 bytes. Or, if Java lets you get an int reference to 4 arbitrarily aligned bytes, you could use that as a source or destination operand when the operand size is 32 bits.
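A sketch of those write-once helpers, assuming a flat byte-array memory and little-endian byte order; FlatMemory, load and store are hypothetical names. The same two methods serve 1-, 2-, 4- and 8-byte operands at any alignment. As for an int view of arbitrarily aligned bytes: java.nio.ByteBuffer's absolute getInt(int index) and putInt(int index, int value) accept any index, and order(ByteOrder.LITTLE_ENDIAN) makes them match x86 byte order, so wrapping the memory array in a ByteBuffer is another option.

```java
/** Hypothetical helpers: little-endian load/store of 1, 2, 4 or 8 bytes at any
 *  (possibly unaligned) address in a flat byte-array memory. */
final class FlatMemory {
    private final byte[] ram;

    FlatMemory(int size) {
        ram = new byte[size];
    }

    /** Loads sizeBytes bytes starting at addr, zero-extended into a long. */
    long load(int addr, int sizeBytes) {
        long value = 0;
        // Walk from the highest address (most significant byte) down to addr.
        for (int i = sizeBytes - 1; i >= 0; i--) {
            value = (value << 8) | (ram[addr + i] & 0xFF);
        }
        return value;
    }

    /** Stores the low sizeBytes bytes of value at addr, least significant byte first. */
    void store(int addr, int sizeBytes, long value) {
        for (int i = 0; i < sizeBytes; i++) {
            ram[addr + i] = (byte) (value >>> (8 * i));
        }
    }

    public static void main(String[] args) {
        FlatMemory m = new FlatMemory(16);
        m.store(3, 4, 0x12345678L);                         // unaligned dword store
        System.out.println(Long.toHexString(m.load(3, 4))); // 12345678
        System.out.println(Long.toHexString(m.load(3, 2))); // 5678: low word of that dword
    }
}
```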

If you can write type-generic versions of add, sub, etc. in Java, you can reuse the same code for each operand size. So you'd have one switch() on the operand size in the decoder, and dispatch from there to the handler functions for each instruction. If you use a table of pointers (or of Objects with methods), the same object could appear in both the 8-bit table and the 32-bit table if it's generic. (This doesn't work for div or mul, which use AH:AL for 8-bit operands, while all wider operand sizes use (E|R)DX:(E|R)AX.)
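A minimal sketch of size-generic handlers, assuming values are carried in a Java long and masked to the current width (8, 16, 32 or 64 bits) instead of using separate char and int code paths. SizeGenericAlu, operandSize and the carry field are hypothetical names, and only CF is tracked. The operandSize helper covers only 16- and 32-bit code segments, where the operand-size prefix (0x66) selects the non-default size; 64-bit mode and REX.W are not modeled.

```java
/** Hypothetical sketch: one set of ALU handlers serves all operand sizes by
 *  working in a long and masking to the current width in bits. */
final class SizeGenericAlu {
    boolean carry; // CF produced by the last add/sub

    /** Mask with the low 'bits' bits set (bits = 8, 16, 32 or 64). */
    static long mask(int bits) {
        return bits == 64 ? -1L : (1L << bits) - 1;
    }

    /** Operand size in 16/32-bit code: the 0x66 prefix toggles the default size. */
    static int operandSize(boolean defaultIs32, boolean hasOperandSizePrefix) {
        return (defaultIs32 != hasOperandSizePrefix) ? 32 : 16;
    }

    long add(long a, long b, int bits) {
        long m = mask(bits);
        a &= m;
        b &= m;
        long result = (a + b) & m;
        // Unsigned carry out: the truncated sum wrapped around below an operand.
        carry = Long.compareUnsigned(result, a) < 0;
        return result;
    }

    long sub(long a, long b, int bits) {
        long m = mask(bits);
        a &= m;
        b &= m;
        carry = Long.compareUnsigned(a, b) < 0; // unsigned borrow
        return (a - b) & m;
    }

    public static void main(String[] args) {
        SizeGenericAlu alu = new SizeGenericAlu();
        int bits = operandSize(false, true); // 16-bit code with a 0x66 prefix -> 32
        System.out.println(Long.toHexString(alu.add(0xFFFF, 1, bits)) + " CF=" + alu.carry); // 10000 CF=false
        System.out.println(Long.toHexString(alu.add(0xFFFF, 1, 16)) + " CF=" + alu.carry);   // 0 CF=true
    }
}
```

The decoder computes the width once per instruction and passes it to the same add/sub objects for every size; mul and div would still need per-size handlers because of their different implicit register pairs.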


BTW, the possible load/store sizes x86 supports are byte/word/dword/qword (x87 and Pentium cmpxchg8b), xmm/ymm/zmm, and 6 bytes (segment + 32-bit pointer, e.g. les or far jmp [mem]). There are also 10-byte accesses: x87 80-bit values, or segment + 64-bit pointer (e.g. far jmp).

The last two are handled internally as two separate loads, so e.g. a 6-byte load isn't guaranteed to be atomic (see "Why is integer assignment on a naturally aligned variable atomic on x86?"). Only power-of-2 sizes up to 8 bytes are guaranteed atomic (with some alignment restrictions).


For more ideas about emulating x86, see some of the Bochs design documents, e.g. "How Bochs Works Under the Hood". Bochs is an interpreting emulator with no JIT / dynamic recompilation, like the one you're writing.

It covers some important ideas like lazy flag handling. Some of the ideas there make the emulator's overall design more complex in exchange for performance, but lazy flags add fairly limited complexity and should help a lot.
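A minimal sketch of lazy flag evaluation for ADD and SUB only, assuming operands and results are carried in a long masked to the operand width. This is a generic illustration of the idea, not Bochs's actual implementation; LazyFlags, record and the flag accessors are hypothetical names, and AF/PF are omitted. Arithmetic handlers just record their inputs and result, and each flag is computed only when something (e.g. a jcc, setcc or pushf) reads it.

```java
/** Hypothetical sketch of lazy flag evaluation (ADD/SUB only; AF and PF omitted). */
final class LazyFlags {
    enum Op { ADD, SUB }

    private Op lastOp = Op.ADD;
    private long lastA, lastB, lastResult; // all masked to lastBits
    private int lastBits = 32;

    private static long mask(int bits) {
        return bits == 64 ? -1L : (1L << bits) - 1;
    }

    /** Called by the ADD/SUB handlers instead of computing EFLAGS immediately. */
    void record(Op op, long a, long b, long result, int bits) {
        long m = mask(bits);
        lastOp = op;
        lastA = a & m;
        lastB = b & m;
        lastResult = result & m;
        lastBits = bits;
    }

    boolean zf() {
        return lastResult == 0;
    }

    boolean sf() {
        return ((lastResult >>> (lastBits - 1)) & 1) != 0;
    }

    boolean cf() {
        return lastOp == Op.ADD
                ? Long.compareUnsigned(lastResult, lastA) < 0 // carry out of the top bit
                : Long.compareUnsigned(lastA, lastB) < 0;     // borrow
    }

    boolean of() {
        long signBit = 1L << (lastBits - 1);
        long x = lastOp == Op.ADD
                ? ~(lastA ^ lastB) & (lastA ^ lastResult) // same-sign operands, result sign differs
                : (lastA ^ lastB) & (lastA ^ lastResult); // different-sign operands, result sign differs from a
        return (x & signBit) != 0;
    }

    public static void main(String[] args) {
        LazyFlags flags = new LazyFlags();
        // 8-bit add 0x7F + 0x01 = 0x80: only operands and result are stored here.
        flags.record(Op.ADD, 0x7F, 0x01, 0x80, 8);
        // Flags are computed only when read, e.g. by a following jo / js / jz / jc.
        System.out.println(flags.of() + " " + flags.sf() + " " + flags.zf() + " " + flags.cf()); // true true false false
    }
}
```

Because flags are often overwritten by the next arithmetic instruction before anything reads them, recording three values per instruction skips much of the work an eager implementation would do.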

Peter Cordes