3

While writing a small x86 disassembler for Linux, I've run into an issue with mandatory prefixes and repeat prefixes. According to the Intel docs [1], repeat prefixes are 0xf2 or 0xf3, and mandatory prefixes are 0x66, 0xf2 or 0xf3.

There are two instructions which have the following base opcodes:

crc32 -- f2 0f 38 f0 (Here, 0xf2 is a mandatory prefix)
movbe -- 0f 38 f0

So the encoding of a 'movbe' instruction that repeats as long as the counter register is non-zero should be:

repnz movbe == f2 0f 38 f0

When I start disassembling an instruction, if I see the byte 0xf2, how do I know that it's a mandatory prefix for the crc32 instruction but not a repeat prefix for the movbe instruction, or vice-versa? Which instruction do I match the opcode pattern "f2 0f 38 f0" to?

What am I missing?

[1] http://www.intel.com/design/intarch/manuals/243191.HTM

Thanks and Regards,
Hrishikesh Murali

  • As far as x86 disassemblers go, the one included with [qemu](http://git.savannah.gnu.org/cgit/qemu.git/tree/i386-dis.c) is fairly readable. It uses the method suggested by @Karel for handling prefixes. – user786653 Nov 17 '11 at 07:31

2 Answers

5

You can use the repeat prefixes only with string instructions (see the manual). The sequence "f2 0f 38 f0" is always the CRC32 instruction.

MazeGen
  • Yes, I missed that. Thank you! :-) – Hrishikesh Murali Nov 17 '11 at 06:57
  • Use lazy evaluation: do not try to assume the meaning of a prefix immediately, just store it as a raw prefix (i.e. just remember "I've got a 0xF2 prefix" instead of "I've got a REPNE prefix that may also be a mandatory prefix"). Once you've got the primary opcode, you've got the context and you can decode prefixes the right way. – MazeGen Nov 17 '11 at 07:07
  • Yeah, sounds like an easy way to handle prefixes. Thanks for the tip. – Hrishikesh Murali Nov 17 '11 at 07:13
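
The lazy-evaluation idea from the comments above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the table `OPCODES`, the set `STRING_OPS`, and the function `decode` are hypothetical names, only three instructions are listed, ModRM and operands are ignored, and the rule for picking among several prefixes is cruder than what a real decoder needs.

```python
# Hypothetical, heavily simplified decoder; names and table layout are
# illustrative and not taken from any real disassembler.
PREFIX_BYTES = {0x66, 0xF2, 0xF3}  # subset only; segment, lock, 0x67 omitted

# (mandatory prefix or None, opcode bytes) -> mnemonic
OPCODES = {
    (0xF2, b"\x0f\x38\xf0"): "crc32",
    (None, b"\x0f\x38\xf0"): "movbe",
    (None, b"\xa4"): "movsb",
}
STRING_OPS = {"movsb"}  # instructions for which F2/F3 act as repeat prefixes


def decode(code: bytes) -> str:
    # Step 1: collect prefix bytes as raw values, without interpreting them.
    i = 0
    raw_prefixes = []
    while i < len(code) and code[i] in PREFIX_BYTES:
        raw_prefixes.append(code[i])
        i += 1

    # Step 2: read the opcode (1 byte, 0F xx, or 0F 38/3A xx).
    if code[i:i + 1] == b"\x0f":
        n = 3 if code[i + 1:i + 2] in (b"\x38", b"\x3a") else 2
    else:
        n = 1
    opcode = code[i:i + n]

    # Step 3: now that the opcode is known, decide what the prefixes mean.
    # First try them as mandatory prefixes (assumption: last one seen wins).
    for p in reversed(raw_prefixes):
        if (p, opcode) in OPCODES:
            return OPCODES[(p, opcode)]

    mnemonic = OPCODES.get((None, opcode))
    if mnemonic is None:
        raise ValueError("unknown or truncated opcode")

    # Otherwise F2/F3 are repeat prefixes, but only for string instructions.
    if mnemonic in STRING_OPS:
        if 0xF3 in raw_prefixes:
            return "rep " + mnemonic
        if 0xF2 in raw_prefixes:
            return "repne " + mnemonic
    return mnemonic


print(decode(bytes.fromhex("f20f38f0")))  # crc32
print(decode(bytes.fromhex("0f38f0")))    # movbe
print(decode(bytes.fromhex("f3a4")))      # rep movsb
```

The point of the design is that step 1 never commits to "REPNE" or "mandatory"; the decision is made in step 3, where the opcode provides the context.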
2

MOVBE (move to/from big-endian in memory) is not an instruction that can be repeated with a REP((N)E) prefix.

Only string instructions are repeatable that way. Those are: MOVS*, LODS*, STOS*, SCAS*, CMPS*, INS*, OUTS*, where * is one of B, W, D or Q (except INS* and OUTS*, which only go up to doublewords, not quadwords).

Intel's manual entry for rep/rep(n)e explains that.
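
For a disassembler, the list above boils down to a small set of one-byte opcodes. The following is a minimal sketch of such a check; the names `STRING_OPCODES`, `ZF_TESTING` and `repeat_prefix_name` are made up for illustration, and operand-size handling (B/W/D/Q suffixes) is left out.

```python
# One-byte opcodes of the string instructions: the even opcode is the byte
# form, the odd one is the word/doubleword (or quadword with REX.W) form.
STRING_OPCODES = {
    0x6C: "ins", 0x6D: "ins",
    0x6E: "outs", 0x6F: "outs",
    0xA4: "movs", 0xA5: "movs",
    0xA6: "cmps", 0xA7: "cmps",
    0xAA: "stos", 0xAB: "stos",
    0xAC: "lods", 0xAD: "lods",
    0xAE: "scas", 0xAF: "scas",
}

# Only CMPS and SCAS compare data and set ZF, so only for them does the
# repeat condition also depend on ZF (REPE vs. REPNE).
ZF_TESTING = {"cmps", "scas"}


def repeat_prefix_name(prefix, opcode):
    """Return the repeat-prefix mnemonic for a one-byte opcode, or None if
    the prefix byte is not a repeat prefix for that opcode."""
    name = STRING_OPCODES.get(opcode)
    if name is None or prefix not in (0xF2, 0xF3):
        return None
    if prefix == 0xF3:
        return "repe" if name in ZF_TESTING else "rep"
    # Intel's REP entry lists F2 only with CMPS and SCAS; this sketch just
    # reports the raw prefix for the other string instructions.
    return "repne"


print(repeat_prefix_name(0xF3, 0xA4))  # rep
print(repeat_prefix_name(0xF3, 0xA6))  # repe
print(repeat_prefix_name(0xF2, 0xAE))  # repne
```

Any opcode outside this table, including the two- and three-byte opcodes such as `0f 38 f0`, never takes F2/F3 as a repeat prefix, which is why the bytes must then be looked up as a mandatory prefix instead.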

Peter Cordes
Alexey Frunze
  • Oh, okay. So I have to have a special check in my disassembler for string instructions then. Thanks, I did not know this. – Hrishikesh Murali Nov 17 '11 at 06:56
  • +1 "REPNE/REPNZ prefix is encoded using F2H. **Repeat-Not-Zero prefix applies only to string and input/output instructions**." (From Vol. 2A 2.1.1) – user786653 Nov 17 '11 at 06:56
  • @HrishikeshMurali: yep, beware that 66h is normally an operand size prefix, so you'll have to have a similar check there too. – Alexey Frunze Nov 17 '11 at 07:02
  • @Alex: A better way would be as mentioned by Karel Lejska below. It's better to evaluate the prefix after I encounter the primary opcode, makes stuff easier. – Hrishikesh Murali Nov 17 '11 at 07:15
  • @HrishikeshMurali: that's right. In fact, that's exactly what I did some time ago. – Alexey Frunze Nov 17 '11 at 07:25