
When encoding the instruction cmpw $-5, %ax (cmp ax, -5 in Intel syntax) for x86-64, the Intel instruction set reference manual gives me two opcodes to choose from:

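| Opcode | Instruction | Op/En | 64-bit mode | Compat/Leg mode | Description |
|---|---|---|---|---|---|
| 3D iw | CMP AX, imm16 | I | Valid | Valid | Compare imm16 with AX. |
| 83 /7 ib | CMP r/m16, imm8 | MI | Valid | Valid | Compare imm8 with r/m16. |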

So there are two possible encodings:

66 3d fb ff ; this for opcode 3d
66 83 f8 fb ; this for opcode 83
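To make the byte-level difference concrete, here is a minimal Python sketch that builds both encodings from the manual entries quoted above. The helper names are hypothetical, and only a register operand is handled (no memory forms).

```python
import struct

AX = 0                # register number of AX in the ModRM rm field (AX=0 ... DI=7)
OPSIZE_PREFIX = 0x66  # selects 16-bit operand size in 64-bit mode

def encode_cmp_ax_imm16(imm: int) -> bytes:
    """66 3D iw: the short 'CMP AX, imm16' form (no ModRM byte)."""
    # '<h' packs a signed little-endian 16-bit value: -5 -> fb ff
    return bytes([OPSIZE_PREFIX, 0x3D]) + struct.pack("<h", imm)

def encode_cmp_r16_imm8(reg: int, imm: int) -> bytes:
    """66 83 /7 ib: 'CMP r/m16, imm8' with a register operand."""
    if not -128 <= imm <= 127:
        raise ValueError("immediate does not fit in a sign-extended imm8")
    # ModRM: mod=11 (register operand), reg=7 (the /7 opcode extension), rm=reg
    modrm = (0b11 << 6) | (7 << 3) | reg
    return bytes([OPSIZE_PREFIX, 0x83, modrm]) + struct.pack("<b", imm)

short_form = encode_cmp_ax_imm16(-5)
modrm_form = encode_cmp_r16_imm8(AX, -5)
print(short_form.hex(" "), len(short_form))  # 66 3d fb ff 4
print(modrm_form.hex(" "), len(modrm_form))  # 66 83 f8 fb 4
```

Both come out at 4 bytes: the short form saves the ModRM byte but pays for it with a 2-byte immediate.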

Which one is better?

I tried both in an online disassembler.

Both disassemble to the original instruction. But why does 6683fb00 also work, while 663dfb doesn't?

Michael Petch
Steve
  • There can be no "better" unless you say what's important to you. I can think of three dimensions: code size, execution speed, and compatibility/portability. Size seems to be the same, so it's not better there. There's probably more. What do you want to achieve? – unwind Jun 03 '16 at 10:06
  • Without looking into this too far, one instruction seems to compare AX (a 16-bit register) with a 16-bit value, whereas the other compares a different (16-bit) register with an 8-bit value. – Neil Jun 03 '16 at 10:07
  • In the second variant, the prefix isn't length-changing. – harold Jun 03 '16 at 10:11
  • @unwind Execution speed is probably the most important for me, since I am working on generating object files directly from a compiler. – Steve Jun 03 '16 at 10:17
  • @IraBaxter From an assembler's perspective, which one should it choose? – Steve Jun 03 '16 at 10:18
  • @harold So the second one is better? I think gas and LLVM choose the second one. – Steve Jun 03 '16 at 10:20
  • In this case don't use a 16-bit immediate value if you don't have to. There is quite a penalty for the Length Changing prefix in 64-bit code. The [Intel optimization manual](http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf) has a rule to avoid an LCP stall like this: _Assembly/Compiler Coding Rule 21. (MH impact, MH generality) Favor generating code using imm8 or imm32 values instead of imm16 values._ – Michael Petch Jun 03 '16 at 11:49
  • @Neil that doesn't matter if he's using -5 as the operand though – harold Jun 03 '16 at 12:37
  • @MichaelPetch thanks. I will refer to the manual. – Steve Jun 06 '16 at 07:20

1 Answer


Both encodings are the same length, so that doesn't help us decide.

However, as @Michael Petch commented, the imm16 encoding will cause a Length-Changing-Prefix (LCP) stall in the decoders on Intel CPUs. (Without the 66 operand-size prefix, the same opcode would be 3D imm32, so the prefix changes the length of the rest of the instruction; that is where the name comes from. AFAIK, you'd get the same stall in 16-bit code for using a 32-bit immediate.)
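The length-changing part can be shown with a tiny hand-written table for the two opcodes involved. This is a sketch, not a decoder: the sizes are for 64-bit mode as described in the manual, and `IMM_BYTES` / `is_length_changing` are illustrative names.

```python
# Immediate size in bytes for each opcode, as (without 66 prefix, with 66 prefix),
# in 64-bit mode. Only the two CMP opcodes from the question are listed.
IMM_BYTES = {
    0x3D: (4, 2),  # 3D: CMP EAX, imm32  vs.  66 3D: CMP AX, imm16
    0x83: (1, 1),  # CMP r/m, imm8 (sign-extended) at any operand size
}

def is_length_changing(opcode: int) -> bool:
    """True if a 66 prefix changes how many immediate bytes follow this opcode."""
    without_66, with_66 = IMM_BYTES[opcode]
    return without_66 != with_66

for op in IMM_BYTES:
    print(f"{op:02X}: LCP = {is_length_changing(op)}")
# 3D: LCP = True
# 83: LCP = False
```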

The imm8 encoding doesn't cause a problem on any microarchitecture I know of, so favour it. See Agner Fog's microarch.pdf, and other links from the tag wiki.

It can even be worth using a longer instruction to avoid an LCP stall. (e.g. if you know the upper 16 bits of the register are zero or sign-extended, using 32-bit operand size, with no 66 prefix, avoids the LCP stall.)
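One way an assembler or compiler back end might act on this is sketched below. `encode_cmp_ax` and its `eax_is_sign_extended_ax` flag are hypothetical: the flag stands for the caller knowing that EAX already holds a sign-extension of AX, which is what makes comparing EAX against the sign-extended immediate equivalent. The zero-extended case is left out, and only the register-AX forms are encoded.

```python
import struct

def encode_cmp_ax(imm: int, eax_is_sign_extended_ax: bool = False) -> bytes:
    """Pick an encoding for 'cmp ax, imm', avoiding the 66 3D iw form when possible.

    eax_is_sign_extended_ax is a hypothetical guarantee from the caller that the
    upper 16 bits of EAX are a sign-extension of AX.
    """
    if not -0x8000 <= imm <= 0xFFFF:
        raise ValueError("immediate does not fit in 16 bits")
    # Reinterpret the low 16 bits as a signed value (0xFFFB -> -5).
    imm16 = struct.unpack("<h", struct.pack("<H", imm & 0xFFFF))[0]
    if -128 <= imm16 <= 127:
        # 66 83 /7 ib: no longer than 66 3D iw, and no length-changing prefix.
        return bytes([0x66, 0x83, 0xF8]) + struct.pack("<b", imm16)
    if eax_is_sign_extended_ax:
        # 3D id: cmp eax, imm32 with the sign-extended immediate. One byte longer,
        # but without the 66 prefix there is no LCP stall.
        return bytes([0x3D]) + struct.pack("<i", imm16)
    # Fallback: 66 3D iw, which can cause an LCP stall on Intel decoders.
    return bytes([0x66, 0x3D]) + struct.pack("<h", imm16)

print(encode_cmp_ax(-5).hex(" "))          # 66 83 f8 fb
print(encode_cmp_ax(1000).hex(" "))        # 66 3d e8 03
print(encode_cmp_ax(1000, True).hex(" "))  # 3d e8 03 00 00
```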

Intel SnB-family CPUs have a uop cache, so instructions don't always have to be re-decoded before executing. The uop cache is small, though, so avoiding LCP stalls is still worth it.

Of course, if you're tuning for AMD, then this isn't a factor. I forget whether Atom and Silvermont decoders also have LCP stalls.


Re: part 2:

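663d is prefix+opcode for cmp ax, imm16. 663dfb doesn't "work" because it consumes the first byte of the following instruction: when the decoder sees 66 3D, it grabs the next 2 bytes from the instruction stream as the immediate. 6683fb00, on the other hand, is a complete instruction: 66 83 is followed by a ModRM byte (fb, which selects /7 = CMP with register BX) and a 1-byte immediate (00), so it decodes as cmp bx, 0.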
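The decoder behaviour for 66 3D can be reproduced with a deliberately tiny Python decoder that understands only these two encodings. `decode_one` is an illustrative name, and anything outside those two forms raises ValueError.

```python
import struct
from typing import Tuple

REG16 = ["ax", "cx", "dx", "bx", "sp", "bp", "si", "di"]  # ModRM rm field order

def decode_one(code: bytes) -> Tuple[str, int]:
    """Decode one instruction at the start of `code`; return (text, length).

    Only 66 3D iw and 66 83 /7 ib with a register operand are understood.
    """
    if code[:2] == b"\x66\x3d":
        # The 66 prefix makes the immediate 2 bytes; the decoder takes them
        # whatever they are, even if they "belong" to the next instruction.
        if len(code) < 4:
            raise ValueError("truncated: 66 3D needs a 2-byte immediate")
        imm = struct.unpack_from("<H", code, 2)[0]
        return f"cmp ax, {imm:#06x}", 4
    if code[:2] == b"\x66\x83" and len(code) >= 4:
        modrm = code[2]
        mod, reg_field, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
        if mod == 0b11 and reg_field == 7:  # register operand, /7 = CMP
            imm = struct.unpack_from("<b", code, 3)[0]
            return f"cmp {REG16[rm]}, {imm}", 4
    raise ValueError("not handled by this sketch")

# 66 3D FB followed by a NOP (90): the NOP becomes the immediate's high byte.
print(decode_one(bytes.fromhex("663dfb90")))  # ('cmp ax, 0x90fb', 4)
print(decode_one(bytes.fromhex("6683fb00")))  # ('cmp bx, 0', 4)
```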

Peter Cordes