
I'm trying to make a simple x86 disassembler (32-bit for now) for learning purposes.

So the intel docs go:


But I find this very confusing.

First of all, the m8-32 operands seem to indicate either ES:(E)DI or DS:(E)SI.
But there's no telling in which situations one or the other would be the case. In some opcodes you have OPCODE m8, m8, in others you have only one operand that's m8, and after checking across multiple, I've come to the conclusion that there's no general rule.

Then there are these others, that are simply described as memory operand in memory, which leave me even more confused. Is there supposed to be a displacement, maybe an absolute address or relative offset? If so what's even the point, since we have moffs and rel?

The ones after make some sense, but is the number after the colon a displacement?
The ampersand ones leave me completely clueless though.

Besides that, there are these m[number][descriptor], which as far as I can see are for FPU? (I haven't been dealing with the 0Fh escaped opcodes yet).

Sorry, I'm probably missing something really obvious, as I often do.

Thanks in advance.

TrisT
  • The FPU marches to a very different drummer. That has a lot to do with the way it started, it used to be a different chip (8087), sold separately from the processor. Very different data types, no registers but a stack. It got integrated into the same chip much later, Pentium was the first one that had it guaranteed available. Do keep in mind that it is getting quite irrelevant in modern software development, it has [too many quirks](https://stackoverflow.com/a/14865279/17034) and modern compilers generate SSE code. – Hans Passant Apr 11 '18 at 06:10
  • @HansPassant: x87 uses two of the same data types in memory as SSE/SSE2: IEEE754 single-precision and double-precision float. (m32fp and m64fp). Only if you use the m80fp forms of fld / fstp do you ever get the 10-byte internal format, which is [an IEEE754 extended precision format](https://en.wikipedia.org/wiki/Extended_precision#x86_extended_precision_format). It has more bits than single/double, but works the same except for not using a hidden/implicit top bit of the significand. But x87 as a whole is yucky, and definitely not a nice compiler target with its register stack! – Peter Cordes Apr 11 '18 at 06:25

2 Answers


Normal instructions like add that can use a memory operand also work with registers, so ADD has encodings for add r32, r/m32 and add r/m32, r32. add eax, ecx can use either encoding / opcode (doesn't matter).
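
To make the "either encoding" point concrete, here is a minimal sketch in Python. The function name `decode_add_reg_reg` and the `REG32` table are made up for illustration, and it only handles the register-to-register case (mod = 11) of opcodes 01 and 03 with 32-bit operand size.

```python
# Register names indexed by their 3-bit encoding (32-bit operand size).
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]


def decode_add_reg_reg(code: bytes) -> str:
    """Decode the register-to-register forms of ADD (opcodes 01 and 03).

    Hypothetical helper for illustration; not a full decoder.
    """
    opcode, modrm = code[0], code[1]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b11:
        raise ValueError("this sketch only handles mod=11 (register operand)")
    if opcode == 0x01:    # ADD r/m32, r32: destination is the r/m field
        dst, src = rm, reg
    elif opcode == 0x03:  # ADD r32, r/m32: destination is the reg field
        dst, src = reg, rm
    else:
        raise ValueError("not one of the ADD forms handled here")
    return f"add {REG32[dst]}, {REG32[src]}"


# Two different byte sequences, same instruction.
print(decode_add_reg_reg(bytes([0x01, 0xC8])))
print(decode_add_reg_reg(bytes([0x03, 0xC1])))
```

Both calls should describe the same `add eax, ecx`; which encoding an assembler emits is its own choice.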

That's why m32 (and not r/m32) is usually only an implicit operand for movsd or stosd or other string instructions, and why Intel says they normally use ES:(E)DI or DS:(E)SI.

> First of all, the m8-32 operands seem to indicate either ES:(E)DI or DS:(E)SI. But there's no telling in which situations one or the other would be the case.

m32 means a 32-bit memory operand, which can't be a register instead. Look at the entry for each specific instruction to see how its operand(s) are specified: some are implicit (e.g. DS:(E/R)SI for lodsb/w/d/q), while others use a ModR/M operand but require it to be memory.
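
For a disassembler, the implicit cases basically become a per-opcode table. A rough sketch in Python, assuming 32-bit operand-size and address-size with no prefixes (the `STRING_OPS` table and `describe_string_op` are names I made up, and the operand strings are just one way to print them):

```python
# Implicit memory operands of the one-byte string instructions, keyed by
# opcode. Assumes 32-bit operand and address size and no prefixes; a real
# decoder would adjust for 66/67 and segment-override prefixes.
STRING_OPS = {
    0xA4: ("movsb", ["byte es:[edi]", "byte ds:[esi]"]),
    0xA5: ("movsd", ["dword es:[edi]", "dword ds:[esi]"]),
    0xA6: ("cmpsb", ["byte ds:[esi]", "byte es:[edi]"]),
    0xA7: ("cmpsd", ["dword ds:[esi]", "dword es:[edi]"]),
    0xAA: ("stosb", ["byte es:[edi]"]),
    0xAB: ("stosd", ["dword es:[edi]"]),
    0xAC: ("lodsb", ["byte ds:[esi]"]),
    0xAD: ("lodsd", ["dword ds:[esi]"]),
    0xAE: ("scasb", ["byte es:[edi]"]),
    0xAF: ("scasd", ["dword es:[edi]"]),
}


def describe_string_op(opcode: int) -> str:
    """Return the mnemonic with its implicit memory operands spelled out."""
    mnemonic, operands = STRING_OPS[opcode]
    return f"{mnemonic}  ; implicit: {', '.join(operands)}"


print(describe_string_op(0xA5))
print(describe_string_op(0xAD))
```

Nothing in the instruction bytes selects ES:(E)DI versus DS:(E)SI; the opcode alone does, which is why the table has to be hard-coded.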

For x87, the extra annotation tells you how the instruction interprets it. e.g. m32fp is a 32-bit IEEE single-precision float (e.g. for fmul or fld), while m32int is a 32-bit integer (e.g. for fimul or fild).
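
As a small illustration of what the annotation means (using Python's `struct` module to stand in for the FPU, which is my own choice for the example), the same four bytes give different values depending on whether they are read as m32fp or m32int:

```python
import struct

# Four bytes as they would sit in memory (little-endian).
raw = bytes([0x00, 0x00, 0x80, 0x3F])

# Roughly how fld / fmul with an m32fp operand would interpret them.
as_m32fp = struct.unpack("<f", raw)[0]

# Roughly how fild / fimul with an m32int operand would interpret them.
as_m32int = struct.unpack("<i", raw)[0]

print(as_m32fp)   # 1.0
print(as_m32int)  # 1065353216
```

The addressing mode is encoded the same way in both cases; only the opcode decides how the loaded bits are interpreted.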


Other than x87, the number just tells you the operand-size. That's all.

Normally memory operands are specified with the usual ModR/M + optional SIB. The only exceptions are implicit addressing modes (like pop rax reading qword [rsp], or the string instructions), or the moffs forms of MOV which skip the ModR/M byte and just use a 16/32/64-bit offset (same size as the address-size).
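
For the "ModR/M operand that must be memory" case, a disassembler can reject mod = 11 for those opcodes. A minimal sketch in Python, using LEA (opcode 8D) as the one example I'm sure of; the set `MEMORY_ONLY_OPCODES` and both function names are hypothetical:

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into its (mod, reg, rm) fields."""
    return modrm >> 6, (modrm >> 3) & 7, modrm & 7


# Opcodes whose ModR/M operand is documented as plain "m" (memory only).
# Only LEA is listed here; a real table would have more entries.
MEMORY_ONLY_OPCODES = {0x8D}


def modrm_operand_is_valid(opcode: int, modrm: int) -> bool:
    """Return False if the opcode needs a memory operand but mod selects a register."""
    mod, _reg, _rm = split_modrm(modrm)
    if opcode in MEMORY_ONLY_OPCODES and mod == 0b11:
        return False
    return True


print(modrm_operand_is_valid(0x8D, 0x00))  # lea eax, [eax]: memory form
print(modrm_operand_is_valid(0x8D, 0xC0))  # mod=11: register form, not allowed
```

Here the `m` in the manual's operand column is the hint that the register form of the ModR/M byte has to be treated as an invalid encoding.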

mov al/ax/eax/rax, [moffs8/16/32/64] (or the store form) is the only instruction that can use a 64-bit absolute address directly, without putting it in a register first.

Note that moffs8 is an 8-bit operand, not an 8-bit immediate address. The address-size attribute of the instruction (default 64-bit in 64-bit mode, overrideable with a 0x67 address-size prefix) determines how many bytes of absolute address follow the opcode.
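
Here is a sketch of decoding the moffs forms in 32-bit mode, to show that the address width follows the address-size attribute rather than the operand-size. The function `decode_mov_moffs` is a made-up name, it only knows opcodes A0 to A3 plus the 66 and 67 prefixes, and it ignores segment overrides:

```python
def decode_mov_moffs(code: bytes) -> str:
    """Decode MOV with a moffs operand (opcodes A0..A3), assuming 32-bit mode."""
    pos = 0
    operand_bits, address_bits = 32, 32
    # Consume operand-size (66) and address-size (67) prefixes.
    while pos < len(code) and code[pos] in (0x66, 0x67):
        if code[pos] == 0x66:
            operand_bits = 16
        else:
            address_bits = 16
        pos += 1
    if pos >= len(code) or code[pos] not in (0xA0, 0xA1, 0xA2, 0xA3):
        raise ValueError("not a moffs form of MOV")
    opcode = code[pos]
    pos += 1
    # The absolute address that follows is as wide as the address size.
    nbytes = address_bits // 8
    if len(code) < pos + nbytes:
        raise ValueError("truncated instruction")
    offset = int.from_bytes(code[pos:pos + nbytes], "little")
    # Even opcodes (A0, A2) move a byte; odd ones move 16 or 32 bits.
    if opcode & 1 == 0:
        reg = "al"
    else:
        reg = "eax" if operand_bits == 32 else "ax"
    mem = f"[0x{offset:x}]"
    # A0/A1 load the accumulator, A2/A3 store it.
    return f"mov {reg}, {mem}" if opcode < 0xA2 else f"mov {mem}, {reg}"


print(decode_mov_moffs(bytes([0xA1, 0x78, 0x56, 0x34, 0x12])))
print(decode_mov_moffs(bytes([0x67, 0xA0, 0x34, 0x12])))
```

Note that the second example is a moffs8 form (it loads AL), yet it is followed by a 2-byte address because of the 67 prefix, not a 1-byte one.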

The assembler will take care of this for you, and will use the moffs encoding when it saves code size for mov eax, [symbol] in 32-bit code. In general, just write addressing modes the normal way (see "Referencing the contents of a memory location. (x86 addressing modes)") and let the assembler generate the ModR/M bytes, or warn you if you try something illegal (not encodable), such as using movsb with different registers.


For more about x86 asm, see the x86 tag wiki. Also, Agner Fog's guides are very good, although he doesn't attempt to cover basic stuff like this. However, reading Agner's guides and seeing what he says about his short examples (a couple instructions long) will help you make sense of how asm works.

Peter Cordes
  • I think the question is (too): which of (E)SI or (E)DI is being used? Then I guess it helps the asker to know that SI stands for Source Index (IIRC) and DI for Destination Index, and that one or both of these registers are used implicitly in the only order that makes sense for the instruction. – Rudy Velthuis Apr 11 '18 at 16:48
  • @PeterCordes maybe I wasn't clear enough, but my problem was never with the modrm/sib bytes, or moffs or rels or whatever, that was easy to understand, it was specifically with the `m`-whatever operands. In the meantime I found this http://ref.x86asm.net/index.html#Instruction-Operand-Codes, and as you can see from the BA, BB, and BD entries, even though the Intel docs give them all the same name, they're just really not the same thing, they're opcode-implicit. And it's really hard to figure out, since very few docs say the opcodes actually specify it, just m. How was my q unclear? – TrisT Apr 12 '18 at 12:35
  • @TrisT: mostly your q was just long and parts of it I only skimmed because I was lazy. This answer started as a comment which I decided I should really post as an answer, then it got longer. I didn't find the `stos` entry unclear at all: http://felixcloutier.com/x86/STOS:STOSB:STOSW:STOSD:STOSQ.html look at the OPCODE column: just `AA`, not + anything for operands. And the operand-encoding is all `NA` in that table. And the description is 100% explicit: `For legacy mode, store EAX at address ES:(E)DI; For 64-bit mode store EAX at address RDI or EDI.` No room for specifying different regs. – Peter Cordes Apr 12 '18 at 12:41
  • @TrisT: Or to put it another way: Intel's `m32` terminology isn't even trying to tell you anything about how the operand is encoded. That's potentially instruction-specific. But it's useful as an asm programmer (not someone writing a disassembler) to look at the docs and see `m32`. I know exactly what that means: it's a 32-bit memory operand. So that's the form of the instruction I'm looking for in the docs. – Peter Cordes Apr 12 '18 at 13:15
  • @PeterCordes "useful as an asm programmer (not someone writing a disassembler)" Well my question was about disassembly. I now fully understand that it's opcode-dependent, and I've coded it as a specific operand instead of just "mXYZ" (with the help of the link provided in my prev comment), but still, you can see how it can be confusing and not really explanatory enough, since it can be `ES:(E)DI` or `DS:(E)SI`, and unless you hardcode it for each opcode, there's really no way to know from the operand alone. So you can see how the docs (or comments/answers based on them) aren't of much help. – TrisT Apr 20 '18 at 12:21
  • @TrisT: yes, I can see how *that* section of the docs was unhelpful *for you*, but any specific instruction that uses `m32` describes how it encodes its operands, or what they are if it's implicit. I was trying to explain the perspective that the doc writers probably had, and the audience they were aiming for that might have found that text useful. It's just trying to explain and give some background on the symbols that will be used as placeholders in the per-instruction documentation, and isn't trying to say that `m32` is a specific thing as far as instruction encoding. – Peter Cordes Apr 20 '18 at 12:36

I've just found that ref.x86asm.net has a "geek" edition of its tables.

The opcodes are described here.

The geek edition is not ambiguous the way the coder edition is.

Still, if someone could direct me to where one would learn this by himself, it would be greatly appreciated. I don't seem to be able to find it in the intel docs, or anywhere else besides x86asm.

Again, I often miss stuff, so in case I find something I will edit.

Hope I could help, have a nice one.

TrisT
  • The relevant bits are the modr/m and the sib bytes, which describe an instruction's operands. I wrote up some details about them in [this answer](https://stackoverflow.com/a/42250270/417501). I wrote another potentially interesting answer [here](https://stackoverflow.com/a/45802339/417501). – fuz Apr 11 '18 at 10:38
  • I can also strongly recommend you to read old versions of the CPU manuals as they are often slightly easier to read. – fuz Apr 11 '18 at 10:44
  • @fuz the modrm or sib don't serve a purpose if they don't exist for the opcode in question. They never serve a purpose in regards to my question, which has to do with the `m` operand. It would be great tho if you could tell me more about/link me these manuals. – TrisT Apr 12 '18 at 12:20