
Does the x86 standard include Mnemonics or does it just define the opcodes?

If it does not include them, is there another standard for the different assemblers?

Peter Cordes
mame98
  • The cpu only cares about the machine code. That said, in everyday use there is only intel and at&t flavors for the mnemonics and the latter (with some exceptions) is mostly just adding size suffixes if needed. Arguably y86 is an alternate subset of mnemonics for the same machine code. – Jester Jan 25 '19 at 17:00
  • It's more of a convention. For example, _Intel_ mnemonics and _AT&T_ mnemonics slightly differ. On _Intel_, a `MOV` is always a `MOV`. With _AT&T_, it can be `MOVL`, `MOVQ` and so on indicating the data size in the mnemonic itself. – zx485 Jan 25 '19 at 17:02
  • Intel has a standard. Microsoft MASM is close to that standard, and includes extensions. ATT syntax reverses the order of source and destination operands, perhaps it was a port of an existing assembler. – rcgldr Jan 25 '19 at 17:03
  • @rcgldr Yes! In fact, as I [explained elsewhere](https://stackoverflow.com/a/42250270/417501), AT&T syntax was designed to look like PDP-11 assembly. – fuz Jan 25 '19 at 17:22
  • Intel documents their x86 instructions using particular mnemonics, but it doesn't enforce them as a standard across 3rd party assembler tools. An assembler can use whatever mnemonics it wants to create x86 executable code. An assembler could name move, add, and subtract with `moe`, `larry`, and `curly` if they want, although it may be more difficult to read the code. For clarity, most x86 assemblers stick pretty close to the Intel suggested mnemonics. – lurker Jan 25 '19 at 20:26
  • For assembly in general, not specific to one target: assume there are no standards, as there is no way to prevent differences. All that matters is that the machine code conforms to the target. The assembler, the tool that turns the assembly language into machine code, can use whatever syntax/language it wants so long as it does the job. It is all up to the author. – old_timer Feb 01 '19 at 02:36
  • Having said that, you will very often find that the syntax for most targets (x86, arm, mips, etc.) varies across assemblers, but they are more alike than different with respect to the mnemonics themselves. The differences are more often in the rest of the language: `label` vs `label:`, `; comment` vs `@ comment`, `SECTION TEXT` vs `.text`, and countless others. But you will see that for some instructions the mnemonics or other parts of the line vary, and not just the at&t vs intel thing... – old_timer Feb 01 '19 at 02:38
  • @lurker Incidentally, SUN named a bunch of its ELF support tools `lari`, `crle`, and `moe` – fuz Aug 02 '20 at 14:39
  • @fuz that's brilliant! :) – lurker Aug 02 '20 at 15:02

2 Answers

11

Mnemonics are not standardised and different assemblers use different mnemonics. Some examples:

  • AT&T-style assemblers append b, w, l, and q suffixes to mnemonics to indicate operand size (the suffix is only needed when the operands don't already imply a size). Intel-style assemblers typically indicate this with the keywords byte, word, dword, and qword
  • AT&T-style assemblers recognise cbtw, cwtl, cltq, and cqto while Intel-style assemblers recognise the same instructions as cbw, cwde, cdqe, and cqo
  • AT&T-style assemblers recognise movz?? and movs?? where ?? are two size suffixes for what Intel-style assemblers call movzx, movsx, and movsxd
  • some Intel-style assemblers only recognise 63 /r as movsxd while others recognise movsx as a variant of this instruction, too
  • Plan 9-style assemblers (such as used in Go) are just plain weird and differ in a whole lot of ways, such as using Motorola-style mnemonics for conditional jumps
  • historically, the NEC assembler provided for the NEC V20 clone of the 8086 came with almost entirely different mnemonics. For example, int was called brk.
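
To make the second bullet concrete, here is a minimal sketch of a lookup table in Python. The encodings are the 64-bit-mode ones; the table and helper names (`SIGN_EXTEND_ACC`, `mnemonic`, `encoding_of`) are made up for illustration and do not come from any real assembler.

```python
# Hypothetical illustration: one machine-code encoding, two names for it.
# Encodings are as decoded in 64-bit mode (0x66 = operand-size prefix,
# 0x48 = REX.W prefix).
SIGN_EXTEND_ACC = {
    bytes([0x66, 0x98]): {"intel": "cbw",  "att": "cbtw"},  # AL  -> AX
    bytes([0x98]):       {"intel": "cwde", "att": "cwtl"},  # AX  -> EAX
    bytes([0x48, 0x98]): {"intel": "cdqe", "att": "cltq"},  # EAX -> RAX
    bytes([0x48, 0x99]): {"intel": "cqo",  "att": "cqto"},  # RAX -> RDX:RAX
}


def mnemonic(encoding: bytes, syntax: str) -> str:
    """Return the name a given syntax family uses for an encoding."""
    return SIGN_EXTEND_ACC[encoding][syntax]


def encoding_of(name: str) -> bytes:
    """Return the machine code for a mnemonic from either syntax family."""
    for encoding, names in SIGN_EXTEND_ACC.items():
        if name in names.values():
            return encoding
    raise KeyError(name)
```

Looking up `cltq` and `cdqe` yields the same bytes, which is the only thing the CPU ever sees.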
Sep Roland
fuz
  • AT&T also created a new mnemonic in case of MOVABS. And if you're talking about other architectures then things will be even weirder. For example some platforms write XOR as EOR, and there are a lot of variants of shift instructions like SAR, SRA, ASR, SHL, SLL... – phuclv Jan 26 '19 at 07:13
  • @phuclv There is no point in comparing mnemonics between architectures, really. – fuz Jan 26 '19 at 13:24
  • I said that because you mentioned Plan 9 or NEC V20 architectures – phuclv Jan 26 '19 at 15:36
  • @phuclv The NEC V20 is an upgraded 8086, so it's not really a different architecture. Plan 9 is not an architecture but rather an operating system. – fuz Jan 26 '19 at 17:51
  • @fuz Please verify if the 4th bullet point is still correct after my edit. Maybe you meant "some assemblers **and** Intel-style assemblers ..." – Sep Roland Jan 31 '19 at 22:49
  • What you meant to say must have been that AT&T doesn't follow the standard... It's called Intel syntax for a reason you know.. It's in their manuals. – Christoffer Bubach Aug 02 '20 at 13:34
  • @ChristofferBubach Check out the 8086 datasheet. You'll find that it doesn't actually specify any assembly syntax at all. The Intel assembly syntax was only described for Intel's own toolchain, but back in the day it was common that every toolchain vendor cooked up its own syntax. – fuz Aug 02 '20 at 14:35
  • Yeah the mnemonics are all there? Data sheet for 8086 @ https://course.ece.cmu.edu/~ece740/f11/lib/exe/fetch.php?media=wiki:8086-datasheet.pdf – Christoffer Bubach May 27 '21 at 23:55
3

There unfortunately isn't really a single "x86 standard" written down on paper that defines all the minimum requirements that a CPU must meet to be an x86.

Intel's documentation comes very close to being the "x86 standard", but in some cases it gives stronger guarantees than you get on modern AMD CPUs. For example, Intel guarantees atomicity of a 1/2/4/8-byte load or store from/to cacheable memory with any alignment that doesn't cross a cache-line boundary, but AMD only guarantees it for cacheable loads/stores that don't cross an 8-byte boundary.

The Q&A "Why is integer assignment on a naturally aligned variable atomic on x86?" quotes Intel's manual, showing that all of the guarantees are given as "Intel486 processor (and newer processors since)" guarantees such and such. There's no baseline given that applies to all x86 CPUs (or more importantly all x86-64 CPUs). I think the actual shared baseline in practice for x86 (including pre-x86-64) is 1 byte, because of 8088.

So software that wants to run on modern x86-64 CPUs can't assume atomicity for 8-byte loads/stores unless they're actually aligned. I think we can all agree that atomicity guarantees are an essential part of being a modern multi-core x86 CPU. Atomicity of uncached MMIO access matters even on a single core; modern Intel and AMD agree on that, but again Intel only documents it in terms of "Pentium and later processors". Implicitly "later Intel processors".
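
As a rough sketch of the two documented rules described above (not a definitive model of either vendor's manual), the difference can be written as a small predicate. The 64-byte cache-line size is an assumption, and the function names are hypothetical.

```python
# Hypothetical sketch of the documented atomicity rules for cacheable
# loads/stores, as summarised in this answer.
CACHE_LINE = 64  # assumed line size in bytes; not stated in the answer


def crosses_boundary(addr: int, size: int, boundary: int) -> bool:
    """True if the bytes [addr, addr + size) span a multiple of `boundary`."""
    return addr // boundary != (addr + size - 1) // boundary


def documented_atomic(addr: int, size: int, vendor: str) -> bool:
    """True if the vendor's documentation guarantees atomicity for the access."""
    if size not in (1, 2, 4, 8):
        return False
    # Intel: must not cross a cache line. AMD: must not cross 8 bytes.
    limit = CACHE_LINE if vendor == "intel" else 8
    return not crosses_boundary(addr, size, limit)


def portable_atomic(addr: int, size: int) -> bool:
    """Guaranteed by both vendors, so safe for portable x86-64 software."""
    return all(documented_atomic(addr, size, v) for v in ("intel", "amd"))
```

A misaligned 8-byte access at address 0x1004 is covered by the Intel rule but not the AMD one, so portable code has to treat it as non-atomic.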


That said, Intel's documentation does define mnemonics for every opcode, and register names. AMD's documentation agrees with Intel's on all of those things.

See volume 2 of Intel's x86 Software Development Manuals. HTML extracts of just the per-instruction manual entries (without the sections that explain the notation and instruction format) can be found at https://www.felixcloutier.com/x86/index.html and https://github.com/HJLebbink/asm-dude/wiki, and various other places have older versions formatted differently.


As @fuz explains, most assemblers choose to follow this standard, but it's not required. The important part is binary compatibility, not asm source compatibility.

Intel has to assign names to instructions so it can talk about them in English in the rest of its manuals, not because they need everyone in the world to use the same asm syntax.


I'm not sure Intel's manuals even fully define a complete asm syntax (how to indicate segment-override prefixes in an addressing mode, for example).

In some cases they do step well beyond describing which machine code does what, e.g. in the string instructions lods/stos/movs/cmps/scas (and probably ins/outs), you'll find paragraphs like this one in Intel's vol.2 manual:

At the assembly-code level, two forms of this instruction are allowed: the “explicit-operands” form and the “no-operands” form. The explicit-operands form (specified with the MOVS mnemonic) allows the source and destination operands to be specified explicitly. Here, the source and destination operands should be symbols that indicate the size and location of the source value and the destination, respectively. This explicit-operands form is provided to allow documentation; however, note that the documentation provided by this form can be misleading. That is, the source and destination operand symbols must specify the correct type (size) of the operands (bytes, words, or doublewords), but they do not have to specify the correct location. The locations of the source and destination operands are always specified by the DS:(E)SI and ES:(E)DI registers, which must be loaded correctly before the move string instruction is executed.

(highlighting reproduced from (an HTML extract of) the original PDF)

Some "Intel-syntax" assemblers such as NASM ignore this, and only allow the use of movs with the size as part of the mnemonic, like movsb. NASM also has syntax for indicating a segment-override prefix like fs lodsd that doesn't require operands, so this entirely avoids the possibility of using operands that indicate the wrong memory operand but still assemble.

(The string instructions only use implicit memory operands, not a ModR/M addressing mode.)

Related Q&As: "NASM: parser: instruction expected rep movs" and "Convert Instruction in assembly code lods and stos so NASM can compile".


So yes, there are multiple flavours of Intel-syntax assembly, not to mention very different syntaxes like AT&T.

AT&T uses different mnemonics intentionally for some instructions, even splitting up some opcodes that share a mnemonic in Intel syntax into separate mnemonics, like movzb for movzx-with-a-byte-source, and movzw for the word-source version. (Normally used with a size suffix as well, like movzbl, but the l can be inferred from a 32-bit destination register if you like.)
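
The naming scheme can be sketched as "stem + source-size suffix + destination-size suffix". The function below is a hypothetical illustration of that rule, not code from any assembler. It always emits the fully suffixed form.

```python
# Hypothetical sketch: derive the fully suffixed AT&T mnemonic from the
# Intel mnemonic plus the source and destination operand sizes in bits.
SUFFIX = {8: "b", 16: "w", 32: "l", 64: "q"}


def att_extend_mnemonic(intel: str, src_bits: int, dst_bits: int) -> str:
    if src_bits >= dst_bits:
        raise ValueError("source must be narrower than destination")
    if intel == "movzx" and src_bits in (8, 16):
        stem = "movz"
    elif intel == "movsx" and src_bits in (8, 16):
        stem = "movs"
    elif intel == "movsxd" and (src_bits, dst_bits) == (32, 64):
        stem = "movs"
    else:
        raise ValueError(f"unsupported combination: {intel} {src_bits}->{dst_bits}")
    return stem + SUFFIX[src_bits] + SUFFIX[dst_bits]
```

So Intel's single `movzx` becomes `movzbl` or `movzwl` depending on the source size, and `movsxd` becomes `movslq`.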

And AT&T syntax unintentionally swaps fsubr with fsub when used with two register operands, which is a syntax design bug we're stuck with. (Fortunately x87 as a whole is mostly obsolete.)

Community
Peter Cordes