1

Sites like https://uops.info/ and Agner Fog's instruction tables, and even Intel's own manuals, list various forms of the same instruction. For example add m, r (in Agner's tables) or add (m64, r64) on uops.info, or ADD r/m64, r64 in Intel's manual (https://www.felixcloutier.com/x86/add).


Here's a simple example I ran on godbolt

__thread int a;
void Test() {
    a+=5;
}

The add is add DWORD PTR fs:0xfffffffffffffffc,0x5. Its machine code starts with the bytes 64 83 04 25.

There are a few ways to write my real code, but I wanted to look up how many cycles this might take, along with other information. How the heck do I find the reference entry for this instruction? I tried typing "add" into https://uops.info/table.html and checking off my architecture, but I have no idea which of the entries is the instruction that's being used.

For now, in this specific case, I'm guessing the form is add m64, r64, but I have no idea whether there's any penalty for using fs: before the address, or whether there's a way to see opcodes so I can confirm I'm looking at the right reference.

Peter Cordes
Eric Stotch

2 Answers

6

http://ref.x86asm.net/coder64.html has an opcode map, but with a bit of experience you won't need one most of the time. Especially when you have disassembly, you can just check the manual entry for that mnemonic (https://www.felixcloutier.com/x86/add), and see which of the possible opcodes it is (83 /0 add r/m32, imm8).

Clearly this has a 32-bit operand-size (dword ptr) memory destination, and the source is an immediate (numeric constant). That rules out the add m64, r64 form for 2 separate reasons: the operand-size is 32-bit, not 64, and the source is an immediate, not a register. So even without looking at the machine code, it's definitely add r/m32, imm with an imm8 or imm32. Any sane assembler will of course pick imm8 for a small constant that fits in a signed 8-bit integer.

Generally different ways of encoding the same instruction aren't special, so the source-level assembly / disassembly is fine, as long as you understand what's a register, what's memory, and what's an immediate.

But there are a few special cases, e.g. Agner Fog's guide notes that rotates by 1 using the short-form encoding are slower than rol reg, imm8 even when the imm8=1, because the flag-updating special case for rotate-by-1 actually depends on the opcode, not the immediate count. (Intel's documentation apparently assumes your assembler will always pick the short form for rotate by constant 1; the part about "masked count" may only apply to rotate by cl: https://www.felixcloutier.com/x86/rcl:rcr:rol:ror#flags-affected. I haven't tested this recently and am not 100% sure I'm remembering correctly when OF is updated (other flags in the SPAZO group are always left unmodified), but IIRC that's why rotates by 1 (2 uops) and by cl (3 uops) are slow on Intel, vs. rotates by other immediate counts (1 uop).)

Or see https://github.com/travisdowns/uarch-bench/wiki/Intel-Performance-Quirks. Specifically, I mean the Q&A "Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?": even on Haswell / Skylake, adc al,0 (using the short form with no modrm byte) is 2 uops, and so is the equivalent adc eax, 12345. But adc edx, 12345 is 1 uop, using the non-special case. In cases like these you have to either check the machine code, or know how your assembler will have chosen to encode a given instruction (optimizing for size).


BTW, using a segment with a non-zero base adds 1 cycle of latency to address-generation, IIRC, but isn't a significant throughput penalty. (Unless of course throughput bottlenecks on a latency chain that it's part of...)

Peter Cordes
  • I'm not sure why, but I have the bad habit of thinking r = immediate, as if the R stands for opeRand. This was very helpful, thank you. – Eric Stotch Dec 14 '20 at 00:32
4

Look at the Intel manual for x86 CPUs. It's about 6000 pages long, so I'm sure it's in there lol: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Also check out this site: http://ref.x86asm.net/coder64.html. Just search for 64 (the opcode is shown greyed out). As you can see, 64 has nothing to do with the ADD opcode; it's just the FS segment override prefix, and 83 is the ADD opcode (with /0 in the ModRM byte selecting ADD).

[image: fs prefix]
[image: add opcode]

Here is how your opcode works; I simulated it in the IDA disassembler. See the bytes:


It looks like this in ASM:
[image: asm]

SSpoke
  • 1
    I gotta say this is as helpful as the other answer. Stackoverflow please let me accept two answers – Eric Stotch Dec 14 '20 at 00:33
  • 2
    I wouldn't recommend the huge all-in-1 PDF, it's too big. vol.2 as its own PDF is usable and has an index. (https://software.intel.com/content/www/us/en/develop/articles/intel-sdm.html has links). But yeah, Intel's vol.2 manual is what online HTML sites scrape to extract insn set references like https://www.felixcloutier.com/x86/, but the original PDF has intro material about how to read the entries, and appendices containing an opcode map. Normally you don't need those, just the instruction references, which is why scrapes don't include them, but when you do Intel has them. – Peter Cordes Dec 14 '20 at 15:52