What x86 instructions take two (or more) memory operands?

Question

I thought there were none. But I see here,

Instructions with two memory operands are extremely rare

I can't find anything that explains which instructions, though rare, exist. What are the exceptions?

The only ones I remember are the `movsb`, `movsw`, and `movsd` string instructions. Their memory operands are implicit though. — fuz, Sep 30 '18 at 00:22
Don't know why, but the link to nasmtutorial seems to be to an empty page, currently. — Rudy Velthuis, Sep 30 '18 at 03:01
@prl: Oh cool, that's a neat instruction. Updated my answer. Atomicity wider than 8 bytes is interesting (first time an x86 vendor has guaranteed anything other than 8 bytes and `lock cmpxchg16b`?) In practice SKX probably has 64-byte load/store atomicity, but there's no clean way to take advantage for larger lock-free atomic objects because there's no guaranteed way to detect it. So too bad `movdir64b` is only available in a cache-bypassing version that probably hurts performance for sharing data between threads. — Peter Cordes, Sep 30 '18 at 12:49
*"explains the rarity"* Simplicity of HW design: memory was connected to the CPU by an address bus and a data bus, and at any single moment you could read or write only one value, by setting the address bus lines to the desired address and the data bus to the value to be written (or waiting for the memory chip to put the value read on the data bus). So any two-memory-operand instructions are (were) implemented sequentially. Nowadays the CPU is a lot more complex and memory access is layered under cache systems, but the one-memory-operand design seems powerful enough that nobody has bothered to change it. — Ped7g, Sep 30 '18 at 20:32
@Ped7g that's not what I was going for, but I can see that interpretation; will fix it. — Evan Carroll, Sep 30 '18 at 20:37

score 24 · Accepted Answer · edited Jul 20 '19 at 18:13

I can't find anything that explains the rarity.

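An x86 instruction can have at most one ModR/M + SIB + disp0/8/32, and only the r/m field of ModR/M can encode a memory operand (the reg field always names a register or an opcode extension). So there are zero instructions with two explicit memory operands.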
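As a rough illustration of why only one operand can be memory, here is a minimal Python sketch that splits a ModR/M byte into its fields. The function name `decode_modrm` and the dictionary keys are my own (hypothetical) choices, and it assumes 32/64-bit addressing with no REX/VEX extension bits; it does not decode SIB or displacement bytes.

```python
def decode_modrm(modrm: int) -> dict:
    """Split a ModR/M byte into mod / reg / rm (32- or 64-bit addressing assumed)."""
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModR/M is a single byte")
    mod = modrm >> 6             # top 2 bits: selects register vs. memory form
    reg = (modrm >> 3) & 0b111   # middle 3 bits: a register number or opcode extension, never memory
    rm = modrm & 0b111           # low 3 bits: a register if mod == 3, else part of an address
    return {
        "mod": mod,
        "reg": reg,
        "rm": rm,
        "rm_is_memory": mod != 0b11,                # the only field that can name memory
        "has_sib": mod != 0b11 and rm == 0b100,     # rm == 100 means a SIB byte follows
    }
```

For example, `mov eax, [rbx]` assembles to `8B 03`, and `decode_modrm(0x03)` reports `rm_is_memory` as true with `reg` 0 (eax) and `rm` 3 (rbx), while `mov eax, ebx` as `8B C3` gives `mod` 3, so both operands are registers. There is simply no second field that could describe another address.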

The x86 memory-memory instructions all have at least one implicit memory operand whose location is baked into the opcode, like push, which accesses the stack, or the string instructions movs and cmps.

What are the exceptions?

I'll use [mem] to indicate a ModR/M addressing mode which can be [rdi], [RIP+whatever], [ebx+eax*4+1234], or whatever you like.

push [mem]: reads [mem], writes implicit [rsp] (after updating rsp).
pop [mem]: reads implicit [rsp], writes [mem] (and updates rsp).
call [mem]: reads a new RIP from [mem], pushes a return address on the stack.
movsb/w/d/q: reads DS:(E)SI, writes ES:(E)DI (or in 64-bit mode RSI and RDI). Both are implicit; only the DS segment reg is overridable. Usable with rep.
cmpsb/w/d/q: reads DS:(E)SI and ES:(E)DI (or in 64-bit mode RSI and RDI). Both are implicit; only the DS segment reg is overridable. Usable with repe / repne.
MPX bndstx mib, bnd: "Store the bounds in bnd and the pointer value in the index register of mib to a bound table entry (BTE) with address translation using the base of mib." The Operation section shows a load and a store, but I don't know enough about MPX to grok it.
movdir64b r16/r32/r64, m512. Has its own feature bit, available in upcoming Tremont (successor to Goldmont Plus Atom). Moves 64 bytes as direct-store (WC) with 64-byte write atomicity from source memory address to destination memory address. The destination is the (aligned, atomic) address es:[reg], taken from the register named by the ModRM /r field; the source is the (unaligned, non-atomic) r/m memory operand from ModRM.

Uses write-combining for the store, see the description. It's the first time any x86 CPU vendor has guaranteed atomicity wider than 8 bytes outside of lock cmpxchg16b. But unfortunately it's not actually great for multithreading because it forces NT-like cache eviction/bypass behaviour, so other cores will have to read it from DRAM instead of a shared outer cache.

AVX2 gather and AVX512 scatter instructions are debatable. They obviously do multiple loads / stores, but all the pointers come from one SIMD vector (and a scalar base).
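To make the "one explicit, one implicit" pattern from the list above concrete, here is a toy Python model of push [mem] and movsb. It is only a sketch under simplifying assumptions: flat memory, segment bases ignored, DF=0, 64-bit mode, and the `Machine` class with its method names is hypothetical, not any real emulator's API.

```python
class Machine:
    """Toy flat-memory model; registers are plain integers keyed by name."""

    def __init__(self, size: int = 256):
        self.mem = bytearray(size)
        self.regs = {"rsp": size, "rsi": 0, "rdi": 0, "rcx": 0}

    def load(self, addr: int, n: int) -> int:
        return int.from_bytes(self.mem[addr:addr + n], "little")

    def store(self, addr: int, n: int, value: int) -> None:
        self.mem[addr:addr + n] = value.to_bytes(n, "little")

    def push_mem(self, addr: int) -> None:
        """push qword [addr]: explicit load via ModR/M, implicit store at [rsp]."""
        value = self.load(addr, 8)               # explicit memory operand
        self.regs["rsp"] -= 8                    # rsp is updated first...
        self.store(self.regs["rsp"], 8, value)   # ...then the implicit store

    def movsb(self) -> None:
        """movsb with DF=0: both addresses are implicit, in rsi and rdi."""
        self.mem[self.regs["rdi"]] = self.mem[self.regs["rsi"]]
        self.regs["rsi"] += 1
        self.regs["rdi"] += 1

    def rep_movsb(self) -> None:
        """rep movsb: repeat movsb while rcx != 0, decrementing rcx each time."""
        while self.regs["rcx"] != 0:
            self.movsb()
            self.regs["rcx"] -= 1
```

Note that `push_mem` takes exactly one address argument and `movsb` takes none, which mirrors the encodings: the second (or both) memory locations come from fixed registers, not from a second addressing mode.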

I'm not counting instructions like pusha, fldenv, xsaveopt, iret, or enter with nesting level > 1 that do multiple stores or loads to a contiguous block.

I'm also not counting the ins / outs string instructions, because they copy memory to/from I/O space. I/O space isn't memory.

I didn't look at VMX or SGX instructions on http://felixcloutier.com/x86/index.html, just the main list. I don't think I missed any, but I certainly could have.

Does any instruction have 4 explicit operands? I ask because this website (https://www.felixcloutier.com/x86/index.html) reserves 4 columns in the operands table for all instructions. — Sourav Kannantha B, Jul 18 '21 at 13:27
@SouravKannanthaB: Yes, `vpblendvb` / `vblendvps/pd`, but still only one can be memory. [What kind of address instruction does the x86 cpu have?](https://stackoverflow.com/q/53325275). Note that https://www.felixcloutier.com/x86/ is just scraped from Intel's own PDF manual. — Peter Cordes, Jul 18 '21 at 16:17
