Why does IA32 not allow memory-to-memory mov?

Question

In the Intel IA32 architecture, instructions like movl and movw do not allow both operands to be memory locations. For example, the instruction movl (%eax), (%edx) is not permitted. Why?

The ModR/M byte can't encode it. But then of course you can turn that into a "why did they make it so", well.. meanwhile, string move (`movsb`, `movsw`, `movsd`, `movsq`) has two memory arguments, but they're implicit. — harold, Aug 14 '12 at 13:42
It is 1976 and you can put 20,000 transistors on a chip to implement a 16-bit processor. That requires cutting corners heavily, the very non-orthogonal design was part of the outcome. And no room for finding the storage required to buffer the value between bus cycles. — Hans Passant, Aug 15 '12 at 00:11
I think a better explanation is that decoding insns with two full addressing-modes (`base + index + disp16`) would have required two AGU (address-generation-units), and would complicate the binary machine-code format a lot. (which segment override applies to which operand? How to allow encoding two memory addresses without bloating the code-size for the common case of one or both operands being registers?) — Peter Cordes, Apr 21 '16 at 03:45
@PeterCordes The original 8086 didn't even have one address generation unit, so that's not the reason. It did address calculations with the ALU. — Ross Ridge, Jul 17 '16 at 04:46

Dougvj · Accepted Answer · 2017-08-06T06:26:32.963

21

The answer involves a fuller understanding of RAM. Simply stated, RAM can only be in one of two states: read mode or write mode. If you wish to copy one byte in RAM to another location, you must have a temporary storage area outside of RAM as you switch from read to write.

It is certainly possible for the architecture to have such a RAM-to-RAM instruction, but it would be a high-level instruction that microcode would translate into copying data from RAM to a register and then back to RAM. Alternatively, the RAM controller could be extended with a temporary register just for this copying of data, but it wouldn't provide much benefit for the added complexity of CPU/hardware interaction.

EDIT: It is worth noting that recent advancements such as Hybrid Memory Cube and High Bandwidth Memory are architectures in which the RAM topology has become more like PCI-e, and direct RAM-to-RAM transfers are now possible, but that is due to the support logic for those technologies, not the RAM itself. In a CPU architecture, this would take the form of moving huge blocks of RAM at a time, like DMA, not a single instruction. In addition, the CPU cache behaves like traditional RAM, so the architecture would have to abstract it as per my original explanation.

EDIT2: Per @PeterCordes' comment, my original understanding was not entirely correct; x86 does in fact have a few memory-to-memory instructions. The real reason they are not available for most instructions (such as movl and movw) is to keep instruction-encoding complexity low; they could have been implemented. The basic idea in my original answer, that there is a temporary storage location outside of RAM in the form of a latch or register, is correct, but it is not the reason these instructions don't exist. Even older chips from the 1970s such as the 6502 and the 8086 have instructions that both read and write memory, and you could easily perform operations such as INC directly on a RAM location. This was accomplished by latching the memory fetch directly to the ALU and back out to memory again without going through a register visible to the instruction set.
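
To make the encoding point concrete, here is a minimal Python sketch of how a ModR/M byte splits into its three fields. It is only an illustration, not a full decoder: it assumes 32-bit addressing, handles only the simplest forms, and the function names are made up for this example. The thing to notice is that only the `r/m` field can name a memory location; the `reg` field always names a register (or extends the opcode), so a single ModR/M byte has no room for a second memory operand.

```python
# Register names for 32-bit operand size, indexed by the 3-bit field value.
REG32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def decode_modrm(byte):
    """Split a ModR/M byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # bits 7-6: addressing form
    reg = (byte >> 3) & 0b111  # bits 5-3: always a register (or opcode extension)
    rm = byte & 0b111          # bits 2-0: register or memory, depending on mod
    return mod, reg, rm

def describe(byte):
    """Return (reg operand, r/m operand) in AT&T syntax, simple forms only."""
    mod, reg, rm = decode_modrm(byte)
    reg_operand = "%" + REG32[reg]
    if mod == 0b11:
        # Register-direct: the r/m field names a register too.
        rm_operand = "%" + REG32[rm]
    elif mod == 0b00 and rm not in (0b100, 0b101):
        # Plain register-indirect: the only memory operand in the instruction.
        rm_operand = "(%" + REG32[rm] + ")"
    else:
        # SIB byte and/or displacement follows; not decoded in this sketch.
        rm_operand = "<memory operand with SIB byte and/or displacement>"
    return reg_operand, rm_operand

# 8B 08 encodes `movl (%eax), %ecx`; 89 0A encodes `movl %ecx, (%edx)`.
print(describe(0x08))
print(describe(0x0A))
```

Whichever direction the opcode selects, one side of the move comes from the `reg` field, which is why `movl (%eax), (%edx)` has no encoding.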

edited Aug 06 '17 at 06:26

answered Aug 14 '12 at 17:28

Dougvj

In x86, it's an insn-encoding limitation, as well as keeping the decode complexity low. There *is* a mem-to-mem copy instruction (`movs`), but it uses two implicit operands. There's also `push/pop [mem]` to read from the effective address encoded in the insn and write to `[rsp]`. (or vice versa, push vs. pop). The original 8086 had these insns, and it had a pretty small transistor budget (but obviously large enough to latch 16 bits between read and write). In current CPUs, `rep movs` is really efficient for block-copies of large aligned buffers. – Peter Cordes Apr 21 '16 at 03:39
Also, all memory-destination instructions, like `inc byte [mem]` do a read-modify-write to memory. It's to the same address, but that's still two separate commands. This answer is not a bad guess, but it's pretty much unrelated to the right answer. – Peter Cordes Jul 17 '16 at 04:49
@PeterCordes Thank you for your input. I have learned quite a bit more since I have written this answer and realize I was not entirely correct. I have added an edit making note of this in the answer. – Dougvj Aug 06 '17 at 06:29
IA32 uses the same instruction set as 8086, just with some 32-bit extensions. `inc dword [eax]` exists in IA-32. (Or in AT&T syntax, `incl (%eax)`.) It sounds like your last paragraph is claiming that 6502 and 8086 have this but IA-32 doesn't. – Peter Cordes Mar 20 '18 at 22:47
@PeterCordes That wasn't my intention, but I am unsure how to make that clearer. Go ahead and suggest an edit. – Dougvj Mar 21 '18 at 02:37
Oh, on a 2nd read I don't think it implies that, nvm. This answer still needs a rewrite to start off with the correct reasons, rather than being wrong and then burying the right answer in an EDIT update. Your answer shouldn't be a changelog, it should just be the best answer you can make it. – Peter Cordes Mar 21 '18 at 03:18
@PeterCordes Makes sense. I have not been super familiar with the etiquette involved in such heavy rewrites, which is why I avoided it. Sometime when I have more time and interest I will do that. You should write your own answer; surely it would be superior. – Dougvj Mar 21 '18 at 06:21
@Dougvj: turns out I *did* already write up an answer on [movl from memory to memory](https://stackoverflow.com/questions/33794169/movl-from-memory-to-memory), explaining the machine-encoding and ISA-design (and performance on modern CPUs) reasons for not having mem,mem instructions. That question is an exact dup; I could and maybe should repost it here. – Peter Cordes Mar 30 '18 at 01:23

score 7 · Answer 2 · answered Jul 17 '16 at 04:27

IA-32 is x86, and x86 evolved from the Intel 8086 (iAPX 86). The 8086 was a small, cheap chip whose instruction encoding was modeled on 8-bit instruction sets, and it had no "mov" with two explicit memory operands.

Wikipedia gives this explanation of the 8086's instruction encoding:

Due to a compact encoding inspired by 8-bit processors, most instructions are one-address or two-address operations, which means that the result is stored in one of the operands. At most one of the operands can be in memory, but this memory operand can also be the destination, while the other operand, the source, can be either register or immediate. A single memory location can also often be used as both source and destination which, among other factors, further contributed to a code density comparable to (and often better than) most eight-bit machines at the time.

There were some CISCs with memory-memory instructions (a single instruction that operates on two memory operands). The lecture https://www.cis.upenn.edu/~milom/cis501-Fall05/lectures/02_isa.pdf says that VAX can encode memory-memory instructions:

DEC VAX (Virtual Address eXtension to PDP-11): 1977

• Variable length instructions: 1-321 bytes!!!

• 14 GPRs + PC + stack-pointer + condition codes

• Data sizes: 8, 16, 32, 64, 128 bit, decimal, string

• Memory-memory instructions for all data sizes

• Special insns: crc, insque, polyf, and a cast of hundreds

This is the OpenBSD memcpy source for VAX (instruction set manual: http://h20565.www2.hpe.com/hpsc/doc/public/display?docId=emr_na-c04623178):

https://es.osdn.jp/projects/openbsd-octeon/scm/git/openbsd-octeon/blobs/master/src/sys/lib/libkern/arch/vax/memcpy.S

         movq    8(ap),r1        /* r1 = src, r2 = length */
         movl    4(ap),r3        /* r3 = dst */
... 
 1:      /* move forward */
         cmpl    r2,r0
         bgtru   3f              /* stupid movc3 limitation */
         movc3   r2,(r1),(r3)    /* move it all */

The "movc3" instruction here has two memory operands, whose addresses are held in registers.
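
As a rough illustration of why VAX can do this, here is a small Python sketch that splits VAX operand-specifier bytes into a 4-bit mode and a 4-bit register number, the layout described in the VAX-11 text quoted in the comments on this answer. Because every operand carries its own specifier, any operand can be a memory reference. Some assumptions to flag: the mode sits in the high nibble, mode 5 means "register" and mode 6 means "register deferred" (this is how I recall the VAX architecture; check the manual before relying on it), and the helper names are invented for this example. The opcode byte is left out.

```python
def split_specifier(byte):
    """Split a VAX operand-specifier mode byte into (mode, register)."""
    mode = (byte >> 4) & 0xF  # high nibble: addressing mode (assumed layout)
    reg = byte & 0xF          # low nibble: register number
    return mode, reg

# Only the two modes used by `movc3 r2,(r1),(r3)` are named here.
MODE_NAMES = {5: "register", 6: "register deferred"}

def describe_operand(byte):
    """Return (mode name, assembler text) for the modes this sketch knows."""
    mode, reg = split_specifier(byte)
    name = MODE_NAMES.get(mode, "other mode")
    if mode == 5:
        text = f"r{reg}"      # operand is the register itself
    elif mode == 6:
        text = f"(r{reg})"    # operand is the memory the register points at
    else:
        text = "?"            # not handled in this sketch
    return name, text

# Assumed specifier bytes for the three operands r2, (r1), (r3).
for b in (0x52, 0x61, 0x63):
    print(hex(b), describe_operand(b))
```

The cost is one extra byte per operand even when everything is in registers, which is the code-size trade-off the 8086 encoding avoids.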

x86 has several "string" instructions that perform memory-memory operations (the `*s` family, especially `movs`: http://x86.renejeschke.de/html/file_module_x86_id_203.html), but these instructions use the predefined registers SI and DI as addresses (implicit operands), so two explicit memory operands still can't be encoded in x86.
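
For readers unfamiliar with the string instructions, here is a minimal Python model of what `rep movsb` does. It is a sketch under simplifying assumptions: the direction flag is clear (DF=0), segment registers are ignored, memory is a flat `bytearray`, and `rep_movsb` is just a name chosen for this example. Neither address appears in the instruction encoding; both come from registers the instruction always uses.

```python
def rep_movsb(mem, si, di, cx):
    """Model `rep movsb` with the direction flag clear (DF=0).

    mem stands in for RAM; si, di and cx are the incoming register values.
    Returns the final (si, di, cx).
    """
    while cx != 0:
        latch = mem[si]  # read cycle: byte held in a temporary latch
        mem[di] = latch  # write cycle: byte stored at the destination
        si += 1          # both pointers advance because DF=0
        di += 1
        cx -= 1          # the rep prefix counts down in CX
    return si, di, cx

mem = bytearray(b"hello...........")
print(rep_movsb(mem, si=0, di=8, cx=5))
print(mem)
```

The `latch` variable plays the role of the internal temporary storage discussed in the accepted answer; it is not an architectural register.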

Any idea how VAX machine code manages to stay compact when there's only one or zero memory operands to an instruction? That's the issue for x86, where the original 8086 had pretty simple instruction decoding. (That and potentially needing two AGUs, or two uses of one AGU). 8086 would have had to do something more complex than mod/rm byte, maybe with a variable-length encoding like 386's SIB byte. VAX apparently manages to support [scaled-index addressing modes (with or without increment/decrement)!](https://www.cs.auckland.ac.nz/references/macvax/indexed-address-mode.html) — Peter Cordes, Jul 17 '16 at 04:35
NVM, found it: "2.2. Addressing Modes The VAX-11 supports sixteen addressing modes. Each operand is represented in memory with an operand specifier, which consists of a mode byte followed by from zero to five additional bytes of information. The mode byte is broken into two fields: a four-bit mode specifier and a four-bit register designator." From (google's html cache of) https://users.cs.jmu.edu/abzugcx/Public/Student-Produced-Term-Projects/Computer-Organization-2004-SPRING/VAX-Architecture-by-William-French-Ahmed-Kareem-Horatiu-Stancu-Steve-Tran-2004-Spring.doc. — Peter Cordes, Jul 17 '16 at 04:37
Pretty bulky vs. 8086, esp. since I've heard that original 8086 hardware was essentially always bottlenecked on code-fetch. — Peter Cordes, Jul 17 '16 at 04:39
I'm writing an answer right now. IIRC, this question was previously closed as opinion based or something, otherwise I already would have posted one. While going over my previous comments on this post, I was starting to realize that I should just answer it myself now. — Peter Cordes, Jul 17 '16 at 04:56
Update, I had already written an answer about this almost a year before my last comment: [Why isn't movl from memory to memory allowed?](https://stackoverflow.com/q/33794169). Finally happened to notice this old comment and my other answer at the same time. — Peter Cordes, Oct 18 '18 at 07:37

score 3 · Answer 3 · answered Aug 14 '12 at 13:40

As far as I know, as a general rule in this architecture, only one memory access per instruction is allowed. This is because dealing with two memory accesses per instruction would complicate the processor's execution pipeline.

whooot

There is such a rule, but it's about µops in Intel processors. – harold Aug 14 '12 at 13:51
SCAS\*, MOVS\*, PUSH/POP mem, PUSHA/POPA and some other instructions do access multiple "words" of memory. But their memory operands aren't all encoded using the Mod R/M byte, which can refer to at most just one memory operand. – Alexey Frunze Aug 14 '12 at 14:47

score 0 · Answer 4 · answered Aug 14 '12 at 13:41

RAM supports input and output, but not copying. Therefore a memory-to-memory move would actually be a memory-to-CPU-to-memory move. It would in theory be possible to implement such an instruction, but it probably wasn't because it wouldn't be very practical.

Here are some of the things that would need to be considered to implement such an instruction:

What temporary storage location do we use? A register?
If we use a register, which one do we hijack?

Not providing such an instruction leaves the above questions up to the programmer.
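
In practice the programmer answers those questions by writing two instructions, for example `movl (%eax), %ecx` followed by `movl %ecx, (%edx)`, with `%ecx` as the scratch register. The toy Python model below just mirrors that pair; the register and memory contents are hypothetical values picked for the example, and `load`/`store` are illustrative helpers, not a real emulator API.

```python
def load(regs, mem, dst_reg, addr_reg):
    """Model `movl (addr_reg), dst_reg`: memory -> register."""
    regs[dst_reg] = mem[regs[addr_reg]]

def store(regs, mem, src_reg, addr_reg):
    """Model `movl src_reg, (addr_reg)`: register -> memory."""
    mem[regs[addr_reg]] = regs[src_reg]

# Hypothetical machine state: mem maps addresses to 32-bit values.
regs = {"eax": 0x1000, "edx": 0x2000, "ecx": 0}
mem = {0x1000: 0xDEADBEEF, 0x2000: 0}

load(regs, mem, "ecx", "eax")   # movl (%eax), %ecx
store(regs, mem, "ecx", "edx")  # movl %ecx, (%edx)
print(hex(mem[0x2000]))
```

Note that the copy clobbers `%ecx`, which is exactly the choice the two-instruction form leaves to the programmer.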

Kendall Frey

But they did provide it, see `movsb` and its family. And of course the register to use doesn't have to be architectural. – harold Aug 14 '12 at 13:46
Adding to harold's point, @Kendall: which register do you think the `call` instruction hijacks? And why couldn't something of that sort be used here? – perilbrain Aug 14 '12 at 14:00
