Is MOV instruction actually doing more than just one atomic operation at CPU level?

Question

I'm new to assembly language, and I understand that data can be fetched from memory only through registers. Therefore:

    MOV eax, x // x is an integer
    MOV y, eax

The machine code of a MOV operation consists of the CPU instruction (the opcode) and the addresses of the operands and registers. True or false?
In RAM, the integer is stored in 4 different memory locations. True or false?

Consider a 32-bit x86 processor. As I understand it, when the CPU executes the MOV instruction to load 4 bytes of data from RAM (mov eax, x), it must execute 4 instructions to fetch the data from each of the 4 addresses that make up a 4-byte integer, join the bytes together, and put them into the EAX register.

Is this how the CPU does the job?
How does the CPU know how many bytes it should read?

The assembler will use an [addressing mode](https://en.wikipedia.org/wiki/Addressing_mode) to encode the machine code operand for `x`. Assuming `x` is in the global data and it is 32-bit code, it will use an absolute address mode. (The opcode will imply a dword transfer, other sizes will either use an override prefix or change the opcode.) — Erik Eidt, Apr 09 '20 at 17:06
Are you asking about *how* the hardware memory interface (of which processor architecture/microarchitecture?) works, or *whether* the processor architecture (which processor architecture?) guarantees atomic loads (of which alignments/access sizes?). Please clarify. — EOF, Apr 09 '20 at 17:09
`mov` loads are only guaranteed atomic when the CPU can grab all the bytes from cache in a single wide operation, so atomicity happens for free. e.g. on Intel P6 and later, any qword or narrower load that doesn't cross a cache-line boundary. (Or from aligned uncacheable memory). See the linked duplicates. **Every byte has its own address, but that doesn't stop a single read from accessing multiple bytes, especially when the base address is aligned.** — Peter Cordes, Apr 09 '20 at 19:12
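
To illustrate the point that every byte has its own address while a single read can still cover several of them, here is a minimal C sketch. The helper names `bytes_of_u32` and `load_u32` are hypothetical and exist only for this example. It assumes nothing about byte order, and it makes no claim about atomicity: whether the load is atomic depends on the CPU and on alignment, as described in the comment above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Copy the 4 bytes that make up a 32-bit value into out[],
 * lowest address first. Each byte lives at its own address. */
static void bytes_of_u32(uint32_t value, unsigned char out[4]) {
    memcpy(out, &value, sizeof value);
}

/* Read a 32-bit value starting at any byte address.
 * Assumption: with optimization enabled, compilers targeting x86
 * typically turn this fixed-size memcpy into one 32-bit mov,
 * not four separate byte loads. */
static uint32_t load_u32(const unsigned char *p) {
    assert(p != NULL);
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Calling `bytes_of_u32(x, b)` and then `load_u32(b)` gives back `x`: the four addresses hold one value, and the access size (4 bytes here) is fixed by the type in C, just as it is fixed by the opcode and operand size in the machine instruction.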

score -2 · Answer 1 · answered Apr 09 '20 at 17:11

The data bus is 32 bits wide. The CPU puts the requested address on the address bus and the RAM sets the 32 bits on the data bus. Hence you can only access 32 bits if aligned to a 4-byte boundary, 16 bits if aligned to a 2-byte boundary, etc. This is partly due to the internal architecture of RAM chips, but there is also the risk of having to access two different chips to get the required bytes.

When reading a byte or a word, the CPU tends to set the unused areas of the data bus to low, hence moving with zero extension takes no more engineering than a regular move.

Then, of course, there's burst mode: rather than copying one piece of memory at a time, the CPU keeps copying until it's told to stop. This cuts down a lot of the overhead of decoding instructions.

Mike


The width of the data bus depends on what CPU is used. – fuz Apr 09 '20 at 17:14
The alignment requirements for memory accesses also depend on the CPU. The x86 chips, for the most part, don't care about alignment. Memory fetches tend to be entire cache lines these days. – 1201ProgramAlarm Apr 09 '20 at 17:40
SDRAM bursts are 64 bytes (or configurable to 32); the memory controller doesn't have to tell the DRAM chips to stop. https://en.wikipedia.org/wiki/Synchronous_dynamic_random-access_memory#Burst_ordering. It's not a coincidence that cache line size of most modern CPUs (including all modern x86) is 64 bytes. – Peter Cordes Apr 09 '20 at 19:04
See also [What Every Programmer Should Know About Memory?](https://www.akkadia.org/drepper/cpumemory.pdf) – Peter Cordes Apr 09 '20 at 19:11
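
The comments about alignment and cache lines can be made concrete with a small address check. This is only a sketch: `CACHE_LINE_SIZE`, `crosses_cache_line` and `is_naturally_aligned` are hypothetical names, and the 64-byte line size is an assumption taken from the comment above, not something queried from the hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines, as stated for modern x86. */
#define CACHE_LINE_SIZE 64u

/* True if an access of `size` bytes starting at `addr` touches
 * two different cache lines (first and last byte fall in
 * different 64-byte blocks). */
static bool crosses_cache_line(uintptr_t addr, size_t size) {
    assert(size > 0 && size <= CACHE_LINE_SIZE);
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}

/* True if `addr` is a multiple of the access size. */
static bool is_naturally_aligned(uintptr_t addr, size_t size) {
    assert(size > 0);
    return addr % size == 0;
}
```

For example, a 4-byte load at address 60 stays inside one line (bytes 60 to 63), while the same load at address 62 spans bytes 62 to 65 and crosses into the next line. A naturally aligned access whose size is a power of two no larger than the line size never crosses a line boundary.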
