Why is it not possible to read an unaligned word in one step?

Question

Given that the word size of a CPU allows it to address every single byte in memory, and given that via PAE a CPU can even use more bits than its word size for addressing:

What is the reason that a CPU cannot read an unaligned word in one step?

For example, on a 32-bit machine you can read the 4-byte chunk starting at position 0, but you cannot read the one starting at position 1 in one step (you can read it, but it takes several steps).
Why can CPUs not do that?

Possibly related: https://stackoverflow.com/questions/23593994/how-do-modern-cpus-handle-crosspage-unaligned-access — Yuval Adam, Jun 15 '14 at 09:35
@OliCharlesworth, I'm not sure you understood the question. It's very simple: what prevents the CPU from reading the 32-bit word at position 0x00000001 in one step — GetFree, Jun 15 '14 at 09:36
@GetFree: Because *memory* is divided into words. Reading misaligned requires reading *two* memory words, and then stitching things together. (Note: this is architecture-specific.) — Oliver Charlesworth, Jun 15 '14 at 09:37

Pepito · Answer 1 · 2014-06-23T19:51:22.723

5

The problem is not the CPU's ability to address any single byte in memory; it is that the memory does not have the same granularity. As Oli said, this is very architecture-specific, but memory chips are often addressed in units of their data bus width, meaning that a given address represents a full "word" of their data bus.

Let's take the example of a 32-bit CPU with a 32-bit-wide data bus connected to a memory device. When the CPU wants to access the word at address 0x00000000, it really wants to access bytes 0, 1, 2 and 3. For the memory chip, however, this is represented by the single address 0x00000000.

Now, when the CPU wants to access the word at address 0x00000001, it really wants to access bytes 1, 2, 3 and 4. For the memory chip, however, this is represented by a piece of the word at its address 0x00000000 (bytes 1 to 3) and a piece of the word at its address 0x00000001 (byte 4).

Hence the need for two bus cycles.
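The two-cycle read can be sketched in a few lines of Python. This is only a toy model, not how any particular CPU or memory controller is built: the names `WordMemory`, `read_word` and `load32` are hypothetical, and a 4-byte bus word is assumed.

```python
WORD_BYTES = 4  # assumed data bus width in bytes (32-bit bus)


class WordMemory:
    """Toy RAM that can only be read one whole bus word at a time."""

    def __init__(self, data: bytes):
        assert len(data) % WORD_BYTES == 0, "size must be a whole number of words"
        self.data = data
        self.bus_cycles = 0  # counts how many word reads were issued

    def read_word(self, word_addr: int) -> bytes:
        """One bus cycle: return the full word stored at a RAM word address."""
        self.bus_cycles += 1
        start = word_addr * WORD_BYTES
        return self.data[start:start + WORD_BYTES]


def load32(mem: WordMemory, byte_addr: int) -> bytes:
    """Read 4 bytes starting at a CPU byte address, aligned or not."""
    word_addr, offset = divmod(byte_addr, WORD_BYTES)
    first = mem.read_word(word_addr)
    if offset == 0:
        return first  # aligned: a single bus cycle is enough
    # Unaligned: fetch the next word too, then stitch the pieces together.
    second = mem.read_word(word_addr + 1)
    return (first + second)[offset:offset + WORD_BYTES]
```

With `mem = WordMemory(bytes(range(8)))`, `load32(mem, 0)` returns bytes 0 to 3 after one call to `read_word`, while `load32(mem, 1)` returns bytes 1 to 4 and needs two calls.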

EDIT: Adding some wiring illustration

To illustrate this, here are the two addressing schemes side by side:

[Image: RAM_CPU_Bus]

Notice the bit shift in the addresses of the RAM chip.

Addresses will look like this:

// From the RAM point of view
@0x00000000: Bytes 0x00000000 to 0x00000003
@0x00000001: Bytes 0x00000004 to 0x00000007

To access the dword at CPU byte address 0x00000001, you can see that no direct addressing is possible. You need to ask the RAM chip for both dwords at its addresses 0x00000000 and 0x00000001.
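The address mapping can also be written out as a short Python sketch. It assumes a bus width that is a power of two (4 bytes by default), and the helper names `split_address` and `ram_words_needed` are made up for illustration; real hardware does this with wiring, not software.

```python
def split_address(byte_addr: int, bus_bytes: int = 4) -> tuple:
    """Split a CPU byte address into (RAM word address, byte lane).

    Assumes bus_bytes is a power of two, so the split is a plain bit shift.
    """
    shift = bus_bytes.bit_length() - 1   # log2(bus_bytes): 2 for a 4-byte bus
    ram_addr = byte_addr >> shift        # upper bits go to the RAM chip
    lane = byte_addr & (bus_bytes - 1)   # lower bits select a byte within the word
    return ram_addr, lane


def ram_words_needed(byte_addr: int, size: int = 4, bus_bytes: int = 4) -> list:
    """List the RAM word addresses touched by a `size`-byte access."""
    first, _ = split_address(byte_addr, bus_bytes)
    last, _ = split_address(byte_addr + size - 1, bus_bytes)
    return list(range(first, last + 1))
```

Under these assumptions, `ram_words_needed(0)` gives `[0]`, while `ram_words_needed(1)` gives `[0, 1]`: the unaligned dword spans two RAM addresses, so it takes two requests.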

edited Jun 23 '14 at 19:51

answered Jun 16 '14 at 21:28

Pepito


If the limitation is in the memory chips themselves, does it mean that a memory chip from a 64-bit computer cannot be used in a 32-bit computer? – GetFree Jun 16 '14 at 22:41
What matters is not really the number of bits of the processor, but the width of its data bus versus the memory's data bus. They used to be the same, but modern computing brought a lot of processors and memories with very wide data buses. To answer your question: if you plug a 64-bit-wide memory into a 32-bit-wide processor data bus, you'll waste half of the memory (but it should work on paper). – Pepito Jun 18 '14 at 06:32
Still not convinced? I added a sample schematic to illustrate my point. – Pepito Jul 02 '14 at 19:31
So the reason is that the first 2 bits are discarded? – GetFree Jul 02 '14 at 22:06
More than just discarded, the address bits are shifted. – Pepito Jul 03 '14 at 16:29
Can I have the source of that image please. – GetFree Jul 04 '14 at 02:06
The source is my humble experience as Hw Engineer ;) – Pepito Jul 04 '14 at 20:34
Do they really reduce the manufacturing costs by not using the first 2 bits? or is there another reason why they do it that way? – GetFree Jul 04 '14 at 23:39

score 1 · Answer 2 · answered Jun 22 '14 at 22:12

The simple answer is they can't because they are designed not to.

The main reason that they are designed this way is for performance and scalability. We would lose way too many incredibly important features to support this.

A simple analogy: the humble shipping container. Before the days of the shipping container, freight of many different shapes and sizes was packed as efficiently as possible into the hulls of ships. Because of the widely varying sizes of the freight, ranging from crates to bags of coffee to bales of hay and cotton, the capacity of these ships was used horribly inefficiently.

The shipping container changed all of that. Now, if you want to ship something internationally, it must be in a standard-sized shipping container. It isn't that you can't ship your bag of cat food to your friend in Hong Kong on a container ship; it's that doing so is so incredibly inefficient that it just isn't done.

You want to get that cat food to your friend quickly, without buying a whole shipping container? Well, you can pay an express shipping company like FedEx to fly it over on a 747, but you sure as hell are going to pay for that ability.

What's the design component that prevents the CPU from loading any given 4 bytes into a register? (In one step, that is.) If it _can_ address all bytes in memory, what if it puts the address 0x00000002 into the address register? What specific aspect of the design makes that address not valid to use? Is it the address bus that doesn't transfer all the bits? Is it the memory mapping cache that doesn't work with unaligned addresses? Is it some limitation in the memory chip itself? All of the above? – GetFree, Jun 23 '14 at 01:29
