Data Alignment: Reason for restriction on memory address being multiple of data type size

Question

I understand the general concept of data alignment, but what I do not understand is the restriction that a memory address must be a multiple of the size of the underlying data type.

This answer explains the data alignment well.

Quote:

Let's look at a memory map:
+----+
|0000| 
|0001|
+----+
|0002|
|0003|
+----+
|0004|
|0005|
+----+
| .. |
At each address there is a byte which can be accessed individually. But words can only be fetched at even addresses. So if we read a word at 0000, we read the bytes at 0000 and 0001. But if we want to read the word at position 0001, we need two read accesses. First 0000,0001 and then 0002,0003 and we only keep 0001,0002.

Question:

Assuming it's true, why would "words can only be fetched at even addresses" be true? Can't the memory/stack pointer point to 0001 in the example and then read a word of information starting there?

We know the machine can read memory in blocks of 2 bytes with one read action, (in the example [0000, 0001] or [0002, 0003]). So if my address register is pointing to 0001 (odd address instead of even), then I can read 2 bytes from there (i.e. 0001 and 0002) directly in one read action, right?

Does [Purpose of memory alignment](https://stackoverflow.com/questions/381244/purpose-of-memory-alignment) and its answers help? — user4581301, Aug 07 '19 at 23:18
@user4581301 I did have a look at this. It does outline the issue but didn't address the memory address multiplicity restriction. Thanks — User 10482, Aug 07 '19 at 23:21
Think of the performance problem covered in the first answer's section on speed: If the data is not aligned, multiple reads may be required. If the hardware is lined up to read 16 bit chunks, for unaligned data the CPU may have to read 16 bits, keep the upper 8, read 16 bits, keep the lower 8, assemble and place in register. This sucks. Many CPUs nip this problem in the bud by rejecting it. Those that allow it accept the performance hit — user4581301, Aug 07 '19 at 23:30
@user4581301 Yep, I understand what you are trying to say. The linked answer says something similar. But my question is about a different issue: why the address itself, not the data it points to, has to be a multiple of the data size. If that were not the case, I could simply move the address and extract the chunk of memory holding my data-of-appropriate-size in one read (as I said in the latter part of the question). — User 10482, Aug 07 '19 at 23:39
@User10482 I don't understand what distinction you are drawing between the answers you've found and been given and the question you are asking, which you seem to think is different, but isn't. — user207421, Aug 08 '19 at 00:47
@user207421 Explanation by immibis in the comments [here](https://stackoverflow.com/a/57403564/11205473) and answer by Tanque [here](https://stackoverflow.com/a/57403499/11205473) cleared the confusion that I specifically had. — User 10482, Aug 08 '19 at 00:59

Skrino · Answer 1 · 2019-08-07T23:47:28.113

The assumption about that statement is not necessarily true. I don't want to re-iterate the answer you linked to describing the reasons for using and highly preferring aligned access, but there are architectures that do support unaligned memory access -- ARM for example (check out this SO answer).

But your question, I think, really comes down to hardware architecture, specifically the data bus design and the accompanying instruction set that engineers at various silicon manufacturers have designed.

Some Cortex-M cores explicitly let you configure the CPU to trigger a Usage Fault exception on unaligned access, which means that unaligned access is otherwise permitted and you can "utilize" it in rare use cases.

Except in systems with very unusual hardware you cannot do an unaligned access with a single access. ARM allows unaligned access, but it comes at a cost and with caveats. — user4581301, Aug 08 '19 at 00:23

Tanque · Accepted Answer · 2019-08-08T00:41:05.900

Usually a processor's internal addresses point to a whole word. This is because you don't want your (simple) processor to be able to address a word at an arbitrary byte (or even worse, bit), for two reasons:

1. You waste addressable memory. Suppose the largest address your processor can handle is the maximum value that fits in one word. If each unique address points to a full word, you can multiply that number by the word size to get the amount of storage you can address. The "address" I'm talking about here does not necessarily look like the address stored in a pointer of a higher-level programming language. A pointer addresses each byte, and the compiler or interpreter translates it into the corresponding assembly instructions (discarding the unwanted bytes from the loaded word).
2. A word loaded from memory could be anything: a value, or the next instruction of the program you are running on your processor. The previous word loaded into the processor will often indicate what the following word is used for: another instruction (e.g. an arithmetic operation, or a load or store instruction), which might be followed by operands (values or addresses). Put simply, being able to address unaligned words would complicate a processor a lot.
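
To put rough numbers on point 1, here is a minimal sketch. The 16-bit address width and 2-byte word size are assumptions picked for illustration and do not describe any particular processor:

```python
ADDRESS_BITS = 16   # assumed address width of a hypothetical machine
WORD_BYTES = 2      # assumed word size in bytes


def addressable_bytes(address_bits: int, bytes_per_address: int) -> int:
    """Total storage reachable when each address selects `bytes_per_address` bytes."""
    return (2 ** address_bits) * bytes_per_address


# Every address names a single byte.
byte_addressed = addressable_bytes(ADDRESS_BITS, 1)

# Every address names a whole word.
word_addressed = addressable_bytes(ADDRESS_BITS, WORD_BYTES)

print(byte_addressed)   # 65536
print(word_addressed)   # 131072
```

With the same number of address bits, word addressing reaches `WORD_BYTES` times as much storage, at the price of not being able to name an individual byte directly.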

Regarding 1. One can still retrieve any byte from the loaded word via specific assembly instructions: `loadLeastSignificantByte $addressOfWord` etc. — Tanque, Aug 07 '19 at 23:55

score 0 · Answer 3 · answered Aug 07 '19 at 23:55

Assuming it's true, why would "words can only be fetched at even addresses" be true?

The memory actually stores words. The processor actually addresses the memory in words, and fetches a word at a time.

When you fetch a byte, it actually fetches a word, then ignores either the first half or the second half.

On 32-bit processors, fetching a byte actually fetches a 32-bit word and then ignores three quarters of it; fetching a 16-bit value on a 32-bit processor ignores half the word.

If the 16-bit word you want to fetch (on a 16-bit processor) isn't aligned, then the processor has to fetch two words, take half of each word and then re-combine them. So even on processor designs where it works, it's often slower.

A lot of processor designs don't bother - either they just won't allow it, or they force the operating system to handle it (which is very slow).

(Not all types of processors work this way - e.g. 8-bit processors usually fetch a byte at a time)
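
The behaviour described above can be sketched as a toy model. This is only a simulation under stated assumptions: `WordMemory`, `load_byte` and `load_u16` are hypothetical names, the memory is 16 bits wide, and little-endian byte order is assumed; real hardware does this in logic, not software.

```python
class WordMemory:
    """Toy 16-bit-wide memory: each fetch returns one whole 2-byte word."""

    def __init__(self, data: bytes):
        assert len(data) % 2 == 0, "memory must hold a whole number of words"
        self.data = data
        self.fetches = 0  # how many word fetches the memory has served

    def fetch_word(self, word_index: int) -> bytes:
        self.fetches += 1
        return self.data[2 * word_index : 2 * word_index + 2]


def load_byte(mem: WordMemory, byte_addr: int) -> int:
    """Fetch the containing word, keep one half, ignore the other."""
    word = mem.fetch_word(byte_addr // 2)
    return word[byte_addr % 2]


def load_u16(mem: WordMemory, byte_addr: int) -> int:
    """Load 16 bits from any byte address (little-endian is an assumption)."""
    if byte_addr % 2 == 0:
        # Aligned: the value is exactly one memory word, one fetch.
        raw = mem.fetch_word(byte_addr // 2)
    else:
        # Unaligned: the value straddles two memory words, two fetches.
        first = mem.fetch_word(byte_addr // 2)
        second = mem.fetch_word(byte_addr // 2 + 1)
        raw = bytes([first[1], second[0]])  # half of each, recombined
    return int.from_bytes(raw, "little")
```

For a memory holding the bytes `10 11 12 13 14 15` (hex), `load_u16(mem, 2)` costs one fetch, while `load_u16(mem, 1)` costs two and has to stitch the result together from both words.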

Can't the memory/stack pointer point to 0001 in the example and then read a word of information starting there?

If the processor supports it, yes.

answered Aug 07 '19 at 23:55

user253751

Okay, so it's some kind of hardware limitation on the address pointer. Applying that to the example, it means I can point to 0000, 0002, 0004, ... but not 0001, 0003, etc., right? But it seems so easy to implement one that can point anywhere: just store the odd address in the stack pointer register or any address register and read the memory it points to (in whatever size chunk it forces, 2 bytes in the example). – User 10482 Aug 08 '19 at 00:15

@User10482 The pointer can point anywhere. But pretend you're a CPU connected to a 16-bit memory system (so the memory sees address 0000 as a 16-bit word, and address 0001 as a different 16-bit word, they don't overlap). When the program asks for 16 bits at 0000 you request 0000 from the memory. When the program asks for 16 bits at 0002 you request 0001 from the memory. When the program asks for 8 bits at 0003 you request 0001 from the memory and throw away half. What do you do when the program asks for 16 bits at 0003? – user253751 Aug 08 '19 at 00:18
_(so the memory sees address 0000 as a 16-bit word, and address 0001 as a different 16-bit word, **they don't overlap**)_ that makes so much sense now. The program is seeing the memory as uniquely addressed 1 byte sized pieces, but the processor is seeing it as sequentially numbered chunks of 2 bytes. Now I see the problem! Thanks. – User 10482 Aug 08 '19 at 00:40

@User10482 And it can get a bit crazier than that. Thanks to virtual memory, the addresses you are seeing in a program could be radically different from how the program's memory is actually laid out in storage. – user4581301 Aug 08 '19 at 00:48

and of course the reason they design the memory system that way is because fetching 2 chunks takes twice as long as fetching 1 chunk, no matter how big the chunks are. So they want to have really big chunks. (nowadays, bigger than 16 bits) – user253751 Aug 08 '19 at 00:54
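
To tie this back to the "multiple of the data type size" rule in the title, here is a small sketch that counts how many memory chunks an access touches. The function names are made up for illustration, and sizes are assumed to be powers of two with the chunk at least as large as the data type:

```python
def is_aligned(addr: int, size: int) -> bool:
    """True when `addr` is a multiple of the data type size."""
    return addr % size == 0


def chunks_touched(addr: int, size: int, chunk: int) -> int:
    """Number of `chunk`-byte memory chunks covered by `size` bytes at `addr`."""
    first = addr // chunk
    last = (addr + size - 1) // chunk
    return last - first + 1


# 2-byte value on a 2-byte-wide memory, as in the question's example.
print(chunks_touched(0, 2, 2))  # 1
print(chunks_touched(1, 2, 2))  # 2

# 2-byte value on a 4-byte-wide memory.
print(chunks_touched(1, 2, 4))  # 1
print(chunks_touched(3, 2, 4))  # 2
```

Under those assumptions, an access that satisfies `is_aligned` never crosses a chunk boundary, so it always costs a single fetch. An unaligned access may happen to fit inside one chunk, as `chunks_touched(1, 2, 4)` shows, but that is not guaranteed, which is why the simple rule is stated in terms of the data type size.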
