good people of the Internet!
A past couple of days I've been reading about how CPU access memory and how it could be slower then desired if the accessed object is spread over different chunks that CPU accesses.
In a very generalized and abstract words, if I, say, have an address space from 0x0 to 0xF with a cell of one byte, and CPU reads memory in chunks of 4 bytes (that is, has a quad byte memory access granularity), then, if I need to read an object of 4 bytes size residing in cells 0x0 - 0x3, CPU would do it in one operation, while if the same object occupies cells 0x1 - 0x4, then CPU needs to perform two read operations (read memory in 0x0 - 0x3 first, then in 0x4 - 0x7), shift bytes and combine two parts (or break, if it cannot do unaligned access). This happens, once again, because CPU can read memory in 4 bytes chunks (in our abstract case). Let's also assume, that CPU make these reads inside one cache line and there is no need to change the contents of cache between reads.
So, in this case, the beginning of each chunk CPU can read is residing in a memory cell that has an address which is multiple of 4 (right?). Ok, i don't have any questions about why CPU reads in chunks, but why exactly the beginning of each chunk is aligned in such a way?
If referring to an example in a previous paragraph, why exactly CPU cannot read a chunk of 4 bytes starting from 0x1?
As I may understand, CPU is pretty much aware that 0x1 exists. So is all the fuzz happening because memory controller cannot access chunk of memory starting from 0x1? Or is it because a couple of LSBs in a processor word are reserved on some architectures? Or the fact that they are reserved is the consequence of an aligned access, an not its cause (it seems like it's a second question already, but I would leave it as at the time I write this question I have a feeling that they are related)?
There are a bunch of answers here touching this topic (like this and this) and articles online (like this and this), but in all the resources there are good explanations on the phenomena itself and its consequences, but no explanation on why exactly CPU cannot read a chunk of memory starting "in between" byte boundaries (or I couldn't see it maybe).