While trying to understand how structure padding works in C on Linux x86, I read that aligned access is faster than misaligned access. I think I understand the reasons given for that, but they all seem to rest on the presupposition that the CPU can't directly access addresses that are not a multiple of the bus width. For instance, if a CPU with a 32-bit bus were instructed to read 4 bytes starting at address 2, it would first read the 4 bytes starting at address 0 and mask out the first two, then read the 4 bytes starting at address 4 and mask out the last two, and finally combine the two results. Had the data been 4-byte aligned, it could have read all 4 bytes in a single access.
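To make the sequence I am describing concrete, here is a minimal C sketch of my mental model. It is only a software simulation under my own assumptions: little-endian byte order, a 4-byte bus, and made-up names (`bus_read_word`, `read_u32`, `bus_reads`). It is not meant to show how any real CPU implements this.

```c
#include <assert.h>
#include <stdint.h>

/* Counts simulated bus transactions (hypothetical, for illustration only). */
static unsigned bus_reads = 0;

/* Hypothetical 32-bit bus: it can only fetch a whole 4-byte word whose
   address is a multiple of 4. Bytes are assembled in little-endian order. */
static uint32_t bus_read_word(const uint8_t *mem, uint32_t aligned_addr)
{
    assert(aligned_addr % 4 == 0);   /* the presupposition in question */
    bus_reads++;
    return (uint32_t)mem[aligned_addr]
         | (uint32_t)mem[aligned_addr + 1] << 8
         | (uint32_t)mem[aligned_addr + 2] << 16
         | (uint32_t)mem[aligned_addr + 3] << 24;
}

/* Read 4 bytes starting at any address using only aligned bus reads. */
static uint32_t read_u32(const uint8_t *mem, uint32_t addr)
{
    uint32_t base  = addr & ~3u;        /* aligned word containing addr */
    uint32_t shift = (addr & 3u) * 8;   /* bit offset inside that word  */

    uint32_t lo = bus_read_word(mem, base);
    if (shift == 0)
        return lo;                      /* aligned: one bus read is enough */

    /* Misaligned: fetch the next word, drop the unwanted bytes of each
       word by shifting, then combine the two halves. */
    uint32_t hi = bus_read_word(mem, base + 4);
    return (lo >> shift) | (hi << (32 - shift));
}
```

With memory holding the bytes `00 11 22 33 44 55 66 77`, `read_u32(mem, 2)` returns `0x55443322` after two simulated bus reads, while `read_u32(mem, 4)` returns `0x77665544` after just one.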
So my question is this: why is that presupposition true? Why can't the CPU directly access addresses that are not a multiple of the bus width?