If you think about how the machine is wired, it all makes sense.
Occasionally, people have tried to change this (Rambus, FBDIMM), but we keep coming back to wiring each bit in the DRAM array to its corresponding bit on the CPU bus.
In the earlier days of computers, it was quite expensive to shift bits on the memory data bus to correct for misaligned accesses. Some machines didn't allow it at all; the ones that did added a speed penalty. Some, like the original DEC Alpha (super fast at the time, and the first good 64-bit micro), actually did correct it, but at the expense of a software trap!
The IA32 and x64 architectures have always fixed it up transparently, and with zillions of transistors on each chip they have the barrel shifters and other dedicated hardware to easily patch up misaligned references.
But it still may interrupt the pipeline, or it may take some sort of micro-trap; it isn't the "natural way".
The exact penalty is specific to the microarchitecture of the chip you are using. Portable code should assume that misaligned accesses are penalized. Some embedded CPU chips (some ARM) don't even error out; they just do the wrong thing! I'm sincerely hoping that all of those are out of production.
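
For what it's worth, here is a minimal sketch of how portable C code can avoid depending on what the hardware does with a misaligned pointer. The helper names `is_aligned` and `load_u32` are made up for illustration, and the pointer-to-integer cast is implementation-defined, though it behaves as expected on common flat-address platforms. The idea is to copy the bytes with `memcpy`, which has no alignment requirement, rather than dereferencing a cast pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p is a multiple of `alignment` bytes.
   `alignment` must be a power of two. */
static int is_aligned(const void *p, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Hypothetical helper: read a 32-bit value from any address.
   memcpy makes no alignment assumption, so this is safe even on
   machines that trap or misbehave on a misaligned word load. */
static uint32_t load_u32(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On chips that handle misaligned loads in hardware, optimizing compilers typically reduce the `memcpy` to a single load instruction, so the portable version usually costs nothing extra there.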