When the CPU accesses memory to fetch a data item (or struct member), it actually sends a request to the memory controller, which performs some trickery to make the DRAM appear to be a well-structured data store. In reality, DRAM is a bunch of cells, arranged in M rows of N bits each (where N could be a thousand or so).
Knowing that most architectures access memory in words of 4 or 8 bytes (sometimes larger), memory controllers are optimized for fetches from addresses that are multiples of the word size, say 4. What happens when you fetch a single byte from the address abcd1002? Well, the memory controller fetches four bytes from the aligned address abcd1000, then shifts them to extract the third byte (remember, offsets go 0, 1, then 2), and gives you your lousy non-aligned byte. An access that straddles an alignment boundary can even require two fetches. Thus, fetching from an aligned address is generally faster than from a non-aligned one.
Aware of this fact, compilers aggressively optimize for speed by inserting padding into data structures so that each member lands on an address that is a multiple of its natural alignment.
Hope that provides an important computer-architecture perspective on this issue. I didn't see it mentioned in any of the current answers, hence I felt like adding my $0.02.