The problem is not the CPU's ability to address any single byte in memory. It is that the memory does not have the same granularity.
Like Oli said, this is very architecture-specific, but memory chips are often addressed in units of their data bus width, meaning that a given address represents a full "word" of their data bus.
Let's take the example of a 32-bit CPU with a 32-bit-wide data bus connected to a memory device. When the CPU wants to access the word at address 0x00000000, it really wants to access bytes 0, 1, 2 and 3. For the memory chip, however, this is represented by the single address 0x00000000.
Now when the CPU wants to access the word at address 0x00000001, it really wants to access bytes 1, 2, 3 and 4. For the memory chip, however, this is represented by a piece of the word at its address 0x00000000 (bytes 1 to 3) and a piece of the word at its address 0x00000001 (byte 4).
Hence the need for two bus cycles.
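As a rough sketch of that reasoning, the following C helpers map a CPU byte address to the RAM-side word address and count the bus cycles a 32-bit read would take. The names `BUS_BYTES`, `ram_word_address` and `bus_cycles` are made up for illustration, and the model assumes the simple 32-bit bus described above, with no caches or wider buses involved.

```c
#include <stdint.h>

/* Assumption: 32-bit data bus, so one RAM word holds 4 bytes. */
#define BUS_BYTES 4u

/* RAM-side word address that contains the given CPU byte address. */
uint32_t ram_word_address(uint32_t byte_addr)
{
    return byte_addr / BUS_BYTES;   /* equivalent to byte_addr >> 2 */
}

/* Bus cycles needed to read one 32-bit word starting at byte_addr:
 * an aligned read fits in one RAM word, an unaligned read straddles two. */
unsigned bus_cycles(uint32_t byte_addr)
{
    return (byte_addr % BUS_BYTES == 0u) ? 1u : 2u;
}
```

With these definitions, `bus_cycles(0x00000000)` yields 1 and `bus_cycles(0x00000001)` yields 2, matching the two cases above.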
EDIT: Adding some wiring illustration
To illustrate this, here are the two addressing schemes side by side:

Notice the bit shift in the addresses of the RAM chip.
Addresses will look like this:
// From the RAM point of view
@0x00000000: Bytes 0x00000000 to 0x00000003
@0x00000001: Bytes 0x00000004 to 0x00000007
To access the dword at CPU address 0x00000001, you can see that no direct addressing is possible. You need to ask the RAM chip for both dwords at its addresses 0x00000000 and 0x00000001.
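To make the two fetches concrete, here is a minimal software model of what has to happen, not a description of any particular CPU's hardware. It assumes a little-endian layout (byte 0 is the least significant byte of RAM word 0) and represents the RAM as a hypothetical array `ram` indexed by the RAM-side word address; `read_dword` is an illustrative name.

```c
#include <stdint.h>

/* Read a 32-bit value starting at an arbitrary CPU byte address.
 * ram[i] models the 32-bit word stored at RAM-side address i.
 * Assumption: little-endian byte order within each word. */
uint32_t read_dword(const uint32_t *ram, uint32_t byte_addr)
{
    uint32_t word_addr = byte_addr >> 2;   /* the bit shift seen on the RAM address lines */
    unsigned offset    = byte_addr & 3u;   /* byte position inside that word */

    uint32_t low = ram[word_addr];         /* first bus cycle */
    if (offset == 0u)
        return low;                        /* aligned: one cycle is enough */

    uint32_t high = ram[word_addr + 1u];   /* second bus cycle */
    unsigned shift = offset * 8u;

    /* Keep the upper bytes of the first word and the lower bytes of the second. */
    return (low >> shift) | (high << (32u - shift));
}
```

For example, if bytes 0 to 7 hold the values 0x00 to 0x07, the RAM words are 0x03020100 and 0x07060504, and reading at byte address 1 combines a piece of each into 0x04030201.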