While you've gotten a number of good answers, I feel obliged to add one more point: even if the code is theoretically less efficient, it will rarely make any real difference.
The reason is pretty simple: the CPU is a lot faster than memory. Even relatively crappy code will easily saturate the bandwidth between the CPU and main memory. If the data involved is already in the cache, much the same generally holds for the cache's bandwidth -- and (again) even with crappy code, the move finishes far too quickly to care about anyway.
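If you want to check this on your own machine rather than take my word for it, here is a minimal sketch in C++ (my choice of language, purely for illustration). The names `naive_copy`, `library_copy` and `seconds_for` are made up for this example, and a single timed call like this is only a rough indication, not a proper benchmark.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical "unsophisticated" copy: one byte per loop iteration.
void naive_copy(unsigned char* dst, const unsigned char* src, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        dst[i] = src[i];
}

// The library routine, for comparison.
void library_copy(unsigned char* dst, const unsigned char* src, std::size_t n) {
    std::memcpy(dst, src, n);
}

// Times one call of `copy` over the whole buffer and returns the elapsed seconds.
template <typename Copy>
double seconds_for(Copy copy,
                   std::vector<unsigned char>& dst,
                   const std::vector<unsigned char>& src) {
    assert(dst.size() == src.size());  // the buffers must be the same size
    auto start = std::chrono::steady_clock::now();
    copy(dst.data(), src.data(), src.size());
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Call `seconds_for(naive_copy, dst, src)` and `seconds_for(library_copy, dst, src)` on buffers well beyond your cache size, and read something back from `dst` afterwards so the copy isn't optimized away. Keep in mind that an optimizing compiler may recognize the loop and turn it into a library call itself, which rather makes the same point.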
Quite a few CPUs (e.g., Intel's x86 processors) have a special hardware path that gets used for most moves anyway, so there is often literally no difference in speed between implementations that look quite different, even at the assembly-code level.
Ultimately, if you care about the speed of moving things around in memory, you should worry more about eliminating the moves than about making them faster.
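As one illustration of what "eliminating" can look like, here is a sketch in C++ that assumes a double-buffering setup. The function names and the `front`/`back` buffers are hypothetical, not something from the question. Instead of copying the freshly built buffer over the old one, you exchange the two.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Copying version: every element travels through memory.
void publish_by_copy(std::vector<int>& front, const std::vector<int>& back) {
    front = back;
}

// Swapping version: the vectors exchange their internal pointers,
// so no element is moved no matter how large the buffers are.
void publish_by_swap(std::vector<int>& front, std::vector<int>& back) {
    assert(front.size() == back.size());  // assumption: equally sized buffers
    std::swap(front, back);
}
```

The swap takes the same (tiny) amount of work whether the buffers hold ten elements or ten million, which is a much bigger win than anything you could get by tuning the copy itself. Passing large objects by reference instead of by value follows the same idea.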