Hand assembly can always at least match if not beat the compiler, because at the very least, you can start with the compiler generated assembly code and tweak it to make it better. To really do a good job, you need to understand the CPU architecture (pipeline, functional units, memory hierarchy, out-of-order dispatch units, etc.) so that you can schedule each instruction for maximum efficiency.
Another thing to consider is that the number of instructions is not necessarily directly proportional to performance, whether it is speed or power (see Hennessey and Patterson's Computer Architecture: A Quantitative Approach). Basically, you have to look at how many clock cycles each instruction takes, in addition to the number of instructions (and clock rate) to know how long it will take. To know how much energy will be consumed, you also need to know how much energy each instruction takes.
How the CPU implements each instruction affects how many cycles it takes to execute. As an example, your code sequence has a >>
operator. The compiler might translate that to a single ASR
instruction, but without knowing the architecture, there is no telling how many clock cycles it might take -- some architectures can do an arbitrary shift in a single cycle, while others need one cycle for each bit shift.
Memory access contributes to the number of cycles and power consumption, too. When there are too many variables to store in registers some of them will have to be stored in memory. If you are accessing off chip memory and have a fairly high CPU clock rate, the memory bus can be pretty power hungry. A longer sequence of instructions that avoids reading from and writing to memory (e.g., by computing the same result twice) can be less expensive.
As several others have suggested, there is no substitute for benchmarking. Assuming you are using a microcontroller-based system with a constant input voltage, your best bet is to measure the current draw of your system with each alternative set of code and see which does best (one way would be with a current probe and a digital storage oscilloscope).
Even if you can always write better assembler than the compiler, there is a cost in development time and maintainability. In The Mythical Man Month Brooks estimated 3-5x more effort at time when many, if not most, programmers wrote code in assembler. Unless your code is really tiny, you are probably best off only coding the most critical parts in assembly. Even so, the person writing the assembly should be able to prove that their (more expensive) code is worth the cost by comparing running code vs. running code.