Indeed, ATmega doesn't have a barrel shifter, just like most (if not all) other 8-bit MCUs. Therefore it can only shift by 1 bit per instruction, rather than by an arbitrary amount like more powerful CPUs. As a result, shifting by 4 is theoretically slower than shifting by 3.
However, ATmega does have a swap-nibble instruction, so in fact x >> 4 is faster than x >> 3.
Assuming x is a uint8_t, x >>= 3 is implemented by 3 right shifts:
x >>= 1;
x >>= 1;
x >>= 1;
whereas x >>= 4 only needs a swap and a bit clear:
swap(x); // swap the top and bottom nibbles AB <-> BA
x &= 0x0f;
or
x &= 0xf0;
swap(x);
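The equivalence above can be checked on any host compiler with a small sketch. Here swap_nibbles is a hypothetical C stand-in for the AVR swap instruction (it is not a real library function), and the check simply compares both orderings against the plain shift for all 256 byte values:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the AVR SWAP instruction: exchanges the
   high and low nibbles of a byte (0xAB becomes 0xBA). */
static uint8_t swap_nibbles(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* x >> 4 expressed as a swap followed by a mask. */
static uint8_t shr4_swap_then_mask(uint8_t x)
{
    return (uint8_t)(swap_nibbles(x) & 0x0f);
}

/* x >> 4 expressed as a mask followed by a swap. */
static uint8_t shr4_mask_then_swap(uint8_t x)
{
    return swap_nibbles((uint8_t)(x & 0xf0));
}

/* Compares both forms against the plain shift for every 8-bit value. */
static void verify_shr4(void)
{
    for (unsigned v = 0; v <= 0xff; v++) {
        uint8_t x = (uint8_t)v;
        assert(shr4_swap_then_mask(x) == (uint8_t)(x >> 4));
        assert(shr4_mask_then_swap(x) == (uint8_t)(x >> 4));
    }
}
```

Calling verify_shr4() from main should complete without tripping an assertion. This only demonstrates that the results match; the actual instruction selection is up to the compiler.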
For bigger cross-register shifts there are also various ways to optimize.
With a uint16_t variable y consisting of the low part y0 and the high part y1, y >> 8 is simply
y0 = y1;
y1 = 0;
Similarly, y >> 9 can be optimized to
y0 = y1 >> 1;
y1 = 0;
and hence costs no more than a shift by 3 on a char (a register move, a single shift and a clear), despite the much longer shift distance.
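The byte-wise view of these wider shifts can be sketched in portable C as follows. The helper names (join_bytes, shr8_bytewise, shr9_bytewise) are made up for illustration, and y0/y1 are ordinary variables standing in for the two 8-bit registers that hold y:

```c
#include <assert.h>
#include <stdint.h>

/* Rebuilds a 16-bit value from a high byte y1 and a low byte y0. */
static uint16_t join_bytes(uint8_t y1, uint8_t y0)
{
    return (uint16_t)(((uint16_t)y1 << 8) | y0);
}

/* y >> 8: move the high byte into the low byte, then clear the high byte. */
static uint16_t shr8_bytewise(uint16_t y)
{
    uint8_t y1 = (uint8_t)(y >> 8);   /* high byte of y */
    uint8_t y0 = y1;                  /* y0 = y1; */
    y1 = 0;                           /* y1 = 0;  */
    return join_bytes(y1, y0);
}

/* y >> 9: same move, plus a single 1-bit shift of the moved byte. */
static uint16_t shr9_bytewise(uint16_t y)
{
    uint8_t y1 = (uint8_t)(y >> 8);   /* high byte of y */
    uint8_t y0 = (uint8_t)(y1 >> 1);  /* y0 = y1 >> 1; */
    y1 = 0;                           /* y1 = 0;       */
    return join_bytes(y1, y0);
}

/* Compares both helpers against the plain shifts for every 16-bit value. */
static void verify_wide_shifts(void)
{
    for (uint32_t v = 0; v <= 0xffff; v++) {
        uint16_t y = (uint16_t)v;
        assert(shr8_bytewise(y) == (uint16_t)(y >> 8));
        assert(shr9_bytewise(y) == (uint16_t)(y >> 9));
    }
}
```

The low byte of the original value never takes part, which is why no bit-by-bit shifting across the two registers is needed for distances of 8 or more.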
In conclusion, the shift time varies with the shift distance, but a longer or non-power-of-2 distance is not necessarily slower. Generally a shift within an 8-bit char takes only about 3 instructions.
Here are some demos from Compiler Explorer.
A right shift by 4 is achieved by a swap and an and, like above:
swap r24
andi r24,lo8(15)
A right shift by 3 has to be done with 3 instructions
lsr r24
lsr r24
lsr r24
Left shifts are also optimized in the same manner
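For the left-shift case, the same trick presumably keeps the upper nibble instead of the lower one after the swap. A minimal sketch, where avr_swap is again a hypothetical C model of the swap instruction rather than a real intrinsic:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the AVR SWAP instruction (nibble exchange). */
static uint8_t avr_swap(uint8_t x)
{
    return (uint8_t)((x << 4) | (x >> 4));
}

/* x << 4 expressed as a swap followed by a mask of the upper nibble. */
static uint8_t shl4_swap_then_mask(uint8_t x)
{
    return (uint8_t)(avr_swap(x) & 0xf0);
}

/* Compares the swap-based form against the plain shift for all bytes. */
static void verify_shl4(void)
{
    for (unsigned v = 0; v <= 0xff; v++) {
        uint8_t x = (uint8_t)v;
        assert(shl4_swap_then_mask(x) == (uint8_t)(x << 4));
    }
}
```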
See also Which is faster: x<<1 or x<<10?