TL;DR: Indeed shifts by multiple steps would generally be done by multiple shifts as you can imagine. But some tricks can be used to avoid shifting too many times. For example some algorithms are designed so that only shifts by 1 is needed, or if a bigger shift is required then some special bitwise instructions in the ISA can be used for optimization
Programming for an embedded system requires deeper understanding about that architecture in order to achieve good performance and RAM/ROM usage. For example variables are chosen so that they fit in the machine word. No one would use uint64_t
like that on a 16 or 8-bit MCU unless absolutely necessary. Operations on multi-word variables (include bit shifts) require much more instructions so they won't usually be inlined. In general for shifts that are a multiple of a word size are done similarly to shifting array elements so they'll be very fast
Shifting by an arbitrary variable requires a barrel shifter which takes far more die space than a simple shift register, therefore most embedded architectures can only shift one bit at a time. Most of them also include some kind of swap halfword instruction to overcome the limitation and allow fast enough large-distance shifting. For example 8-bit microcontrollers like 8051, PIC or AVR has a swap-nibble instruction. See:
The MSP430 is a 16-bit MCU so it has a SWPB
instruction to swap bytes which can be used similarly for fast shifting by 8. Here are some examples generated by Clang (with comments by me, notice how the shift by 8 and shifts larger than 8 are done):
shift_left_15(unsigned short): ; @shift_left_15(unsigned short)
mov.b r12, r12
swpb r12 ; swap bytes then shift left 7 times
add r12, r12
add r12, r12
add r12, r12
add r12, r12
add r12, r12
add r12, r12
add r12, r12
ret
shift_left_12(unsigned short): ; @shift_left_12(unsigned short)
mov.b r12, r12
swpb r12 ; swap bytes then shift left 4 times
add r12, r12
add r12, r12
add r12, r12
add r12, r12
ret
shift_left_10(unsigned short): ; @shift_left_10(unsigned short)
mov.b r12, r12
swpb r12 ; swap bytes then shift left 2 times
add r12, r12
add r12, r12
ret
shift_left_9(unsigned short): ; @shift_left_9(unsigned short)
mov.b r12, r12
swpb r12
add r12, r12
ret
shift_left_8(unsigned short): ; @shift_left_8(unsigned short)
mov.b r12, r12
swpb r12 ; just swap bytes
ret
shift_left_7(unsigned short): ; @shift_left_7(unsigned short)
add r12, r12
add r12, r12
add r12, r12
add r12, r12
add r12, r12
add r12, r12
add r12, r12
ret
shift_left_3(unsigned short): ; @shift_left_3(unsigned short)
add r12, r12
add r12, r12
add r12, r12
ret
You can open the Godbolt link above for the full output
If you're on MSP430X then it has the ability to shift from 1 to 4 bit position which greatly simplifies the shifting procedure
shift_left_15(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #15 { rlax.w R12
POPM.W #1, r4
RET
shift_left_12(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #12 { rlax.w R12
POPM.W #1, r4
RET
shift_left_10(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #10 { rlax.w R12
POPM.W #1, r4
RET
shift_left_9(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #9 { rlax.w R12
POPM.W #1, r4
RET
shift_left_8(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #8 { rlax.w R12
POPM.W #1, r4
RET
shift_left_7(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #7 { rlax.w R12
POPM.W #1, r4
RET
shift_left_3(unsigned short):
PUSHM.W #1, R4
MOV.W R1, R4
rpt #3 { rlax.w R12
POPM.W #1, r4
RET
Right shift can be done in the same way but replace add
with rrc/rra
and rlax
with rrax
. See demo on Godbolt