On x86 (and most architectures that have this instruction), the extra bit is the carry flag, and lots of things can set that flag. Rotate-through-carry left or right lets you shift the carry bit back into some other register. Interesting that m68k uses a different flag for extended-rotate.
I'm not very familiar with m68k anymore, so I'll mostly talk about other arches. (But apparently that's what you want :)
Instructions like this are often available on microcontrollers that are much less capable than x86 or m68k. Or with limited opcode space (and decoding complexity), some CPUs only have a rotate-through-carry by 1 and not regular shift instructions. If you want to shift in a zero, make sure the flag is clear first.
8051 is like this: only rotate left/right by 1, and rotate-with-carry left/right by 1, not shift. See rlc in the ISA ref manual. When possible, avoid the clr instruction when you want to shift in a zero by putting rlc after something else that leaves Carry cleared.
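To make the "clear the flag, then rotate through carry" trick concrete, here is a rough C model of an 8-bit accumulator plus carry flag. The names Cpu8, rlc and shl1 are made up for illustration (only rlc mirrors a real 8051 mnemonic), and this is a sketch of the semantics, not cycle-accurate emulation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an 8-bit accumulator plus a 1-bit carry flag. */
typedef struct {
    uint8_t acc;   /* accumulator */
    unsigned cy;   /* carry flag, always 0 or 1 */
} Cpu8;

/* Rotate left through carry by 1:
   the old carry enters bit 0, the old bit 7 becomes the new carry. */
static void rlc(Cpu8 *cpu) {
    assert(cpu->cy <= 1);
    unsigned out = cpu->acc >> 7;
    cpu->acc = (uint8_t)((cpu->acc << 1) | cpu->cy);
    cpu->cy = out;
}

/* Logical shift left by 1, built from "clear carry" + rotate-through-carry.
   If an earlier instruction is known to leave carry = 0, the clear is redundant. */
static void shl1(Cpu8 *cpu) {
    cpu->cy = 0;
    rlc(cpu);
}
```

With acc = 0x81 and carry set, rlc shifts the old carry into bit 0, while shl1 shifts in a zero; both leave the old bit 7 in carry.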
I think it's typical for extended circular shift to use the carry flag, rather than its own X bit the way m68k does.
Anyway, extended-precision rotate is kind of traditional / expected for CPUs, but has more uses on more limited CPUs.
For a register, rcl reg, 1 is the same operation as adc reg,reg: shift the old contents left by 1, and set the low bit to CF. The bit shifted out by the rotate or adc becomes the new value of CF. So RCL is only a non-redundant part of an instruction set if it's available with a memory operand, or (for weird cases) with a count greater than 1. (The rotate-right version is not redundant, though.)
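The rcl-by-1 vs. adc equivalence is easy to check with a small C model. This is only a sketch of the register result and CF (the other flags that adc also writes are ignored), and the helper names rcl1 and adc_self are invented here.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "rcl reg, 1" on a 32-bit register: CF goes into bit 0,
   the old bit 31 becomes the new CF. */
static uint32_t rcl1(uint32_t reg, unsigned *cf) {
    assert(*cf <= 1);
    unsigned out = reg >> 31;
    reg = (reg << 1) | *cf;
    *cf = out;
    return reg;
}

/* Model of "adc reg, reg": reg + reg + CF, with CF set to the carry-out
   of the 32-bit addition. */
static uint32_t adc_self(uint32_t reg, unsigned *cf) {
    assert(*cf <= 1);
    uint64_t sum = (uint64_t)reg + reg + *cf;
    *cf = (unsigned)(sum >> 32);
    return (uint32_t)sum;
}
```

Doubling a value is a left shift by 1, the +CF fills the vacated low bit, and the carry-out of the add is exactly the bit that fell off the top.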
IDK why you'd ever use a count > 1. On modern x86, rotate-through-carry is fairly fast with count=1, but definitely slow with a variable count or fixed count>1. IDK which way the chicken/egg problem went: CPU designers didn't make it fast because nobody used it anyway, or people stopped using it because it's slow. But probably the former because I don't remember ever seeing a use mentioned for rotate-through-carry by more than 1 bit.
For extended-precision shifts, x86 has a double-shift instruction (shld / shrd dst, src, count) that shifts dst, shifting in bits from src instead of zeros or copies of the sign bit. It doesn't work with 2 memory operands, so an extended-precision shift loop has to load and store registers with separate instructions. (That's more code size than a loop using rcr dword [edi], 1 / lea edi, [edi-4], where lea rather than sub advances the pointer so CF survives between iterations, but code size is rarely a problem on modern x86 and doing the loads/stores with separate instructions isn't slower.) More importantly, shrd shifts multiple bits at once, so you can loop over an array once to implement a multi-precision shift by 2 or more bits.
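As a sketch of the single-pass multi-bit shift, here is a C model of what a shrd-based loop computes. It assumes 32-bit limbs stored least-significant first and a count from 1 to 31; shrd32 and mp_shr are illustrative names, not anything from a real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of "shrd dst, src, count" for 32-bit operands, 1 <= count <= 31:
   dst is shifted right and the vacated high bits are filled from the low bits of src. */
static uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count) {
    assert(count >= 1 && count <= 31);
    return (dst >> count) | (src << (32 - count));
}

/* Shift an n-limb number (limbs[0] is least significant) right by count bits
   in a single pass over the array. */
static void mp_shr(uint32_t *limbs, size_t n, unsigned count) {
    assert(n > 0);
    for (size_t i = 0; i + 1 < n; i++) {
        /* limbs[i + 1] has not been modified yet, so its low bits are still the original ones */
        limbs[i] = shrd32(limbs[i], limbs[i + 1], count);
    }
    limbs[n - 1] >>= count;   /* top limb: shift in zeros */
}
```

Going from the low limb upward means each step reads a neighbour that hasn't been shifted yet, so no temporary copy of the array is needed.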
Extended rotate can only shift one bit at a time between registers, because it uses a 1-bit storage space (in flags). I think on m68k if you did want to shift multiple bits between registers, you'd probably copy a register and use regular shifts + OR to combine. (Or a rotate and AND/OR to split up the bits.)
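For comparison, the one-bit-at-a-time carry chain can be modelled like this in C. Again just a sketch: rcr1 and mp_shr1 are made-up names, limbs are assumed to be 32-bit and stored least-significant first, and the single cf variable plays the role of the flag (CF on x86, X on m68k).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of rotate-right-through-carry by 1 on a 32-bit value:
   the old flag enters bit 31, the old bit 0 becomes the new flag. */
static uint32_t rcr1(uint32_t x, unsigned *cf) {
    assert(*cf <= 1);
    unsigned out = x & 1u;
    x = (x >> 1) | ((uint32_t)*cf << 31);
    *cf = out;
    return x;
}

/* Shift an n-limb number (limbs[0] is least significant) right by exactly 1 bit.
   Starts at the most significant limb with the flag cleared; the one flag bit
   carries a single bit from each limb down into the next.
   Returns the bit shifted out of the bottom. */
static unsigned mp_shr1(uint32_t *limbs, size_t n) {
    unsigned cf = 0;
    for (size_t i = n; i-- > 0; ) {
        limbs[i] = rcr1(limbs[i], &cf);
    }
    return cf;
}
```

Because only one bit of state links the limbs, shifting by k bits this way takes k full passes over the array.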
On AMD CPUs, shld/shrd is slower than rcl/rcr-by-1, but it's the opposite on Intel CPUs. (http://agner.org/optimize/).
I can't really think of any use cases other than shifting bits between registers. Maybe if you shift a bit out, then branch on something that might set or clear the X bit, then shift the bit back in, you could use an extended rotate to do something to the low or high bit? But you could usually do the same more easily with AND, OR, or XOR with a constant.