What is a circular shift with extend used for?

Question

I remember in an assembly class, we learned the m68k processor, and there were 3 kinds of shifts you could do. Linear shift, circular shift, and circular shift with extend.

The last one, circular shift with extend, basically rotates all bits left or right but it puts the outermost one into an extend bit before it would be moved to the beginning again (if you ever shifted by 1 again).

I drew a little pic:

Basically, a 33rd bit is used in the circular shifting, but doesn't show up in the 32-bit word of course. The 33rd bit is the X flag of the processor, which stands for extend. You could just as easily use any status flag, such as the carry flag, but I guess the motorola guys wanted to preserve that flag so it doesn't get overwritten, in case you need the carry flag for its normal duties in the middle of some algorithm that also needs rotate with extend.

Anyway, what is the purpose of rotate with extend? What is it used for? What is it needed for? It seems so strange. Why in the world do you need a 33rd bit?

I've read this and this, two related questions, but they didn't talk about the circular shift with extend.

I'm aware of some uses for normal shifts. Basically to divide by two, or test divisibility, and permutate the bits for randomness. Stuff like that. But I cannot think of why you would need some extended bit inserted into the rotate but doesn't come out in the result.

EDIT: I'm interested in any use of it, modern or old, and doesn't matter if it's on the m68k or not. The m68k is just the first placed I encountered it (and I never even used it there).

related: [What's the purpose of the rotate instructions (ROL, RCL on x86)?](https://stackoverflow.com/q/4976636) / [Practical uses for rotate carry left/right](https://stackoverflow.com/q/26913354) but those don't focus on the "with extend" / "through carry" part; many of the answers on those other questions are just about normal rotates. — Peter Cordes, Feb 14 '22 at 18:09

score 2 · Answer 1 · answered Dec 03 '17 at 19:11

2

Suppose you want to shift a 32 bit word to the right but all you have are 16 bit registers. To do this you have to shift the two 16 bit parts of the 32 bit word to the right and transfer the bit that was shifted out of the high word into the low word.

If you just have logical shifts, this is cumbersome to do as you have to manually fix up that bit. The rotate-through-carry instruction allows you to keep the bit that needs to be transferred in the carry flag and shift it in in one go. The rotate-through-carry instruction places the bit shifted out in the carry flag so you can easily chain this together to right-shift arbitrarily sized data.

answered Dec 03 '17 at 19:11

fuz

88,405
25
200
352

1

What fuz said, and practically you could also move bitmap images (B&W 1b bitmap type) left/right by single pixel with these rotations. On ZX Spectrum to move whole screen 1 pixel right you could have used unrolled loop of `rr (hl)` `inc hl` instructions (or rather `inc l` where you know the low 8 bits of address `l < 255`) ... still wasn't fast enough (even unrolled and with partial `inc`) to move whole screen in single frame, at most only about 55% of screen was possible to move by this way (smarter games did use pre-rotated sprites in memory to avoid slower rotate instructions). – Ped7g Dec 03 '17 at 19:49
That's a proper answer, albeit not focussing on the m68k which has only 32-bit registers. If you double the number of bits everywhere, it would fit. – tofro Dec 11 '17 at 19:26
@tofro Clearly, you have the same problem when trying to shift a 64 bit number with only 32 bit registers. – fuz Dec 11 '17 at 19:37

Peter Cordes · Answer 2 · 2017-12-04T07:44:23.233

On x86 (and most architectures that have this instruction), the extra bit is the carry flag, and lots of things can set that flag. Rotate-through-carry left or right lets you shift the carry bit back into some other register. Interesting that m68k uses a different flag for extended-rotate.

I'm not very familiar with m68k anymore, so I'll mostly talk about other arches. (But apparently that's what you want :)

Instructions like this are often available on microcontrollers that are much less capable than x86 or m68k. Or with limited opcode space (and decoding complexity), some CPUs only have a rotate-through-carry by 1 and not regular shift instructions. If you want to shift in a zero, make sure the flag is clear first.

8051 is like this: only rotate left/right by 1, and rotate-with-carry left/right by 1, not shift. See rlc in the ISA ref manual. When possibly, avoid the clr instruction when you want to shift in a zero by putting rlc after something else that leaves Carry cleared.

I think it's typical for extended circular shift to use the carry flag, rather than its own X bit the way m68k does.

Anyway, extended-precision rotate is kind of traditional / expected for CPUs, but has more uses on more limited CPUs.

For a register, rcl reg, 1 is the same operation as adc reg,reg: shift the old contents left by 1, and set the low bit to CF. The bit shifted out by the rotate or adc becomes the new value of CF. So RCL is only a non-redundant part of an instruction set if it's available with a memory operand, or (for weird cases) with a count greater than 1. (The rotate-right version is not redundant, though.)

IDK why you'd ever use a count > 1. On modern x86, rotate-through-carry is fairly fast with count=1, but definitely slow with a variable count or fixed count>1. IDK which way the chicken/egg problem went: CPU designers didn't make it fast because nobody used it anyway, or people stopped using it because it's slow. But probably the former because I don't remember ever seeing a use mentioned for rotate-through-carry by more than 1 bit.

For extended-precision shifts, x86 has a double-shift instruction (shld / shrd dst, src, count) that shifts dst, shifting in bits from src instead of zeros or copies of the sign bit. It doesn't work with 2 memory operands, so an extended-precision shift loop has to load and store registers with separate instructions. That's more code size than a loop using rcr dword [edi], 1 / sub edi, 4, but code size is rarely a problem on modern x86 and doing the loads/stores with separate instructions isn't slower.) More importantly, shrd shifts multiple bits at once, so you can loop over an array once to implement a multi-precision shift by 2 or more bits.

Extended rotate can only shift one bit at a time between registers, because it uses a 1-bit storage space (in flags). I think on m68k if you did want to shift multiple bits between registers, you'd probably copy a register and use regular shifts + OR to combine. (Or a rotate and AND/OR to split up the bits.)

On AMD CPUs, shld/shrd is slower than rcl/rcr-by-1, but it's the opposite on Intel CPUs. (http://agner.org/optimize/).

I can't really think of any use cases other than shifting bits between registers. Maybe if you shift a bit out, then branch on something that might set or clear the X bit, then shift the bit back in, you could use an extended rotate to do something to the low or high bit? But you could usually do the same more easily with AND, OR, or XOR with a constant.

Another use case is filling a bit-buffer. You initialize the buffer with `1` and fill it by using `rcl` to shift the value in `cf` into the buffer. When `cf` is set after `rcl`, you know that the buffer is full and can flush it out without having to keep track of a counter or similar. — fuz, Dec 03 '17 at 19:59
`(I only noticed you were specifically asking about m68k after writing most of this answer).` Actually I'm interested in any use of rotate with extend. The m68k is just where I encountered it first. I made an edit to the OP. — DrZ214, Dec 03 '17 at 20:04
`IDK why you'd ever use a count > 1` Me neither. I can certainly think of uses for linear shift and circular shift (without extend), but you know I can't even remember if the m68k instruction set allowed an immediate value passed to the shift operation, or if you had to loop thru the instruction n times. — DrZ214, Dec 03 '17 at 20:09
@DrZ214: Ok, good glad I still posted this answer. On x86, where it uses CF, there's probably more weird tricks you can do by shifting a bit from CF (set some other way) into a register. As Fuz points out, you can set a bitmap this way. — Peter Cordes, Dec 03 '17 at 20:11
@fuz: On x86, `adc reg,reg` could fill a bit buffer the same way as `rcl reg,1`. The only difference is flags other than CF (and performance: `adc` is faster on SnB-family CPUs because `rcl` is 3 uops. About the same on P6/Core2/Nehalem though, both 2 uops). — Peter Cordes, Dec 03 '17 at 20:14
@DrZ214: added a section about 8051, which only has rotates, not regular shifts. — Peter Cordes, Dec 03 '17 at 20:31
On x86 the chance to use `rcr/rcl` in loops is created by the `inc/dec` instructions not affecting CF (and `loop` does not either, but `loop` is slow. Or eventually you may calculate value in `rcx` with `lea` and use `jrcxz` to loop without affecting CF, but `inc/dec` are natural choice). The horizontal 1 pixel scroll usage was valid only in 4/16 colour EGA/VGA modes (they use bit planes), and it would be still impossibly slow. That's the reason why it took several years until few people managed to create scrolling platform PC games, like Carmack with his Com. Keen (w/o rotations). — Ped7g, Dec 03 '17 at 22:34

tofro · Answer 3 · 2017-12-04T12:10:42.787

ROXL is pretty useful for non-destructively testing bits and can save quite some instructions compared to BTST.x instruction (which is only avalable in 8 and 32 bit sizes). Once you do that, rotating by more than one place also can make things like skipping bits possible.

  ROXL.W #4,D0      ; shift bit 12 into carry and X
  BCC.S isNotSet    ; branch if bit is not set
  BSR isSet         ; do whatever you want
isNotSet:
  ROXL.W #11,D0     ; rotate around to reset to previous value

score 1 · Answer 4 · answered Jan 17 '23 at 02:15

It's also used for extended precision multiply and divide, because these operations are basically adding (or subtracting) and shifting. That also means we can do cool things like this:

ADD eax, ebx
RCR eax, 1

to calculate the average of 2 integers.

There is an assembly demo written in 256 bytes somewhere that uses RCL register, CL as a tool to calculate a random number in just 2 bytes.

What is a circular shift with extend used for?

4 Answers4