31

I always wondered what's the purpose of the rotate instructions some CPUs have (ROL, RCL on x86, for example). What kind of software makes use of these instructions?

I first thought they may be used for encryption/computing hash codes, but these libraries are written usually in C, which doesn't have operators that map to these instructions. (Editor's note: see Best practices for circular shift (rotate) operations in C++ for how to write C or C++ that will compile to a rotate instruction. Also, optimized crypto libraries often do have asm for specific platforms.)

Has anybody found an use for them? Why where they added to the instructions set?

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
Gratian Lup
  • 1,485
  • 3
  • 19
  • 29
  • 15
    Actually a good C compiler will emit `rol` opcodes when compiling code which tries to compute a rotation with the C operators (i.e. `(x << 12) | (x >> 20)`). – Thomas Pornin Feb 14 '11 at 22:39
  • @Brian: I wrote `rol`, I meant `rol` (well, could be `ror`). The rotation opcode. – Thomas Pornin Feb 15 '11 at 23:07
  • 1
    @Thomas My C is rusty, I was thinking the << and >> operators were shift and not rotate. – Brian Knoblauch Feb 16 '11 at 12:32
  • 12
    @Brian: `<<` and `>>` _are_ shifts. But for a 32-bit value `x`, the whole expression `(x << 12) | (x >> 20)`, consisting of two shifts (one left, one right) and a bitwise OR, has the same effect than a rotation of a 32-bit word (here, by 12 bits to the left). C compilers are smart enough to notice it, and compile the complete expression as a single `rol`. – Thomas Pornin Feb 17 '11 at 20:21
  • 2
    Some libraries have bit rotate intrinsics, but I also think C should have rotate operators at first. It will make understanding the code much easier and the compiler would have less work to do. – phuclv Aug 17 '13 at 15:29
  • I've actually just coded an assembly routine where I needed to manipulate the Alpha channel in a TColor (32-bit RGBA with A in the high nibble). The easy way to do this was ROL EAX,8 / MOV AL,Value / ROR EAX,8 instead of AND EAX,$00FFFFFF / MOV DL,Value / SHL EDX,24 / OR EAX,EDX (assuming that Value isn't a constant but a value that may be different on each iteration). I ended up with BSWAP EAX / MOV AL,Value / BSWAP EAX instead, but that's neither here nor there :-). – HeartWare Jul 24 '19 at 08:38
  • High BYTE - not nibble :-) – HeartWare Jul 24 '19 at 09:09

6 Answers6

29

Rotates are required for bit shifts across multiple words. When you SHL the lower word, the high-order bit spills out into the carry. To complete the operation, you need to shift the higher word(s) while bringing in the carry to the low-order bit. RCL is the instruction that accomplishes this.

                      High word             Low word         CF
Initial          0110 1001 1011 1001   1100 0010 0000 1101    ?
SHL low word     0110 1001 1011 1001   1000 0100 0001 1010    1
RCL high word    1101 0011 0111 0011   1000 0100 0001 1010    0

ROL and ROR are useful for examining a value bit-by-bit in a way that is (ultimately) non-destructive. They can also be used to shunt a bitmask around without bringing in garbage bits.

ecm
  • 2,583
  • 4
  • 21
  • 29
Nietzche-jou
  • 14,415
  • 4
  • 34
  • 45
  • When would use rotation to test bits instead of `BT`? – Gabe Feb 12 '11 at 07:12
  • 4
    When you want to test them all and, perhaps, in order. – Nietzche-jou Feb 12 '11 at 07:15
  • 2
    Or alternatively, when you don't have BT to begin with. – Nietzche-jou Feb 12 '11 at 07:22
  • 2
    Rotates are only effective when shifting only 1 bit – phuclv Aug 17 '13 at 15:26
  • 2
    Wouldn't CF be 0 after the third step? (the bit that goes off is set to CF and *previous value* of CF is inserted to the right-most position) – Assad Ebrahim Mar 31 '14 at 02:20
  • 1
    "In assembly languages these instructions are represented by mnemonics such as ADD/SUB, ADC/SBC (ADD/SUB including carry), SHL/SHR (bit shifts), ROL/ROR (bit rotates), RCR/RCL (rotate through carry), and so on. [1] The use of the carry flag in this manner enables multi-word add, subtract, shift, and rotate operations." https://en.wikipedia.org/wiki/Carry_flag – phuclv Nov 13 '14 at 16:23
17

The rotate shift opcodes ROL, RCL, ROR, RCR) are used almost exclusively for hashing and CRC computations. They are pretty arcane and very rarely used.

The shift opcodes (SHL, SHR) are used for fast multiplication by powers of 2, or to move a low byte into a high byte of a large register.

The difference between ROL and SHL is ROL takes the high bit and rolls it around into the low bit position. SHL throws the high bit away and fills the low bit position with zero.

dthorpe
  • 35,318
  • 5
  • 75
  • 119
  • 5
    I don't see how you answered the question. – Gabe Feb 12 '11 at 06:55
  • 3
    maybe you can add the difference to ROL/RCL and ROR/RCR in your answer too. – Jeroen Wiert Pluimers Feb 12 '11 at 13:21
  • 3
    Pretty arcane and rarely used? Really? There are many places where rotation is useful, especially in, as you say hashing and cryptography. On many CPU's where the amount shifted affects time, it's actually faster to rotate and bitwise and rather than doing a shift. – The Welder Dec 06 '15 at 10:51
  • 5
    Yes, very rarely used. Hashing and crypto are things to be used from libraries, not something every developer should write for themselves. – dthorpe Dec 18 '15 at 20:37
  • 2
    Note that from a CPU design perspective (which instructions to provide), the relevant measure is how frequently it is (or would be) *executed*, not how many different pieces of software will contain the instruction. It's not that hard to emulate (unlike some special-purpose instructions like `popcnt` or `crc32` or SIMD `psadbw` which was added basically for video-encode motion-search), but OTOH it doesn't take much extra hardware to make a barrel shifter capable of rotating. – Peter Cordes Aug 01 '21 at 03:15
  • ROR can be used for swaps during certain sorts, if the values are adjacent in memory. Works great. – JeffW Dec 25 '22 at 20:07
9

ROR ROL are "historic" but still useful in a number of ways.

Before the 80386 (and opcode BT), ROL would be used a lot to test a bit (SHL doesn't propagate to the carry flag) - actually in 8088, ROR/ROL would only shift by 1 bit at a time !!!!

Also if you want to shift one way and then the other way without loosing the bits that have been shifted out of scope, you'd use ROR/ROL instead of SHR/SHL

Alain Pannetier
  • 9,315
  • 3
  • 41
  • 46
  • 2
    And the 8080 didn't even have shift instructions -- rotate was all you got! – Gabe Feb 12 '11 at 07:27
  • 2
    It is right that on the 8088 you can only rotate by one, **if you use an immediate rotate count**. However, the 8088 does support rotating by a count given in the register `cl` too. (Immediate byte shift/rotate counts other than 1 were added in the 186 instruction set.) – ecm Jul 30 '21 at 19:23
  • 3
    [ROL](https://www.felixcloutier.com/x86/rcl:rcr:rol:ror) sets CF the same way [SHL](https://www.felixcloutier.com/x86/sal:sar:shl:shr) does. (according to the last bit shifted left out of the high bit). The only difference is that ROL shifts the bit in the bottom instead of shifting in zeros. (And in SHL setting SF, ZF, and PF like a normal ALU instruction, unlike rotates.) – Peter Cordes Aug 01 '21 at 03:19
4

If I understand you correctly, your question is this:

"Given the fact that rotation instructions seem to be very special-purpose and not emitted by compilers, when are they actually used and why are they included in CPUs?".

The answer is twofold:

  1. CPU's are not designed specifically to execute C programs. Rather, they are designed as general purpose machines, intended to solve a wide array of problems using a wide variety of different tools and languages.

  2. The designers of a language are under no obligation to use every opcode in the CPU. In fact, most of the time, they do not, because some CPU instructions are highly specialized, and the language designer has no pressing need to use them.

More information about bitwise operators (and how they relate to C programming) can be found here: http://en.wikipedia.org/wiki/Bitwise_operation

Robert Harvey
  • 178,213
  • 47
  • 333
  • 501
3

Back when microprocessors were first created, most programs were written in assembly, not compiled. The majority of CPU instructions are probably not emitted by compilers (which is the impetus for creating RISC), but are often relatively easy to implement in hardware.

Many algorithms in graphics and cryptography use rotation, and their inclusion in CPUs makes it possible to write very fast algorithms in assembly.

Gabe
  • 84,912
  • 12
  • 139
  • 238
1

I think many answers here got it somewhat backwards, including the currently accepted one. The biggest application is in shifting data across byte/word boundaries, which is extensively used in

  • extracting and inserting bit patterns
    • protocols (insert 5 bits starting from bit 6)
    • compression schemes (LZW77 and more)
    • data transfer (300 baud modems anyone? 7-bit data + parity)
  • arbitrary precision arithmetic
    • multiplying/dividing by 2 utilises rotations-through-carry
    • multiplying/dividing by other powers of two need the ROL (or ROR)
    • scrolling 1-bit graphics horizontally

And the niche applications:

  • crc16/32
  • ciphers
  • non-destructive moving bits to sign bit or to carry for testing

The historical perspective is that shifting was expensive: when one needs to shift say 16 bits left by 3, in chunks of 8 bits (or 128 bits left in chunks of 64 bits), a ROL performs two expensive shifts at the cost of one:

rotate all bits left by 3
      hi       lo
src = fedcba98|76543210
dst = cba98765|43210---

Notice, that the bits "765" need to be shifted right by 5, while bits "43210" need to be shifted left by 3. This is all accomplished by a single rotation, which put all the right bits to the correct position, even if they are accompanied by the wrong bits, which are recombined by masking, which is an inexpensive operation:

dst_lo = ((src_lo ROL 3) & 0b11111000)
dst_hi = ((src_lo ROL 3) & 0b00000111) | (src_hi << 3)

This extends to bignum shifting, or scrolling a monochrome graphics plane horizontally by arbitrary number of pixels.

This algorithm is so essential, that 80386 included a double-rotate instruction for it.

Aki Suihkonen
  • 19,144
  • 1
  • 36
  • 57
  • Many CPUs have a rotate-through-carry flag which you could use for equivalent purposes, if you're limited to shifting 1 bit at a time. That also enabled variable-count shifts across register boundaries using a loop, which wouldn't be possible with ROL without another shift (and NOT) to create those masks. Still, yes, interesting point for constant shift-counts on machines which have multi-bit shifts that are faster than looping but still slow. (Like 8086). However, you'd optimize to `src_hi << 3` instead of ROL + mask, since the bits shifted out there aren't shifted into anything. – Peter Cordes Jan 11 '22 at 17:38
  • Yes, it's definitely worth it to have the destructive variants (arithmetic and logical shifting right / logical shift left) for those cases that need it; And indeed the _last_ word benefits from those instructions. I suppose I wanted to extend the concept to really-multi-word shifting before editing. – Aki Suihkonen Jan 11 '22 at 17:41