
I'm interested in practical applications, even if they are outdated by modern standards.

There's a similar question about ROL and ROR here, but it doesn't really cover RCL/RCR.

I can come up with some applications for RCL/RCR with a count of 1 (e.g. for some LFSRs), but I can't think of any sensible application with a count other than 1.

So can anyone enlighten me?

P.S. sample code is more than welcomed.

update 1: as Peter Cordes mentioned in comments below, one (quite obvious) application is shrd/shld. (IIRC rcl/rcr instructions were already in 8080)

Maybe 'non 1' above was not clear, but mind that I'm mostly interested in usage, where operand is != 1 (RC(L|R) REG, c with c being either > 1 or == cl).

GiM
  • 4
    shifting bits between registers one-at-a-time was one use-case, before `shrd` / `shld` existed. Or doing a 32-bit rotate across two 16-bit registers. – Peter Cordes Apr 05 '19 at 21:38
  • 1
    You put x86-64 as a tag which triggered this comment. The rcl/rcr come from the 8-bit era and are still there in x86 due to backward compatibility requirements. I find the question interesting and will try to come up with something tomorrow. I worked quite a lot with z80 (which has similar instructions). – tum_ Apr 05 '19 at 22:56
  • @PeterCordes - for others here, note that `shrd` / `shld` only update the destination register (with bits from the source register shifted in), leaving the source register unshifted. Although this means a second instruction is needed for a two-register (or two-memory-operand) shift, it is useful for an extended-precision shift across more than two registers (or memory operands). – rcgldr Apr 06 '19 at 00:19
  • In addition to Peter Cordes' answer: There may still be applications where a long value (e.g. a 320 bit number) must be shifted. In this case you shift the first 32 (or 64) bits using `shl` (or `shr` or `sar`) and the remaining N*32 (or N*64) bits using N `rcl` (or `rcr`) operations. – Martin Rosenau Apr 06 '19 at 06:46
  • 1
    @MartinRosenau: If you're shifting by only 1 bit, then yes that's worth considering. Otherwise SSE2 or better AVX2 shuffles + shifts are clearly a better bet. e.g. GMP's SSE2 version of [`mpn_lshift`](https://gmplib.org/manual/Low_002dlevel-Functions.html) uses `psllq` / `psrlq` / `por`, and `punpcklqdq`. https://gmplib.org/repo/gmp/file/tip/mpn/x86_64/fastsse/lshift.asm, with various cases depending on alignment. (Of course if we're not talking about x86-64, then SSE2 might not be available. And variable-count `shld` is not super fast on Intel; GMP doesn't seem to use it.) – Peter Cordes Apr 06 '19 at 07:29
  • 1
    Yes, rcl/rcr by implicit 1 and by CL [existed in 8086](http://www.posix.nl/linuxassembly/nasmdochtml/nasmdoca.html), presumably for consistency with the encoding of other shifts. Then 286 added the imm8 count versions of all shifts, including these. Then 386 added `shld`/`shrd` by imm8 or by CL. – Peter Cordes Apr 06 '19 at 09:24
  • 2
    I'm not aware of any simple use-case for RCL/RCR with count other than 1. Sorry I missed that you were asking about count != 1 uses, that's a really good question :P. It might well be *just* for consistency of machine encoding and so they're not a special case in the decoders. – Peter Cordes Apr 06 '19 at 09:26
  • yeah, consistency is my guess as well, that's why i'm interested if there's some known sensible usage – GiM Apr 06 '19 at 09:33
  • yeah, a *known* sensible usage of "rotating more than a single bit through carry" is not an easy task to find, indeed. The closest approximation I can think of is a hypothetical case where you have something that requires exactly 9 bits (0-511 range) to be represented and you need to rotate it for whatever purpose. Then the opcode seems to be the perfect fit :) – tum_ Apr 06 '19 at 11:43
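
To make the "9-bit rotate" view from the comments concrete, here is a minimal Python sketch (not x86 code) that models `rcl` with an arbitrary count as a rotation of the 9-bit quantity CF:reg for an 8-bit register. The function names `rcl` and `rcl1` are illustrative only, and the model assumes the count is simply reduced modulo 9; it ignores the 5-bit count masking that real hardware applies first.

```python
def rcl1(value, cf, bits=8):
    """Rotate left through carry by 1: CF enters bit 0, the top bit leaves into CF."""
    mask = (1 << bits) - 1
    return ((value << 1) | cf) & mask, (value >> (bits - 1)) & 1


def rcl(value, cf, count, bits=8):
    """Model RCL by `count`: rotate the (bits+1)-bit quantity CF:value left."""
    width = bits + 1                      # 9 bits for an 8-bit register
    wide = (cf << bits) | value           # CF sits above the register bits
    count %= width                        # assumption: plain modulo, no 5-bit masking
    wide = ((wide << count) | (wide >> (width - count))) & ((1 << width) - 1)
    return wide & ((1 << bits) - 1), wide >> bits


# Rotating by 3 in one step matches three single-bit rotates.
value, cf = 0b10000001, 0
step = (value, cf)
for _ in range(3):
    step = rcl1(*step)
print(rcl(value, cf, 3), step)
```

Seen this way, a multi-bit `rcl`/`rcr` only helps when the data really is a 9-, 17-, 33- or 65-bit value, which is why natural use-cases are so hard to find.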

1 Answer


In shifting operations, these instructions play the same role that the add-with-carry (adc) and subtract-with-borrow (sbb) instructions play in additions and subtractions:

They are used as the second (and each subsequent) instruction when processing numbers that are longer than a CPU register, so that the number has to be processed in several steps.

Example: On a 386 CPU you can perform 32-bit operations using a single instruction. However, you might want to process 320-bit integer numbers.

To keep the illustration small, let's say we have a 4-bit CPU and we want to perform an "arithmetic right shift" (sar) operation on a 16-bit integer number:

Integer: ABCDEFGHIJKLMNOP  (A-P = some bits that may be 1 or 0)

Operation on a 16-bit CPU:

    ABCDEFGHIJKLMNOP (SAR 1) -> AABCDEFGHIJKLMNO, CF = P

Operation on a 4-bit CPU:

    ABCD (SAR 1) -> AABC, CF = D
    EFGH, CF = D (RCR 1) -> DEFG, CF = H
    IJKL, CF = H (RCR 1) -> HIJK, CF = L
    MNOP, CF = L (RCR 1) -> LMNO, CF = P

So the final result on the 4-bit CPU is AABCDEFGHIJKLMNO, CF = P

Of course the same example would work with a 256-bit number on a 64-bit CPU...
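
The 4-bit walk-through above can be checked with a small Python model. This is only a sketch that simulates `sar` and `rcr` by 1 on limbs of a chosen width; the helper names (`sar1`, `rcr1`, `multiword_sar1`) are made up for illustration and this is not x86 code.

```python
def sar1(limb, bits):
    """Arithmetic shift right by 1: keep the sign bit, return (result, carry_out)."""
    sign = limb & (1 << (bits - 1))
    return (limb >> 1) | sign, limb & 1


def rcr1(limb, carry_in, bits):
    """Rotate right through carry by 1: CF enters the top bit, bit 0 leaves into CF."""
    return (limb >> 1) | (carry_in << (bits - 1)), limb & 1


def multiword_sar1(limbs, bits):
    """Shift a multi-limb number right by 1; limbs are listed most significant first."""
    first, carry = sar1(limbs[0], bits)   # the top limb uses SAR
    out = [first]
    for limb in limbs[1:]:                # every lower limb uses RCR
        shifted, carry = rcr1(limb, carry, bits)
        out.append(shifted)
    return out, carry


limbs = [0b1011, 0b0110, 0b1001, 0b0101]  # the 16-bit value 1011 0110 1001 0101
result, cf = multiword_sar1(limbs, 4)
print([format(x, "04b") for x in result], cf)
```

Passing `bits=64` and four limbs gives the 256-bit case on a 64-bit CPU with the same code.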

Please also note:

With add/adc, sub/sbb or shl/rcl we start at the low bits and continue towards the high bits. With shr/rcr or sar/rcr, however, it is the other way round: we start at the high bits and continue towards the low bits.
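
For the left-shift direction, a matching Python sketch (again a model with hypothetical helper names, not x86 code) processes the limbs starting from the least significant one, using `shl` for the first limb and `rcl` for the rest:

```python
def shl1(limb, bits):
    """Shift left by 1: bit 0 becomes 0, the top bit leaves into CF."""
    mask = (1 << bits) - 1
    return (limb << 1) & mask, (limb >> (bits - 1)) & 1


def rcl1(limb, carry_in, bits):
    """Rotate left through carry by 1: CF enters bit 0, the top bit leaves into CF."""
    mask = (1 << bits) - 1
    return ((limb << 1) | carry_in) & mask, (limb >> (bits - 1)) & 1


def multiword_shl1(limbs, bits):
    """Shift a multi-limb number left by 1; limbs are listed least significant first."""
    first, carry = shl1(limbs[0], bits)   # the lowest limb uses SHL
    out = [first]
    for limb in limbs[1:]:                # every higher limb uses RCL
        shifted, carry = rcl1(limb, carry, bits)
        out.append(shifted)
    return out, carry


# 0x9ABC as 4-bit limbs, least significant first
result, cf = multiword_shl1([0xC, 0xB, 0xA, 0x9], 4)
print([hex(x) for x in result], cf)
```

The carry always travels in the direction of the shift, which is what dictates the order in which the limbs must be visited.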

Martin Rosenau
  • 3
    Also worth mentioning that `adc x,x` is exactly equivalent to `rcl x, 1` as far as reading/setting CF, but faster. (`rcl rax,1` is a 3 uop instruction on Skylake, but `adc rax,rax` is single-uop https://agner.org/optimize/. Rotate-by-1 sets extra flags, but not *all* flags, so it decodes to a flag-merging uop. Variable-count `rcl` is even slower, but wouldn't have many use-cases even if it was fast, AFAIK.) So `rcl` is only interesting if your data is in memory. `rcr` can't be emulated that easily, though. – Peter Cordes Apr 06 '19 at 07:47
  • 3
    Fun fact: on AVR (an 8-bit RISC), [`rol` is a pseudo-instruction for `adc same,same`](https://www.microchip.com/webdoc/avrassembler/avrassembler.wb_ROL.html). (AVR rotates are always through carry.) – Peter Cordes Apr 06 '19 at 07:49
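
The `adc x,x` / `rcl x, 1` equivalence mentioned in the comments above can be checked exhaustively for 8-bit values with a short Python model. This sketch compares only the result and CF, not the other flags, and the function names are illustrative.

```python
def rcl1(x, cf, bits=8):
    """Rotate left through carry by 1: returns (result, carry_out)."""
    mask = (1 << bits) - 1
    return ((x << 1) | cf) & mask, (x >> (bits - 1)) & 1


def adc(a, b, cf, bits=8):
    """Add with carry: returns (result, carry_out)."""
    total = a + b + cf
    return total & ((1 << bits) - 1), total >> bits


# Compare both operations for every 8-bit value and both carry-in states.
same = all(
    rcl1(x, cf) == adc(x, x, cf)
    for x in range(256)
    for cf in (0, 1)
)
print(same)
```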