Is there a way to perform arithmetic right shift using only xorq, andq, addq, iaddq, & subq?

Question

I'm in a hypothetical architecture that only has those operations (Y86). No arithmetic right shift exists. I'm essentially trying to capture the upmost bit to determine if the number is negative and if so add it to the result register, rax.

Edit:

Sorry all, I forgot to specify, I'm trying to avoid conditional branches to see if it improves the efficiency. No cmov exists in the version I'm in.

The furthest I've gotten is:

andq $0x10000000, elem
subq $0x01111111, elem
addq elem, %rax

However, for a result of 0 this doesn't work.

y86 does have conditional jumps though. Also some versions have cmov. — Jester, Nov 19 '18 at 20:39
No `mov`? Or can you use `add` as a load, with a memory source and a zeroed destination (from xor-zeroing)? If so, you could maybe create a lookup table for 1 byte at a time. Maybe a useful building block? (Using unaligned stores into a buffer + storing zeros, you should be able to isolate 1 byte of a qword). Hard to use, and doesn't help for carry between bytes of the result, though. [LC3 Assembly Bitwise Right Shift](https://stackoverflow.com/q/10077798) is about another ISA lacking a right shift. — Peter Cordes, Nov 19 '18 at 20:49
Sorry all, I forgot to specify, I'm trying to avoid conditional branches to see if it improves the efficiency. No cmov exists in the version I'm in. — monsieur40a, Nov 19 '18 at 21:35
How are you defining "efficiency"? There are no physical y86 CPUs. If you mean performance when running in an interpreter, then keep in mind that control dependencies in the y86 code will become data dependencies in the interpreter so branch-mispredicts aren't directly a thing. (Branch misses in the interpreter usually come from variations in the instruction mix, if indirect branches are used to dispatch to a handler for each opcode.) Or are you imagining a hypothetical physical y86 with prefetch + branch prediction but still no right shift? — Peter Cordes, Nov 19 '18 at 21:41
Average cycles per element for a variety of array sizes containing a random number of negative elements. This is for an architecture lab for an Intro to Computer Architecture course. — monsieur40a, Nov 19 '18 at 21:55
But how do you model execution to get a cycle count? Are you simulating a y86 that's like a modern x86? e.g. 4-wide superscalar execution likeIntel's Sandybridge family of microarchitectures? Out-of-order execution? High branch-mispredict penalty? Or a simple short in-order pipeline with low branch cost like an AVR microcontroller (1 cycle for not taken, 2 cycles for taken IIRC). [Static analysis for modern x86](https://stackoverflow.com/q/51607391) is hard even without branches. I think you're going to need a branch if you don't have cmov or setcc, so you may have no choice. — Peter Cordes, Nov 19 '18 at 22:24
re: your update: Like addition, the low bits of a subtraction don't depend on any higher bits, so it's not a helpful building block for a right shift. Carry/borrow propagate from LSB to MSB only. Your example is only going to help you for inputs with a single bit set. And you need to find where that bit is to choose the right constant to subtract. — Peter Cordes, Nov 19 '18 at 22:29
[MOV in x86 is Turing complete](https://www.cl.cam.ac.uk/~sd601/papers/mov.pdf), and so are XOR, SUB, ADD, XADD, ADC, SBB, AND/OR, PUSH/POP, 1-bit shifts, or CMPXCHG/XCHG. Thus it's definitely possible to do anything, not just a shift with either of those instructions. [There's even a compiler to compile C code to a program with only MOVs or one of those instructions](https://github.com/xoreaxeaxeax/movfuscator) — phuclv, Nov 20 '18 at 01:37
and what's iaddq? A two's complement machine never has separate instructions for signed and unsigned addition — phuclv, Nov 20 '18 at 12:31
@phuclv the `i` is for immediate. y86 encodes that in the mnemonic similarly to mips which also has `add`, `addi` (and incidentally also unsigned versions `addu` and `addiu` although those only differ in overflow signaling) — Jester, Nov 20 '18 at 12:36

score 1 · Answer 1 · answered Nov 19 '18 at 20:54

1

Assuming you can use a loop and conditional branch:

    mov result, 0
    mov lead, 2
    mov follow, 1
1:
    mov tmp, n
    and tmp, lead
    jz 2f
    add result, follow
2:
    add follow, follow
    add lead, lead
    jnz 1b

The tmp, lead, and follow variables must be in registers. result can be either in a register or in memory.

answered Nov 19 '18 at 20:54

prl

11,716
2
13
31

How does this shift in a copy of the sign bit? After `add lead,lead` leaves `lead=0`, won't you just shift in a zero for the top bit? Hmm, I guess since the shift count is fixed at 1, we can just `and n, 1<<63` and ADD that, to copy the original sign bit into the logical right shift result. We don't need to duplicate it to any lower positions beyond the normal shifting of it to bit 62. – Peter Cordes Nov 19 '18 at 20:58
Oops, I totally overlooked that you wanted an *arithmetic* right shift (probably because I don't think I've ever used one :-) ). @Peter – prl Nov 19 '18 at 22:09

Ruud Helderman · Accepted Answer · 2018-11-21T08:16:36.110

If Y86 allows MOVQ to access memory that is not QWORD-aligned, then it can be done. But I doubt if it will perform better than a conditional branch.

The trick is to write the number to memory, then read it again from an address that is slightly 'off'. This effectively shifts bits across a multiple of 8. Combine this with addq to shift the bits 1 position to the left.

Please note that this highly relies on the endianness of the processor architecture. The following example is based on little endian (Intel-style). On big endian, the offsets will have to be adjusted.

(If you prefer AT&T syntax, then please reverse the operands and remove the brackets.)

movq rbx,number         ; sign bit is bit 63 of rbx
movq [address],rbx      ; sign bit is most significant bit of the byte at [address+7]
movq rbx,[address+4]    ; sign bit is bit 31 of rbx
addq rbx,rbx            ; sign bit is bit 32 of rbx
movq [address],bx       ; sign bit is least significant bit of the byte at [address+4]
movq rbx,[address+4]    ; sign bit is bit 0 of rbx
andq rbx,1              ; rbx = 0 for positive number, rbx = 1 for negative number
addq ax,bx

you mean "you prefer AT&T syntax"? Because your code looks like Intel syntax. However Intel syntax doesn't have suffixes like that, and `addq ax, bx` is incorrect since the suffix should be w — phuclv, Nov 21 '18 at 04:21
@phuclv Thank you for pointing this out, I adjusted the note. My experience with assembly is outdated. Please consider my code sample as pseudo code. I trust readers will understand the intention. — Ruud Helderman, Nov 21 '18 at 08:20

Is there a way to perform arithmetic right shift using only xorq, andq, addq, iaddq, & subq?

2 Answers2