how to check if 2 numbers have reversed bits in assembly?

Question

My code should get 2 arr with the same length(k) and to check how much pairs of numbers have the same index (1 in arr 1 and the other in arr2) and are opposite, which means the first bit in 1 should be the last in the other and the second to first would be the second to first and go on...

Overall, this is my code:

IDEAL
MODEL small
STACK 100h
DATASEG

k dw 2                       ;length of arr1 and arr2
ARR1 dw 2 dup (?)
ARR2 dw 2 dup (?)
CODESEG
start:  
    mov ax,@data
    mov ds,ax
    
    lea si,[ARR1]
    lea di,[ARR2]
    mov cx,k                 ; cx=length of arr
    xor dx,dx
    mov [ARR1],1000000000000001b
    mov [ARR2],1000000000000001b
    mov [ARR1+1],0033h
    mov [ARR2+1],0033h
        xor dx,dx
        L1:                                      ; loops through every index in both arr1, and arr2
            mov bx,k
            sub bx,cx
            push cx
            mov cx,16
            L2:
                xor ax,ax
                shr [si+bx],1
                jnc nc1
                inc al
                nc1:
                clc
                shl [di+bx],1
                jnc nc2
                inc ah
                nc2:
                clc
                cmp al,ah
                jne end_loop2
                
                dec cx
                jnz L2
                
            inc dx
            end_loop2:
            pop cx
            
            dec cx
            jnz L1



exit:
    mov ax, 4c00h
    int 21h
END start

My debugger doesn't give me any errors but when I run the code it doesn't work, when I shift left the number in arr 2 it doesn't change the CF though it should.

Do you know why is that happening?

If I understand correctly, you're storing `0x81` into `ARR2` and `0x33` into `ARR2+1`, correct? Then shifting the pair at word length with `shr [di+bx],1`? Keep in mind that x86 is little-endian, which means that the CPU sees your `ARR2` as `0x3381`, so you won't see a carry flag set right away. It will happen about after 5 shifts. Had you stored the bytes the other way you'd see a carry set as soon as you shift once. — puppydrum64, Dec 09 '22 at 11:36
Either that or the assembler is confused about your intent, and is clobbering your `1000000000000001b` with the 33 in `0033h`. You might have better luck if you did `mov [ARR2],8133h.` — puppydrum64, Dec 09 '22 at 11:39
`mov [ARR1+1],0033h` implies 16-bit operand-size because ARR1 used `dw`. But you only used a byte offset of `+1`, so it overlaps with the previous word store. Use a debugger to examine what's in memory after those initial stores. — Peter Cordes, Dec 09 '22 at 14:07
@PeterCordes Ah, I missed that part. I thought they were declared as `db` for some reason. Yes, you need to use [ARR1+2] since each "slot" of the array takes up two bytes. Also you can replace the `jnc nc1 inc al` with a single `adc al, 0` and achieve the same result. I find code with fewer jumps easier to debug, so this might help you as well. — puppydrum64, Dec 09 '22 at 16:32
@puppydrum64: Yes, lots of ways to simplify or optimize this, and agreed that branchless code is easier to follow unless it requires more complexity. Here, I might `mov al, [si+bx]` / `and al, 1` before shifting to grab that bit. Or use `salc` or `sbb al,al` to materialize a 0 or -1 according to CF. (That's a bit tricky in itself, but a useful idiom to use once you learn it. And `sbb ah,ah` can get the high bit.) Or just shr / `adc al, 0` / shl / `sbb al, 0` / `jz equal_bits` instead of materializing separate booleans to compare. — Peter Cordes, Dec 09 '22 at 17:02
@puppydrum64: And of course I'd want both words in registers instead of using a slow memory-destination shift in a loop. And maybe something clever like rotate each way, then `mov cx, ax` / `xor cx, dx` / `js mismatch` to check the high bit each tie, if I was going one bit at a time instead of using 4-bit lookup tables to bit-reverse in chunks. I think that needs a tmp reg, we only have `cmp` and `test`, not like ARM `teq` that sets flags based on an XOR. (Of course on ARM, `rbit r0, r0` to bit-reverse then just `cmp r0, r1` :P Or `cmp r1, r0, lsr #16` shift if you have 16-bit ints). — Peter Cordes, Dec 09 '22 at 17:06
@puppydrum64: Just for fun, I optimized two versions of the bit-loop. https://godbolt.org/z/354eaz598 (tested on my local Linux machine, using 32-bit pointers but otherwise still 16-bit.) Looks like the `xor` and check SF idea is actually pretty good, 6 simple (reg,reg) instructions in the inner loop, which exits when `num2<<=1` becomes zero. Using only registers; values loaded from the array just once each, and not modified. Depending on bit distribution, small numbers become zero sooner, so we might prefer exiting when right-shifting makes one number zero. — Peter Cordes, Dec 09 '22 at 20:18

score 3 · Accepted Answer · answered Dec 09 '22 at 23:39

mov [ARR1],1000000000000001b
mov [ARR2],1000000000000001b
mov [ARR1+1],0033h
mov [ARR2+1],0033h        ; ARR1/ARR2 contain 1, 51, 0, ?

Your program defines 2 arrays that have each 2 word-sized elements. Because a word occupies 2 bytes in memory, assigning a value to the 2nd element must use an offset of +2. Assigning a value to the 3rd element would have to use an offset of +4, and so on.

mov [ARR1], 8001h
mov [ARR2], 8001h
mov [ARR1+2], 0033h
mov [ARR2+2], 0033h         ; ARR1/ARR2 contain 1, 128, 51, 0

L1:
  mov bx,k
  sub bx,cx

The inner loop (L2) is processing words but the outer loop (L1) advances through the array per byte. On the 1st outer iteration CX is 2 so BX = k - CX becomes 0, and on the 2nd outer iteration CX=1 so BX = k - CX becomes 1 which then will begin processing a word composed of the high byte from the 1st array element together with the low byte from the 2nd array element.
The good news is that you don't need that convoluted way (using BX) to walk through these arrays. Just add 2 to SI and DI on every iteration of the outer loop.

Your program contains a number of redundant instructions like xor dx, dx and unnecessary instructions like clc. For clarity you should remove these unproductive lines.

to check how much pairs of numbers have the same index (1 in arr 1 and the other in arr2) and are opposite

Knowing that the arrays hold each 2 elements, this means that the final result of your program will have to be a number in the range [0,2].

Without the forementioned errors your program would have worked fine, except that a solution that wipes out the arrays is not something I would have chosen.
Below is my implementation. Read the tail comments carefully!

    lea  si, [ARR1]      ; SI is address of 1st array
    mov  [si], 8001h     ; Assign 1st element
    mov  [si+2], 0033h   ; Assign 2nd element
    lea  di, [ARR2]      ; DI is address of 2nd array
    mov  [di], 8001h     ; Assign 1st element
    mov  [di+2], 0033h   ; Assign 2nd element

    mov  bp, k           ; Number of array elements
    xor  dx, dx          ; Final result (will be 1 based on the fixed data)
L1:
    mov  cx, [di]        ; CX is current element from 2nd array
    mov  bx, [si]        ; BX is current element from 1st array
    xor  ax, ax          ; AL is status byte, AH is a convenient 0
L2:
    shr  bx, 1           ; The bit that comes out of BX
    adc  al, ah          ;   is added to AL (that was zeroed beforehand)
    shl  cx, 1           ; The bit that comes out of CX (at the opposite side)
    sbb  al, ah          ;   is subtracted from AL
    jnz  NOK             ; If both bits matched then AL would still be zero
    test bx, bx          ; Has BX more ON bits ?
    jnz  L2              ; Yes

                         ; Early exiting if BX has no more ON bits
                         ; If, at the same time, CX too has no more ON bits
                         ; then an OK pair was found

    test cx, cx
    jnz  NOK
OK:
    inc  dx              ; Found an OK pair
NOK:
    add  si, 2           ; Next array element is 2 bytes higher in memory
    add  di, 2
    dec  bp              ; Repeat for every element in the array(s)
    jnz  L1

`adc al, 0` is the same instruction size (2 bytes) and more obvious. The only reason not to use it is that modern Intel CPUs have a performance bug where that encoding is 2 uops, even on Broadwell and later when `adc` in general is 1 uop, not just in the special case: [Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?](https://stackoverflow.com/q/51664369). I used `adc` / `sbb` in one of the versions of this I was playing around with: https://godbolt.org/z/354eaz598 in NASM syntax, where I kept the OP's backward looping with `bx`, but did it with correct offsets. — Peter Cordes, Dec 10 '22 at 00:25
Also, `mov [si], 8001h` has ambiguous operand-size without `word ptr`. Without the `ARR1` / `ARR2` "variables" to magically imply an operand in MASM/TASM syntax, a good assembler should reject that. But really the array initializers should be in the data section, in the `dw`, not from mov-immediate at all. — Peter Cordes, Dec 10 '22 at 00:27

Aki Suihkonen · Answer 2 · 2022-12-11T13:19:09.947

If implemented serially, the inner loop can be squeezed at least to

again:
    add ax,ax  ; shift ax, while moving MSB to carry
    sbb bx, 0  ; subtract carry (i.e 0 or 1) from bx
               ; here the LSB of bx will be 0 afterwards,
               ; iff the top bit of ax == lsb of bx
               ; in this case the top 15 bits of bx will
               ; be preserved for further iterations
    shr bx, 1  ; now we shift out the LSB, setting CF on failure
    jc not_ok 
    jnz again  ; there are further bits on bx to check
               ;
    test ax,ax ; we need to check that ax == 0, when bx == 0
    jnz not_ok ;
ok:            ; there was full match

I think a separate early exit on AX would not make much sense, but one could sort the inputs, placing BX <= AX, so BX would run out of bits sooner.

Using the parity flag as in how-to-exchange-between-2-bits-in-a-1-byte-number, one can simplify the inner loop greatly:

    ok = (bitreverse(al) == bh) && (bitreverse(bl) == ah)

With the following implementation one does not even need any other registers, than ax = input0, bx = input1.

    test al, 0x81
    jpe 1f          // parity even IFF al == 0x81 OR al == 0
    xor  al, 0x81   // toggle bits 0x80 and 0x01, if there was only 1 bit
1:  test al, 0x42
    jpe 2f
2:  xor  al, 0x42
    test al, 0x24
    jpe 3f
3:  xor  al, 0x24
    test al, 0x18
    jpe 4f
    xor  al, 0x18
4:  cmp  al, bh
    jnz  NOT_OK
    
// and same for the pair bl, ah

Nice idea. After `shr bx,1` makes it zero, you still need to check that AX is also zero, not still having more non-zero bits to shift out. It's a nice optimization of Sep's and my idea of using `adc`/`sbb` on a separate register, but doesn't avoid the need for a `test ax,ax`/`jnz not_ok`. — Peter Cordes, Dec 11 '22 at 13:01

how to check if 2 numbers have reversed bits in assembly?

2 Answers2