1

I'm quite new to assembly
Consider the following function:
enter image description here

where the '+' stands for 'Or' logic gate and the concatenation of variables, stands for 'And' logic gate.
How can I implement such a function in emu8086? given that the input parameters may stand for the bits of a register AL, for example, the output would be changing its value into 0 or 1.

UPDATE: here's what I did, I know it's bad written and it does seem work if there is any suggestion or an easier way let me know
thanks all for your help especially peter.

org 100h

mov al, 0A3h     pour le test              

mov ah, al
and ah, 01h        ;ah = x0 
shr al, 1

mov bl, al 
and bl, 01h        ;bl = x1 
shr al, 1

mov bh, al
and bh, 01h            
not bh
and bh, 01h         ;bh = !x2
shr al, 1

mov cl, al 
and cl, 01h        
not cl
and cl, 01h        ;cl = !x3
shr al, 1

mov ch, al
and ch, 01h        
not ch
and ch, 01h        ;ch = !x4
shr al, 1

mov dl, al
and dl, 01h        ;x5 = dl
shr al, 1

mov dh, al
and dh, 01h        
not dh
and dh, 01h        ;dh = !x6    

shr al, 1          ;al = x7  


and bh, dl
and bh, cl
and bh, ah         ;!x2 and x5 and !x3 and x0     

and dh, bl 
and dh, ch
and dh, al         ;!x6 and x1 and !x4 and x7 

or dh, bh   

mov ah, dh         ;resultat dans ah







ret
ahmed ben
  • 145
  • 1
  • 11
  • 1
    x86, like pretty much every CPU acrhitecture, has instructions for bitwise operation, such as `and`, `or` and `not`. – Michael Apr 17 '20 at 18:01
  • yes, but how can I do it for a given register bit by bit, it's for the 8086 – ahmed ben Apr 17 '20 at 18:05
  • 1
    There are also shift and rotation instructions, so you can move bits around into their desired positions. Consult the [x86 instruction set reference](https://software.intel.com/en-us/articles/intel-sdm#nine-volume). – Michael Apr 17 '20 at 18:25
  • 2
    So `x7 .. x0` are the bits of one register? Do the two terms on the right side represent bit-concatenation into 4-bit integers? If so, see [Bit delivery in Bin City](https://codegolf.stackexchange.com/a/203610) for an example of using right shift and `rcl` or `rcr` to rotate a bit back into another register. Or is that a boolean product (aka AND) to produce a 0 or 1 boolean result? And boolean `+` is the same as `^` xor because a 1-bit value has nothing to carry-out into. Can you show an example input and what exact value you want in a register as a result? And ideally how the math works – Peter Cordes Apr 17 '20 at 18:40
  • @PeterCordes No, I mean it is like (inverse(x2) AND x5 AND inverse(x3) AND x0) OR (inverse(x6) and x1 and inverse(x4) and x7) – ahmed ben Apr 17 '20 at 19:02
  • 2
    (x & 2d) == 21 | | (x & d2) == 82 – prl Apr 17 '20 at 19:08
  • 1
    Ok, you should [edit] that into the question so it's clear to everyone. You can copy AL to another register, NOT it, shift or rotate it so the desired bits line up, then `and reg, al` to get some of the partial results you want in some bits of that reg. You might want to `test reg, 0b00100101` or something to set ZF if all those bits were zero. Or do another NOT before `test` so ZF is set if any of those bits selected by the mask zero 0. (You can think of ZF in FLAGS as being set based on a horizontal `nor`, so NOT first lets you use it as a horizontal AND of selected bits.) – Peter Cordes Apr 17 '20 at 19:12
  • 1
    Actually nevermind that rotate idea, it was overcomplicated. Your best bet here is to XOR with a bit-pattern that inverts the bits you need flipped, then you can `TEST AL, imm8` with two different bit-masks to select the combination of bits you want. Or copy a register and use AND, then maybe OR them together so OR sets flags. – Peter Cordes Apr 17 '20 at 19:18
  • 1
    re: your update: so normally I'd say don't put an answer in the question, but now you're asking how to do it *efficiently*, and the slow and simple way works to show exactly what result you want and what the math symbols mean. So that's fine. Apparently your `+` is logical OR, not XOR. – Peter Cordes Apr 17 '20 at 20:55
  • @ahmedben: I was also going to comment before you deleted [your last question](https://stackoverflow.com/q/61656966/522444): Also please see: [Why is it frowned upon to use a null layout in Swing?](https://stackoverflow.com/q/6592468/522444) – Hovercraft Full Of Eels May 07 '20 at 11:56

2 Answers2

2

Are you sure that expression with four x_n values next to each other is supposed to be bitwise AND, not concatenating them into 4-bit values? And then a binary add? Because I might have guessed that instead. If so, see https://codegolf.stackexchange.com/a/203610 for a shift and rcl reg, 1 way to split bits up between a pair of registers. Or on a modern x86 with BMI2, you could use 2x pext and add to do that.

The fact that the expressions have the bits in each group in a specific order that isn't just ascending or descending is probably a clue that they want you to unpack the byte into two 4-bit integers and do a normal + with it.


If we assume your asm is an example of the correct function

The rest of this answer is about optimizing the operations in your asm which does two groups of ANDs, and ORs that result together into a single boolean value, producing a 0 or 1 in AL.

There are some improvements you can make to your simplistic / straightforward implementation that just extracts each bit separately. e.g. you don't need to AND before and after you NOT. The first AND is going to leave the high bits all 0, then NOT makes them 1, then the second AND makes them zero again.

mov bh, al
; and bh, 01h            ; This is pointless
not bh
and bh, 01h         ;bh = !x2

You can take this farther: You're purely using bitwise operations and only care about the low bit in each register. You can and al, 1 once at the end to isolate the bit you want, with all the temporaries carrying around garbage in their high bits.


To flip some bits but not all, use XOR with a constant mask. e.g. to flip bits 6,4,3,2 in AL and leave the others unmodified, use xor al, 01011100b1. Then you can shift and mov to separate registers without needing any NOT instructions.

Footnote 1: The trailing b indicates base 2 / binary. That works in MASM syntax, IDK if emu8086 supports it or if you'd have to write the equivalent hex.

And you can AND right into those registers instead of extracting first, so you only need two scratch regs.

   xor   al, 01011100b    ; complement bits 6,4,3,2
   mov   cl, al           ; x0, first bit of the 2&5&3&0 group

   shr   al, 1
   mov   dl, al           ; x1, first bit of the 6&1&4&7 group

   shr   al, 1
   and   cl, al           ; AND X2 into the first group, X2 & x0

   shr   al, 1
   and   cl, al           ; cl = X2 & X3 & x0

   ...                    ; cl = 2&5&3&0,  dl = 6&1&4 with a few more steps

   shr   al, 1            ; AL = x7 
   and   al, dl           ; AL = x6 & x1 & x4 & x7  (reading 6,1,4 from dl)

   or    al, cl           ; logical + apparently is regular (not exclusive) OR
   and   al, 1            ; clear high garbage
   ret

(With plain ASCII comments, I've ignored the "complement" part because we handle it all with one instruction at the start.)


That's as far as I can see us going with a straightforward implementation that just gets the bits to the bottom of a register and does each boolean operation (other than complement) with a separate asm instruction.

To do better, we need to take advantage of the 8 (or 16) bits in a register we can do in parallel with one instruction. We can't easily shuffle bits to make them line up with each other because the pattern is irregular.

IDK if there's anything clever we can do left-shifting AX to get bits from AL into the bottom of AH, as well as grouping some at the top of AL. Hmm, maybe alternate shl ax with rol al to send bits back to the bottom of AL. But that still needs 7 shifts to separate the bits. (shl ax,2 and rol al,2 for the contiguous bits that go together (7,6 and 3,2) are only available on 186, and putting a count in CL is barely worth it).

The more likely angle of attack is FLAGS: most ALU operations update FLAGS according to the result, with ZF being set to 1 if all bits in a result are 0, otherwise to 1. This gives us a horizontal OR operation across bits in one register. Since !(a | b) = !a & !b, we can invert the non-complemented bits in the input to use that as a horizontal AND instead of OR. (I'm using ! for a single bit invert. In C, ! is a logical not that turns any non-zero number into 0, unlike ~ bitwise NOT.)

But unfortunately 8086 doesn't have an easy way to turn ZF into a 0/1 in a register directly. (386 adds setcc r/m8, e.g. setz dl sets DL = 0 or 1 according to ZF.) That is possible for CF. We can get CF set according to a register being non-zero by using sub reg, 1, which sets CF if the reg was 0 (because a borrow comes out the top). Otherwise it clears CF. We can get a 0 / -1 in a reg according to CF with sbb al, al (subtract with borrow). The al-al part cancels, leaving 0 - CF.

To set up for using FLAGS, we can use AND masks to separate the bits into their two groups.

;; UNTESTED, I might have some logic inverted.

   xor   al, 10100011b         ; all bits are the inverse of their state in the original expression.

   mov   dl, al
   and   dl, 11010010b         ; ~x[7,6,4,1]
   and   al, 00101101b         ; ~x[5,3,2,0]

   cmp   dl, 1                 ; set CF if that group was all zero (i.e. if the original AND was 1), else clear
   sbb   dl, dl                ; dl = -1 or 0 for the first group

   cmp   al, 1
   sbb   al, al                ; al = -1 or 0 for the second group.  Fun fact: undocumented SALC does this

   or    al, dl                ; The + in the original expression
   and   al, 1                 ; keep only the low bit

   ret

There's probably even more we could do, like maybe and al, dl to clear the bits in AL or not, according to the SBB result in DL. Or maybe adc al, -1 instead of cmp al, 1 to use the CF result from DL to affect how CF gets set from AL.

Instead of subtracting 1, you could sub dl, 11010010b with the AND mask you used, so you get 0 if they were all set, otherwise it wraps and you set CF. Not sure if that's useful.

The amount of negations / inversions quickly gets tricky to do in your head, but if every byte of code-size or every cycle of performance matters then it's something you should look into. (These days that's rarely the case, and when it is you're often vectorizing with SSE2 or AVX so you wouldn't have flags, just bitwise within vector elements and packed compare that turns a match into all-ones and a non-match into 0.)

Note that after splitting up with mov/AND, neither AL nor DL can be all-ones, so adding 1 can never wrap to zero. So maybe sbb al, -1 could add 0 or 1 and could set ZF?

If you want to branch, branching on ZF is fine with jz or jnz. That might even be best on 8086, e.g. if the first AND group gives a 1, we don't need to isolate the other group. So xor al, ... to complement bits accordingly, then test al, mask1 / jnz check_other_group / mov al,1 would be a good fall through fast path.

Peter Cordes
  • 328,167
  • 45
  • 605
  • 847
  • 1
    @ahmedben: I was thinking about this some more: it seems likely that x2,x5,x3,x0 would be written in that order for a reason. If it was just an AND between those bits, why not x5,3,2,0 in descending order like a normal person? The fact that it's not simple indicates to me that your interpretation isn't what was intended. That's a separate question from how to simplify your interpretation of what you assumed it meant, but you should double-check with your instructor unless you have some more evidence that it's what you think. – Peter Cordes Apr 17 '20 at 23:59
  • 1
    Assuming this is some kind of assignment, otherwise you'd of course know what you need! And you wouldn't be using emu8086 to write obsolete 16-bit code. – Peter Cordes Apr 18 '20 at 00:00
1

Inspired by the comments of Peter Cordes and prl:

int test(char x)
{
    return ((x & 0x2d) == 0x21) || ((x & 0xd2) == 0x82);
}

Godbolt (x86 msvc v19.24 /Os) generated:

   _x$ = 8                                       ; size = 1
int test(char) PROC                                  ; test
        mov     cl, BYTE PTR _x$[esp-4]
        mov     al, cl
        and     al, 45                                    ; 0000002dH
        cmp     al, 33                                    ; 00000021H
        je      SHORT $LN3@test
        and     cl, 210                             ; 000000d2H
        cmp     cl, 130                             ; 00000082H
        je      SHORT $LN3@test
        xor     eax, eax
        ret     0
$LN3@test:
        mov     eax, 1
        ret     0
int test(char) ENDP                                  ; test
Axel Kemper
  • 10,544
  • 2
  • 31
  • 54
  • That looks like optimization disabled. Hardly a useful suggestion to spill/reload to memory, and 8086 doesn't have `movsx` (which is only there as a missed optimization in the first place; `movzx` is a better choice for a byte reload and the reg-reg `movsx` after that braindead even at `-O0`). Also, ICC is good at auto-vectorizing, but tends to be worse than GCC or clang at saving instructions in logic like this. Anyway yes, https://godbolt.org/z/uAihXD shows ICC -O3 still compile with two branches, GCC9.3 with one plus a `sete al` which 8086 doesn't have. Clang goes branchless with sete. – Peter Cordes Apr 19 '20 at 09:04
  • Anyway, like I said in my answer, inverting the XOR constant would let the compiler use `test al, 0x2D` / `jnz` to check for all bits clear instead of and/cmp/je to check for all bits set in a mask. – Peter Cordes Apr 19 '20 at 09:06
  • You should link the actual code + compiler options on Godbolt; use the "share" dropdown. Also, are you sure that expression is actually right? It optimizes down to only one `x&mask == constant`, not two. That's not what I get: https://godbolt.org/z/p7D_5C. Did you maybe typo one of the constants so that side of the `||` is always fase? – Peter Cordes Apr 19 '20 at 20:28