Sep's answer explains several problems in the attempt in the question.
I thought it might be fun to show a variation on the simple and efficient way of getting input 1 digit at a time, using the total = total*base + digit method of accumulating a result starting with the most-significant digit, to see how that compares for simplicity and efficiency.
Also, I had a look at doing 2 or 4 bits at a time with shifts or rotates, or a multiply bithack, or even 16 bits at a time with SSE2. See later sections of the answer for that, and for discussion of variations and micro-optimizations like ending with a not so we can build the bit-string inverted with adc inside the loop.
This is based on Sep's "bonus" version, with the loop body rewritten. I simplified the loop condition to exit on any invalid character, including newline, instead of ignoring anything except newline and continuing to loop. I also removed the loop counter; the user can enter more digits if they want before pressing enter, and we truncate to keep the last 4.
.model small
.stack
.data
msg db 13, 10, 10, "Enter a base-2 number up to 4 bits long: $"
hex db "0123456789ABCDEF"
.code
START:
mov ax, @DATA
mov ds, ax
mov dx, OFFSET msg
mov ah, 09h
int 21h
xor bx, bx ; Integer value of the binary string input
mov ax, 0100h ; zero AL, AH=01h = DOS char input
More:
; append the previous bit / digit (0 on the first iteration instead of jumping over this)
shl bl, 1 ; bl <<= 1, making room for the new digit
or bl, al ; bl |= digit (AL is 0 or 1 from the previous iteration)
; get a new char and check it for being a base-2 digit. AH still set.
int 21h ; digit -> AL
sub al, '0'
cmp al, 1
jbe More ; loop while the char is '0' or '1'; fall through (exit) on anything else, including newline
Done:
and bx, 0Fh ; truncate aka mod 16 in case the user entered a larger number
mov dl, [hex + bx]
mov ah, 02h
int 21h
mov ax, 4C00h ; Terminate the program
int 21h
end START
Efficiency of the loop is mostly irrelevant since we're calling a slow int 21h I/O function, but it would be interesting if we were reading from an array with a similar loop. And small machine-code size in bytes is always nice, especially for retro-computing or for I-cache footprint.
The loop condition is a range-check for the input being '0'-'1'. So when we loop back to the top, FLAGS are set according to cmp al, 1 where AL was 0 (ZF=0, CF=1) or 1 (ZF=1, CF=0). Input bytes outside that range result in (ZF=0, CF=0), so jbe wasn't taken and we fall out of the loop.
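To spell out the cases of that range check (just restating the behaviour above as a quick reference in comment form):

sub al, '0'  ; ASCII digit -> integer
cmp al, 1
; '0' -> AL=0:  CF=1, ZF=0  -> jbe taken (below)
; '1' -> AL=1:  CF=0, ZF=1  -> jbe taken (equal)
; any other char -> AL >= 2 unsigned:  CF=0, ZF=0  -> jbe not taken, loop exits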
Since the range only contains 2 integers, making the top one special by using cmp al, 1 / jbe instead of cmp al, 2 / jb (aka jc) does leave us with CF set opposite to the bit value we want to append to the total we're accumulating (CF = !AL), as well as excluding all values outside the range.
As Sep points out in comments, cmc / adc bx,bx would be one way to use that CF value instead of the AL value with shl/or. That could save 1 byte of code size (or 2, since we could mov ah, 1 instead of mov ax, 0100h).
But our current code falls into the loop with CF=0 from xor bx,bx, so the total = total*base + digit code would run with digit = !CF = 1. We'd either need to enter the loop with a jmp to the code that gets a new character (skipping the cmc/adc), or we'd need an extra stc before falling into the loop, or we'd have to partially peel the first iteration, reading an input character and branching to either skip or fall into the loop.
In the current loop, where we use AL for the digit instead of CF, running the total = total*base + digit code with digit=0 has no effect. (And we were able to zero AL for 1 extra byte of code size but no extra instructions, as part of setting AH=01h.) For code-golf (minimum code size regardless of efficiency), we might use stc outside the loop and cmc / adc bx,bx inside it, for a net saving of 1 byte of code size, at the cost of one more instruction outside the loop, and of using adc, which is slower on Intel (3 uops: https://agner.org/optimize/) from P6 until Broadwell.
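For example, a sketch of that code-golf variant (untested; same structure as the loop above, with stc so we can fall in cleanly and cmc / adc bx,bx doing the appending):

xor bx, bx ; result = 0
mov ah, 01h ; DOS char input; AL's initial value no longer matters
stc ; CF=1 so the first cmc/adc appends a 0 bit
More:
cmc ; CF = digit (the cmp below left CF = !digit)
adc bx, bx ; bx = bx*2 + digit
int 21h ; new char -> AL
sub al, '0'
cmp al, 1 ; '0': CF=1,ZF=0.  '1': CF=0,ZF=1.  other: CF=0,ZF=0
jbe More ; keep looping while the input is a binary digit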
The bit-value we want to append is also present non-inverted in ZF, but only CF has special instructions to add it (adc/sbb) or shift it (rcl/rcr) into registers. With 386 setz al, we could re-materialize ZF into a register, but that would be silly because we already have the 0-or-1 digit value in AL.
Other fun things you can do with CF include sbb reg, -1 to add !CF (either reg -= -1 + 1 or reg -= -1 + 0), but that's not useful here without first shifting bx, and it's inconvenient to do that without modifying FLAGS. It's also inconvenient to do it before the loop condition which sets FLAGS.
Setting CF according to AL != 0 can be done with add al, 255 (2 bytes), but add al, 255 / adc bx,bx isn't better than a simple shl bl,1 / or bl, al in our case.
We could pull the inversion out of the loop by starting with BX=-1 and doing a not afterwards, before the and bx, 0Fh. This would allow just adc bx,bx at the top of the loop to append CF to BX, building up an inverted bit-pattern which we flip at the end. Starting with BX = all-ones (0xFFFF = 2's complement -1) is the same as starting with BX=0 and inserting non-inverted bits. The first iteration would still need to enter with CF=1 or skip the bx = (bx<<1)+CF part, e.g. by using stc.
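A sketch of that variant (untested), building the inverted bit-string and fixing it up after the loop:

mov bx, -1 ; 0FFFFh: accumulate the inverted bit-string
mov ah, 01h ; DOS char input
stc ; CF=1 so the first adc appends an inverted 0 bit
More:
adc bx, bx ; bx = bx*2 + !digit  (the cmp below leaves CF = !digit)
int 21h ; new char -> AL
sub al, '0'
cmp al, 1
jbe More
Done:
not bx ; flip to get the actual bits
and bx, 0Fh ; truncate to the last 4 digits, as before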
If we had the digit zero-extended into AX (easy if we were looping over an array), we could use 386 instructions like lea bx, [ebx*2 + eax], since 32-bit addressing modes have a 2-bit shift count for the index and aren't as restricted in which registers can be used. We could also make the operand-size 32-bit as well, to handle inputs up to 32 bits long.
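For instance, a hypothetical 386 loop over an array of already-validated '0'/'1' bytes at DS:SI, with the length in CX (the register choices here are just for illustration):

xor ebx, ebx ; 32-bit total
NextDigit:
movzx eax, byte ptr [si] ; zero-extend one ASCII digit into EAX
sub al, '0' ; 0 or 1; upper bytes of EAX stay zero
lea ebx, [ebx*2 + eax] ; total = total*2 + digit, without touching FLAGS
inc si
dec cx
jnz NextDigit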
Performance, and partial registers on semi-modern CPUs like P6
Putting the conditional branch at the bottom of the loop means you only need one branch total, which is generally better both for code size and for performance, although performance is mostly irrelevant for a loop with I/O in it. Ways to handle the first iteration include peeling an initial input and check (doing the first char input before entering the loop), or a jmp to the input+condition at the bottom, or what I'm doing here: arranging the initial state so we can just "fall into" the loop and add a zero.
Inside the loop I could have shifted the whole register with shl bx, 1 instead of just bl, which would allow the same loop to work for inputs of up to 16-bit integers without truncating them. But reading+writing a 16-bit register after writing its low 8 bits with or bl, al would create a partial-register stall on older Intel CPUs. And or bx, ax would also create a partial-register stall reading AX, and more importantly would OR in a non-zero high bit from AH=1.
(We're probably creating a partial-register stall anyway by writing BL, assuming the int 21h handler pushes and pops BX at some point, thus writing the full register. If not for that, the xor bx,bx xor-zeroing ahead of the loop would let P6-family (PPro to Nehalem) CPUs know that BX=BL so they don't stall when reading the full BX later. But any write of the full BX breaks that upper-bits-known-zero internal state. Only P6-family renames BL separately from BX, so other microarchitectures don't have this penalty. First-gen Sandybridge can still rename BL separately from EBX, but not BX from EBX, so xor bx,bx isn't a zeroing idiom there.)
cmc or add al, 255 (to set CF = (AL != 0)) / adc bx,bx would work without partial-register stalls, but adc reg,reg is 2 uops on Intel before Broadwell. (So is rcl bx, 1; it's 3 uops even on later CPUs like Skylake according to https://uops.info/, although only 1 on AMD Zen.)
Due to partial-register considerations on P6-family, if I wanted to accumulate 16-bit or maybe 32-bit values, I might use cmc / adc bx,bx, or add al,255 / adc bx,bx, if I cared about those CPUs and not just 8086.
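For example, an untested sketch of the loop body using add al,255 / adc bx,bx to accumulate into the full BX with no partial-register writes (the surrounding setup is the same as the main example, including mov ax, 0100h to zero AL):

More:
add al, 255 ; CF = (AL != 0), i.e. CF = previous digit since AL was 0 or 1
adc bx, bx ; bx = bx*2 + digit, full-register read-modify-write
int 21h ; new char -> AL
sub al, '0'
cmp al, 1
jbe More ; loop while the char was '0' or '1'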
CMC was fast on the original 8086 (see timing tables for 8088 through 586, which don't consider code-fetch bottlenecks), and it's still efficient on modern x86: a single uop with 1 cycle latency. Using it instead of AL directly makes the latency chain from AL getting an ASCII digit longer by 1 cycle (sub/cmp/cmc/adc vs. sub/add), but that's not part of the loop-carried dependency chain through BX. In fact, on CPUs with efficient adc, that's better because adc bx,bx is 1 cycle of latency from the old BX to the new BX, vs. shl bl,1 / or bl,al being a chain of two operations.
Loading 2 bytes -> 2 bits from a string, or a 4-byte multiply bithack
Consider a case where we have a string in memory and know the array/string length ahead of time, so we don't have to check each digit separately for being out of range.
We could load 2 bytes at a time and shuffle the bits together. With binary digits in printing order in a string (most-significant first, at lowest address), and x86 being little-endian, a 2-byte load will have the bits in the opposite order of what we want. So we actually need to shift or rotate the low bit of the top half into the bottom half.
Sep suggested in comments ror ah, 1 / rol ax, 1 to first get the bit we want from the 2nd byte (AH) to the top of the register, then rotate it around next to the bit at the bottom of AL. That's not ideal on modern Intel CPUs (Sandybridge-family), where rotate-by-1 costs 2 uops (https://uops.info/) because of how it has to update FLAGS while leaving some unmodified, unlike rotate by immediate. And it will have partial-register stalls on P6-family (PPro through Nehalem).
Here's 8086-compatible code that runs OK on all CPUs, although it's not perfect for some of them. rol by 1 is single-uop on P6-family and adc is 2, but they have partial-register stalls of multiple cycles when reading a wider register after writing a narrower part of it. For P5 Pentium (in-order dual-issue), this could be scheduled to allow pairing of shr bh,1 with mov ax, [si+2] at least.
mov bx, [si]
;ror bh, 1 ;rol bx, 1 ; would be 4 uops on some CPUs
shr bh, 1 ; shift the bit we want into CF
adc bl, bl ; shift it into BL. 2 uops on some CPUs, but not terrible.
mov ax, [si+2] ; shift the next 2 bits into BL separately
shr ax, 1 ; first the low-address bit (also moving the last bit into AL)
adc bl, bl
add al, al ; then shift the highest bit out the top of AL
adc bl, bl
and bx, 0x0f ; P6 partial-register stall when reading BX after writing BL
; P6 would prefer and al, 0xf / movzx bx, al or something.
mov dl,[Hex+bx] ; lookup table, either ASCII digits or 7-seg codes
For throughput, and to avoid writing AH (not a big deal unless something later reads AX, but it could use another physical register on Sandybridge-family), I shifted the whole AX and used an add instruction instead of yet another shift. (adc and shr compete for ports 0 and 6 on Haswell.) The "simpler" version would be shr al, 1 / ... / shr ah, 1, which in theory has instruction-level parallelism, and does in practice on P6-family, which renames AL separately from the full EAX.
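That simpler version would look something like this (untested sketch, replacing the AX-handling part of the block above):

mov ax, [si+2] ; next 2 ASCII digits
shr al, 1 ; CF = low bit of the byte at [si+2] (higher place value)
adc bl, bl
shr ah, 1 ; CF = low bit of the byte at [si+3]; independent of the shr al,1 above
adc bl, bl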
Many alternatives are possible, especially with 386 instructions, for example preparing BX, then ror ah, 1 / ror ax, 1 / shld bx, ax, 2 to shift 2 bits from the top of AX into the bottom of BX. But SHLD is slow on modern AMD (Zen). I couldn't find a way to use rotates without multiple partial-register stalls on P6-family, so adc seemed the least bad. The best choice for any actual use-case would depend on which CPUs you care about. Instead of shr bh,1, bt bx, 8 would get the bit into CF without creating partial-register problems in EBX; it's fully efficient on Intel CPUs like Core 2, and 2 uops on AMD Zen.
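A sketch of that 386 SHLD alternative (untested):

mov ax, [si] ; two ASCII '0'/'1' chars: AL = lower address = higher place value
ror ah, 1 ; AH's low bit -> bit 15 of AX
ror ax, 1 ; now bit 15 = AL's bit, bit 14 = AH's bit
shld bx, ax, 2 ; shift those top 2 bits of AX into the bottom of BX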
If we wanted to use xlat, we wouldn't need to extend to 16 bits, maybe avoiding partial-register stalls on P6. Or with 386 instructions, get two separate 2-bit values, mask away the high garbage, and combine them with lea bx, [ebx + eax*4].
To get the other order, ror al, 1 / shr ax, 7 is possible, but has a partial-register stall on Intel P6-family. (386 and later have a barrel shifter for efficient shifts by more than 1.)
With a fast multiply ALU (like P6 and later), even 4 bits at once is possible
mov eax, [si]
and eax, 0x01010101 ; saves 1 byte vs. loading into EBX
imul ebx, eax, 0x08040201 ; top byte = (low byte * 8) + (byte#1 * 4) + ...
shr ebx, 24 ; result = low 4 bits of EBX
movzx edx, byte ptr [Hex+bx] ; lookup if desired
; modern CPUs prefer writing a full reg, not merging a new DL into EDX
See How to create a byte out of 8 bool values (and vice versa)? - recall that a multiply adds shifted copies of the input, and if those adds don't carry out across bitfields in your input, you can use it as a multi-shift-and-combine with OR or ADD. We're reversing the multiplier here, with 0x08 in the most-significant byte to put the bit from the low byte at the top of the high nibble, and so on. The partial products that end up in the top byte of the multiply are the ones whose place-values add up to 24: high x low and low x high bytes, and the two middle bytes.
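As a concrete breakdown of where each masked input bit lands (input-bit position plus the position of the set bit in the corresponding multiplier byte):

; input bit  0 (first char) * multiplier bit 27 (the 08h byte) -> product bit 27
; input bit  8              * multiplier bit 18 (the 04h byte) -> product bit 26
; input bit 16              * multiplier bit  9 (the 02h byte) -> product bit 25
; input bit 24 (last char)  * multiplier bit  0 (the 01h byte) -> product bit 24
; so shr ebx, 24 leaves the packed 4-bit value, with the first char as its high bit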
We can pack bits into an arbitrary order this way, with no need for bswap eax ahead of time or anything, just like we avoided ror ax, 8 2-byte swaps with the 16-bit strategy. For MMX or SSE2 SIMD (see below), we do want something like pshufb to byte-reverse, though.
Related:
- Converting bin to hex in assembly shows a counted loop using shr al,1 to set CF = low bit of AL, i.e. shift the low bit out of AL into CF, and rcl bx,1 to shift it into BX. It works for numbers more than 4 bits wide since its integer-to-hex function uses a loop over 4-bit groups, generating multiple hex digits if needed.
- How to convert a binary integer number to a hex string? (binary as in a register value; it doesn't mention binary strings. It has 32-bit scalar code, and SSE2/AVX/AVX-512 ways to turn 4-bit groups into hex digits in parallel.)
- Displaying numbers with DOS - for base 10 output of 16 or 32-bit integers, potentially multiple digits.
- Displaying Time in Assembly - for 2-digit base 10, like 00 to 99, so it'll work for inputs like 1111 (F in hex, 15 in decimal).
To do this efficiently for longer binary integers, use SSE2 SIMD: Does the x86 architecture support packing bools as bits to parallelize logic operations? / Extract the low bit of each bool byte in a __m128i? bool array to packed bitmap - load, then pslld xmm0, 7 to shift the low bit of each byte to the top of that byte, then SSE2 pmovmskb to pack a bit from every byte, 16 bytes at once. The low bit of '0' is 0, and the low bit of '1' is 1.
To check for invalid characters with SIMD, see the range-check part of Convert a String In C++ To Upper Case, but for '0'-'1' instead of 'a'-'z'. To handle digits past the first invalid character, probably pmovmskb eax, xmm0 on the range-check result, and blsmsk. Or, since VEX-coded instructions like BMI1 blsmsk aren't available in real mode, use the equivalent bithack of mask ^= mask-1 and AND with that to zero higher bits in the packed low bits (i.e. in the binary integer). (But that won't zero the bit corresponding to the first non-digit character itself, so I guess right-shift the mask by 1, or use a bithack that gives a mask up to but not including the lowest set bit.)
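A rough scalar sketch of that last bithack (untested; assumes EAX holds the pmovmskb result of the "is this byte invalid?" compare and EDX holds the packed digit bits, both still in memory order with the first character at bit 0, i.e. before any reversal):

mov ecx, eax
dec ecx ; mask-1: ones below the lowest set bit (all-ones if there was no invalid char)
not eax
and ecx, eax ; (mask-1) & ~mask = ones strictly below the first invalid byte's position
and edx, ecx ; keep only the digits before the first invalid character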
Also, pmovmskb will put the bit from the lowest-address digit into the lowest bit of the result. But that's the first digit, the one with the highest place-value, so we need to reverse the vector before pmovmskb, probably with SSSE3 pshufb. If you only had a 4-bit integer, mov eax, [si] / bswap eax / movd xmm0, eax would also work.
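Putting those pieces together, a minimal SSE2/SSSE3 sketch for 16 already-validated digits (untested; revshuf is an assumed 16-byte-aligned constant of 0Fh,0Eh,...,00h used to reverse the byte order):

movdqu xmm0, [si] ; 16 ASCII '0'/'1' bytes
movdqa xmm1, [revshuf] ; byte-reversal shuffle control
pshufb xmm0, xmm1 ; SSSE3: first (most significant) char -> highest byte
pslld xmm0, 7 ; low bit of each byte -> top bit of that byte
pmovmskb eax, xmm0 ; pack the top bit of each byte: AX = the 16-bit integer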
Once you have a 16-bit integer from pmovmskb (or a 32-bit integer from combining two results with shift/OR), you can use more SIMD code to make a hex string representing it, as shown in How to convert a binary integer number to a hex string?