will work out how far apart two letters are in the alphabet

Question

The program will then work out the difference between the positions of the two letters of the alphabet, and display this to the screen. For example, the difference between “a” and “e” is 4. The difference between “d” and “b” is 2. The difference between “c” and “c” is 0. This is what I did, but I don't know how to count the gap between the letters:

SECTION .bss
SECTION .data
tMsg: db "fist letter",10
tLen: equ $-tMsg
Input: times 100 db 0
ok: db "ok"
oklen: equ $-ok
yMsg: db "Second letter",10
yLen: equ $-yMsg
SECTION .text 
global _start 

   _start:
   nop
   mov eax,4
   mov ebx,1
   mov ecx,tMsg
   mov edx,tLen
   int 80H

   mov eax,3
   mov ebx,0
   mov ecx,Input
   mov edx,100
   int 80H

   mov ax, [Input]
   mov bx, [ok]
   cmp ax, bx
   je y
   mov eax,1
   mov ebx,0
   int 80H

   y:
   mov eax,4
   mov ebx,1
   mov ecx,yMsg
   mov edx,yLen


   int 80H
   mov eax,1
   mov ebx,0
   int 80H

What you need to do is check whether the letter is uppercase or lowercase -- then the distance from the start of the alphabet is simply `char - 'A'` in the former case, or `char - 'a'` in the latter. To check the case you just need two comparisons. Write a short function that takes the current char in `ebx` and sets the return in `eax`, e.g. `mov eax, 1` (true). Then it is a matter of `cmp ebx, 'A'` and `jl` to a block that sets `eax` to `0` (false) and returns. Then `cmp ebx, 'Z'` and `jg` to the block that sets `eax` false and returns. Otherwise, simply return (true) because the char is uppercase. Now do the distance as uppercase. Same for lowercase. — David C. Rankin, Jan 07 '21 at 19:12
@DavidC.Rankin: Or more simply, force both letters to lower-case by ORing in the lower-case bit, `or eax, 0x20` and the same for the other letter. (Optimizing further by subtracting and *then* masking to keep only the low 5 bits doesn't work easily or at all: we need a signed result that can be -25..+25, and 5-bit 2's complement isn't wide enough for that, so even sign-extending the low 5 wouldn't work.) — Peter Cordes, Jan 08 '21 at 03:38
Yes, that ended up being simpler than handling both `:)` Let me know if you have other thoughts there as well. — David C. Rankin, Jan 08 '21 at 03:39
@DavidC.Rankin: Yeah, it's really easy if we can assume without checking that both inputs in fact *are* letters in the first place. If we do need to check, that's still the best starting point: force-lowercase and check only takes one OR / sub / cmp+jcc for each input: [What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?](https://stackoverflow.com/a/54585515). Converting from ASCII code to index into the alphabet is fully usable for signed subtraction. (also see the edit to my prev. comment re: an optimization idea that wouldn't work out.) — Peter Cordes, Jan 08 '21 at 03:45
Yes, instead of an `islower()` and `isupper()`, you end up writing an `isalpha()` and then checking the case bit. So from an efficiency standpoint, that is still the way to go, but it isn't as simple as the case bit itself. Once you know you have an alpha character then you are in business. — David C. Rankin, Jan 08 '21 at 03:49
@DavidC.Rankin: Well, in this case you combine those checks with the `tolower`. It would be silly to actually do isalpha separately from tolower because you get tolower for free as part of isalpha. This applies if you want a case-insensitive difference where 'A' to 'a' is 0, not -26 or -32 or whatever. You literally do not need to look at the case bit at all. But if you did want it to be case-sensitive, then yeah, clear and save the case bit with `btc reg,5` / `setc cl` for example, or save the original char before the isalpha. (Then subtract ASCII codes and abs(). If diff>26, subtract 32-26 — Peter Cordes, Jan 08 '21 at 04:00

David C. Rankin · Accepted Answer · 2021-01-08T03:58:02.783

If you are still stuck, your question requires that you determine the distance between two characters. That brings with it a number of checks you must implement. Though your question is silent on whether you need to handle both uppercase and lowercase distances, unless you are converting everything to one case or the other, you will need to determine whether both characters are of the same case to make the distance within the alphabet between those two characters valid.

Since two characters are involved, you need a way of saving the case of the first for comparison with the case of the second. Here, and in all cases where a simple state is needed, just using a byte (flag) to store the state is about as simple as anything else. For example, a byte to hold 0 if the ASCII character is not an alpha character, 1 if the character is uppercase and 2 if the character is lowercase (or whatever consistent scheme you like).

That way, when you are done with the comparisons and tests, you can simply compare the two flags for equality. If they are equal, you can proceed to subtract one character from the other to get the distance (swapping if necessary) and then convert the number to ASCII digits for output.
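As a rough illustration of that 0/1/2 flag scheme, the whole classification could be folded into one routine. This is only a sketch: the name `_classify` is hypothetical and is not used by the program in this answer, and it swaps in an unsigned range check (subtract the first letter of the range, then one unsigned compare against 25) in place of the two signed comparisons used in the functions shown here.

```nasm
section .text

; Hypothetical helper (an assumption for illustration, not part of the
; program in this answer).
; classify the character stored at the address in ecx
; parameters:
;   ecx - address holding character
; returns:
;   eax - 0 (not alpha), 1 (uppercase), 2 (lowercase)
; clobbers:
;   edx
_classify:
    movzx   edx, byte [ecx]     ; load the character, zero-extended
    xor     eax, eax            ; assume "not alpha"

    sub     edx, 'A'            ; edx = c - 'A'
    cmp     edx, 'Z' - 'A'      ; unsigned: 0..25 means 'A'..'Z'
    jbe     .upper

    sub     edx, 'a' - 'A'      ; edx = c - 'a'
    cmp     edx, 'z' - 'a'      ; unsigned: 0..25 means 'a'..'z'
    ja      .done               ; anything else stays 0

    mov     eax, 2              ; lowercase
  .done:
    ret

  .upper:
    mov     eax, 1              ; uppercase
    ret
```

With something like this, the caller would store `al` straight into its flag byte after each read, and the later equality test on the two flags stays unchanged.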

To test if the character is an uppercase character, similar to isupper() in C, a short function is all that is needed:

; check if character isupper()
; parameters:
;   ecx - address holding character
; returns:
;   eax - 0 (false), 1 (true)
_isupr:
    
    mov eax, 1                  ; set return true
    
    cmp byte[ecx], 'A'          ; compare with 'A'
    jge _chkZ                   ; char >= 'A'
    
    mov eax, 0                  ; set return false
    ret
    
  _chkZ:
    cmp byte[ecx], 'Z'          ; compare with 'Z'
    jle _rtnupr                 ; <= is uppercase
    
    mov eax, 0                  ; set return false
    
  _rtnupr:
    ret

You can handle the storage for the local arrays and values you need in a couple of ways. You can either subtract from the current stack pointer to create temporary storage on the stack, or in a slightly more readable way, create labels to storage within the uninitialized segment (.bss) and use the labels as variable names. Your initialized variables go in the .data segment. For example, storage for the program could be:

section .bss

    buf     resb 32             ; general buffer, used by _prnuint32
    bufa    resb 8              ; storage for first letter line
    bufb    resb 8              ; storage for second letter line
    lena    resb 4              ; length of first letter line
    lenb    resb 4              ; length of second letter line
    nch     resb 1              ; number of digit characters in _prnuint32
    ais     resb 1              ; what 1st char is, 0-notalpha, 1-upper, 2-lower
    bis     resb 1              ; same for 2nd char
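For comparison, the stack-based alternative mentioned above might look roughly like the following. It is a minimal sketch, assuming an 8-byte line buffer is enough and using bare syscall numbers for brevity; it reads one line and exits without doing anything else.

```nasm
section .text
global _start

_start:
    sub     esp, 8              ; reserve 8 bytes of temporary storage on the stack

    mov     eax, 3              ; __NR_read
    xor     ebx, ebx            ; fd 0 = stdin
    mov     ecx, esp            ; buffer = start of the reserved space
    mov     edx, 8              ; read at most 8 bytes
    int     80h

    movzx   eax, byte [esp]     ; first character of the line, zero-extended
                                ; (a real program would classify it here)

    add     esp, 8              ; release the temporary storage

    mov     eax, 1              ; __NR_exit
    xor     ebx, ebx            ; EXIT_SUCCESS
    int     80h
```

The labels in .bss cost nothing extra at run time and give each buffer a name, which is why the rest of this answer uses them.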

Rather than sprinkling bare numbers through your syscall setups, declaring named equ constants, e.g. stdin and stdout instead of 0 and 1, makes things more readable:

section .data

    bufsz:  equ 32
    babsz:  equ 8
    tmsg:   db "first letter : "
    tlen:   equ $-tmsg
    ymsg:   db "second letter: "
    ylen:   equ $-ymsg
    dmsg:   db "char distance: "
    dlen:   equ $-dmsg
    emsg:   db "error: not alpha or same case", 0xa
    elen:   equ $-emsg
    nl:     db 0xa
    stdin:  equ 0
    stdout: equ 1
    read:   equ 3
    write:  equ 4
    exit:   equ 1

Then, for reading your character input, you would have, e.g.

    mov     eax, write          ; prompt for 1st letter
    mov     ebx, stdout
    mov     ecx, tmsg
    mov     edx, tlen
    int 80h                     ; __NR_write
    
    mov     eax, read           ; read 1st letter line
    mov     ebx, stdin
    mov     ecx, bufa
    mov     edx, babsz
    int 80h                     ; __NR_read
    
    mov     [lena], eax         ; save no. of character in line

To then check the case of the character input, you could do:

    call    _isupr              ; check if uppercase
    cmp     eax, 1              ; check return 0-false, 1-true
    jne chkalwr                 ; if not, branch to check lowercase
    
    mov     byte[ais], 1        ; set uppercase flag for 1st letter
    
    jmp getb                    ; branch to get 2nd letter
    
  chkalwr:
    call    _islwr              ; check if lowercase
    cmp     eax, 1              ; check return
    jne notalpha                ; 1st letter not alpha char, display error
    
    mov     byte[ais], 2        ; set lowercase flag for 1st char

The notalpha: label is just a block that outputs an error when a character isn't an alpha character or the case of the two characters doesn't match:

  notalpha:                     ; show not alpha or not same case error
    mov     eax, write
    mov     ebx, stdout
    mov     ecx, emsg
    mov     edx, elen
    int 80h                     ; __NR_write
    
    mov     ebx, 1              ; set EXIT_FAILURE

After you have completed input and classification of both characters, you need to verify that both characters are of the same case. If so, compute the distance between the characters (swapping if necessary, or using an absolute value) and finally convert the distance from a numeric value to ASCII digits for output. You can do something similar to the following:

  chkboth:
    mov     al, byte[ais]       ; load flags into al, bl
    mov     bl, byte[bis]
    cmp     al, bl              ; compare flags equal, else not same case
    jne notalpha
    
    mov     eax, write          ; display distance output
    mov     ebx, stdout
    mov     ecx, dmsg
    mov     edx, dlen
    int 80h                     ; __NR_write
    
    movzx   eax, byte[bufa]     ; load chars zero-extended into eax, ebx
    movzx   ebx, byte[bufb]     ;  (clears leftover upper bits from the syscall)
    cmp     eax, ebx            ; compare 1st char with 2nd char
    
    jae getdiff                 ; 1st char >= 2nd char, no swap needed
    
    xchg    eax, ebx            ; swap chars so eax holds the larger
    
  getdiff:
    sub     eax, ebx            ; subtract 2nd char from 1st char
    call _prnuint32             ; output difference
    
    xor     ebx, ebx            ; set EXIT_SUCCESS
    jmp     done

Putting it all together and including the _prnuint32 function below for conversion and output of the numeric distance between characters, you would have:

section .bss

    buf     resb 32             ; general buffer, used by _prnuint32
    bufa    resb 8              ; storage for first letter line
    bufb    resb 8              ; storage for second letter line
    lena    resb 4              ; length of first letter line
    lenb    resb 4              ; length of second letter line
    nch     resb 1              ; number of digit characters in _prnuint32
    ais     resb 1              ; what 1st char is, 0-notalpha, 1-upper, 2-lower
    bis     resb 1              ; same for 2nd char

section .data

    bufsz:  equ 32
    babsz:  equ 8
    tmsg:   db "first letter : "
    tlen:   equ $-tmsg
    ymsg:   db "second letter: "
    ylen:   equ $-ymsg
    dmsg:   db "char distance: "
    dlen:   equ $-dmsg
    emsg:   db "error: not alpha or same case", 0xa
    elen:   equ $-emsg
    nl:     db 0xa
    stdin:  equ 0
    stdout: equ 1
    read:   equ 3
    write:  equ 4
    exit:   equ 1

section .text

    global _start
_start:
    
    mov     byte[ais], 0        ; zero flags
    mov     byte[bis], 0
    
    mov     eax, write          ; prompt for 1st letter
    mov     ebx, stdout
    mov     ecx, tmsg
    mov     edx, tlen
    int 80h                     ; __NR_write
    
    mov     eax, read           ; read 1st letter line
    mov     ebx, stdin
    mov     ecx, bufa
    mov     edx, babsz
    int 80h                     ; __NR_read
    
    mov     [lena], eax         ; save no. of character in line
    
    call    _isupr              ; check if uppercase
    cmp     eax, 1              ; check return 0-false, 1-true
    jne chkalwr                 ; if not, branch to check lowercase
    
    mov     byte[ais], 1        ; set uppercase flag for 1st letter
    
    jmp getb                    ; branch to get 2nd letter
    
  chkalwr:
    call    _islwr              ; check if lowercase
    cmp     eax, 1              ; check return
    jne notalpha                ; 1st letter not alpha char, display error
    
    mov     byte[ais], 2        ; set lowercase flag for 1st char
    
  getb:
    mov     eax, write          ; prompt for 2nd letter
    mov     ebx, stdout
    mov     ecx, ymsg
    mov     edx, ylen
    int 80h                     ; __NR_write
    
    mov     eax, read           ; read 2nd letter line
    mov     ebx, stdin
    mov     ecx, bufb
    mov     edx, babsz
    int 80h                     ; __NR_read
    
    mov     [lenb], eax         ; save no. of character in line
    
    call    _isupr              ; same checks for 2nd character
    cmp     eax, 1
    jne chkblwr
    
    mov     byte[bis], 1
    
    jmp chkboth
    
  chkblwr:
    call    _islwr
    cmp     eax, 1
    jne notalpha
    
    mov     byte[bis], 2
    
  chkboth:
    mov     al, byte[ais]       ; load flags into al, bl
    mov     bl, byte[bis]
    cmp     al, bl              ; compare flags equal, else not same case
    jne notalpha
    
    mov     eax, write          ; display distance output
    mov     ebx, stdout
    mov     ecx, dmsg
    mov     edx, dlen
    int 80h                     ; __NR_write
    
    movzx   eax, byte[bufa]     ; load chars zero-extended into eax, ebx
    movzx   ebx, byte[bufb]     ;  (clears leftover upper bits from the syscall)
    cmp     eax, ebx            ; compare 1st char with 2nd char
    
    jae getdiff                 ; 1st char >= 2nd char, no swap needed
    
    xchg    eax, ebx            ; swap chars so eax holds the larger
    
  getdiff:
    sub     eax, ebx            ; subtract 2nd char from 1st char
    call _prnuint32             ; output difference
    
    xor     ebx, ebx            ; set EXIT_SUCCESS
    jmp     done
    
  notalpha:                     ; show not alpha or not same case error
    mov     eax, write
    mov     ebx, stdout
    mov     ecx, emsg
    mov     edx, elen
    int 80h                     ; __NR_write
    
    mov     ebx, 1              ; set EXIT_FAILURE
    
  done:
    mov     eax, exit           ; __NR_exit
    int 80h
    
; print unsigned 32-bit number to stdout
; arguments:
;   eax - number to output
; returns:
;   none
_prnuint32:
    mov     byte[nch], 0        ; zero nch counter
    
    mov     ecx, 0xa            ; base 10  (and newline)
    lea     esi, [buf + 31]     ; load address of last char in buf
    mov     [esi], cl           ; put newline in buf
    inc     byte[nch]           ; increment char count in buf

_todigit:                       ; do {
    xor     edx, edx            ; zero remainder register
    div     ecx                 ; edx=remainder = low digit = 0..9.  eax/=10
    
    or      edx, '0'            ; convert to ASCII
    dec     esi                 ; backup to next char in buf 
    mov     [esi], dl           ; copy ASCII digit to buf
    inc     byte[nch]           ; increment char count in buf

    test    eax, eax            ; } while (eax);
    jnz     _todigit

    mov     eax, 4              ; __NR_write from /usr/include/asm/unistd_32.h
    mov     ebx, 1              ; fd = STDOUT_FILENO
    mov     ecx, esi            ; copy address in esi to ecx (addr of 1st digit)
    movzx   edx, byte[nch]      ; length, including the \n
    int     80h                 ; write(1, string,  digits + 1)

    ret

; check if character islower()
; parameters:
;   ecx - address holding character
; returns:
;   eax - 0 (false), 1 (true)
_islwr:
    
    mov eax, 1                  ; set return true
    
    cmp byte[ecx], 'a'          ; compare with 'a'
    jge _chkz                   ; char >= 'a'
    
    mov eax, 0                  ; set return false
    ret
    
  _chkz:
    cmp byte[ecx], 'z'          ; compare with 'z'
    jle _rtnlwr                 ; <= is lowercase
    
    mov eax, 0                  ; set return false
    
  _rtnlwr:
    ret


; check if character isupper()
; parameters:
;   ecx - address holding character
; returns:
;   eax - 0 (false), 1 (true)
_isupr:
    
    mov eax, 1                  ; set return true
    
    cmp byte[ecx], 'A'          ; compare with 'A'
    jge _chkZ                   ; char >= 'A'
    
    mov eax, 0                  ; set return false
    ret
    
  _chkZ:
    cmp byte[ecx], 'Z'          ; compare with 'Z'
    jle _rtnupr                 ; <= is uppercase
    
    mov eax, 0                  ; set return false
    
  _rtnupr:
    ret

There are many ways to write the various pieces; this version is intended to be easy to follow rather than the most efficient way it could be written.

Example Use/Output

After you assemble and link the code, e.g.

nasm -f elf -o ./obj/char_dist_32.o char_dist_32.asm
ld -m elf_i386 -o ./bin/char_dist_32 ./obj/char_dist_32.o

You can test with the inputs given in your question and others, e.g.

$ ./bin/char_dist_32
first letter : a
second letter: e
char distance: 4

$ ./bin/char_dist_32
first letter : d
second letter: b
char distance: 2

$ ./bin/char_dist_32
first letter : D
second letter: B
char distance: 2

$ ./bin/char_dist_32
first letter : a
second letter: Z
error: not alpha or same case

Look things over and let me know if you have further questions.

Style: `bufa resb 8` should use the constant you defined so you're not hard-coding the same number in 2 places. `bufa resb bufasz`. (Or use the same constant size for both buffers so it can have a simpler name.) Also, one clean way to organize names for length is to do `.len: equ $ - foo`, which defines `foo.len` as the name, using NASM's local symbol mechanism that concatenates onto the last non-dot label. That makes the pattern of `symbol` and `symbol.len` clear and easy to remember. You do still have to type the symbol name twice, once in the `$-symbol` equ, but not 3 times. — Peter Cordes, Jan 08 '21 at 04:03
I had not used the concatenation of dot with non-dot names yet... candidly because I don't even recall it from the last time I went through the nasm docs. It's like the use of multiple parameters with macros in nasm -- unless you use it often enough, it never seems to stay in short-term recall. Thank you again @PeterCordes — David C. Rankin, Jan 08 '21 at 04:14
Our chat in comments under the question piqued my interest in doing this efficiently so I whipped up a fun version and posted an answer :P — Peter Cordes, Jan 08 '21 at 04:30
Now I can learn from a master. I was toying with the cleanups you suggested, but unless I move .bss under .data, then nasm complains (warns) about forward declarations having unpredictable results when using `bufa resb babsz`. Switching here wouldn't matter, but in larger projects I thought you always wanted the uninitialized segment defined first since allocation will begin at the end of the uninitialized segment? — David C. Rankin, Jan 08 '21 at 04:33
Oh, my mistake, I thought NASM would just allow forward reference to an EQU the same way it does to labels defined later. Re: section layout in the linked binary: no, BSS-last would be needed for a flat binary where NASM lays out the sections in source order, but you're going to be linking this into an ELF executable. `ld`'s default linker script arranges the sections. (And yes, `.bss` normally goes in the same segment as `.data`, as space past the end of the file-backed part of that ELF segment.) — Peter Cordes, Jan 08 '21 at 04:38
In any case, you only have to move the `equ` declarations! They're global and aren't associated with any section. — Peter Cordes, Jan 08 '21 at 04:39
Actually, NASM 2.15.05 does *not* warn about `buf resb bufsz` coming before `bufsz: equ 32`. Neither does NASM 2.11.05 from 2014 on an old machine. Maybe you're using some ancient NASM from before they decided that guaranteeing this was a good idea? (BTW, good NASM style would be to always put a `:` after label, including in the BSS.) — Peter Cordes, Jan 08 '21 at 04:44
Also: constant names: `write` is a symbol name in libc for the wrapper function, so a better pattern to follow might be `sys/syscall.h` which defines `SYS_write`. (There's also asm/unistd.h which defines __NR_write.) For real use you might consider `gcc -dM` and some text processing to turn the `#define` lines into NASM `equ` lines on either of those C headers. — Peter Cordes, Jan 08 '21 at 04:49
Old openSUSE box, `NASM version 2.13.02 compiled on Jan 24 2018`, so that's where the warnings came in. With this old version the actual warning is `warning: forward reference in RESx can have unpredictable results`. Glad the newer versions allow it. Yes, there are name collisions with `write`, `read`, `stdin` and `stdout` if the code above was ever linked along with C. That did cross my mind, but for this simple stand-alone code, I didn't get creative enough to dream up new names. — David C. Rankin, Jan 08 '21 at 05:06
Interesting, yeah it's reproducible on Godbolt with NASM 2.13.02 through 2.14.02. https://godbolt.org/z/WKaeP4. But not with earlier NASM, not present as late as 2.12. So I guess someone thought it might be problematic (or it actually was in some cases?) and added a warning. But then NASM 2.15 removed it again? Actually no, https://fossies.org/linux/nasm/asm/assemble.c shows there's still a `case` for it in NASM 2.15.05's source code, but in this case it doesn't count as "forward"? I'm pretty sure `WARN_OTHER` is enabled, and `-w+other` or `-w+all` don't make NASM warn. — Peter Cordes, Jan 08 '21 at 06:14
So I think NASM 2.15.05 actually broke this error message by accident; I built NASM with debug symbols so I could single-step through it, and it's formatting the error and deferring it. But it never gets around to actually printing it when the code comments say it should. So that's a NASM bug. — Peter Cordes, Jan 08 '21 at 06:57
Yup, NASM bug. Reported as https://bugzilla.nasm.us/show_bug.cgi?id=3392734. Thanks for your help in testing old NASM versions to detect it. — Peter Cordes, Jan 11 '21 at 07:50
Glad to help -- and glad to be back! Snow storm in Texas took out power 7:30 Sunday, and just came back this Tuesday Morning. UPS batteries long exhausted `:)` — David C. Rankin, Jan 12 '21 at 14:45

Peter Cordes · Answer 2 · 2021-01-08T04:51:19.010

It's as simple as abs(c1 - c2). You can subtract the ASCII codes; the +'a' part will cancel out.

Or if you want case-insensitive, force the chars to lower-case first with c1|=0x20. See "What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa?" (which also provides an efficient way to check if the inputs are alphabetic along with converting to lower case, resulting in a 0..25 position in the alphabet).

Printing out a number as a decimal ASCII string of digits is covered in "How do I print an integer in Assembly Level Programming without printf from the c library?" for the general case. We can simplify if we can assume a 2-digit number, even more so if we don't mind printing a leading zero.
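A minimal sketch of that combined check, assuming the character arrives in AL: one OR, one subtract and one unsigned compare both validate the input and produce the 0..25 index. The routine name `_alpha_index` and the -1 error convention are assumptions made for illustration, not something taken from the linked answer.

```nasm
section .text

; Hypothetical helper, for illustration only.
; parameters:
;   al  - ASCII character
; returns:
;   eax - 0..25 position in the alphabet, or -1 if the input was not a letter
_alpha_index:
    movzx   eax, al             ; zero-extend the character
    or      eax, 0x20           ; set the lower-case bit ('A'..'Z' become 'a'..'z')
    sub     eax, 'a'            ; 'a'..'z' become 0..25, anything else is out of range
    cmp     eax, 'z' - 'a'      ; unsigned compare against 25
    jbe     .ok                 ; in range: it was a letter of either case

    mov     eax, -1             ; not a letter
  .ok:
    ret
```

Calling it once per input and subtracting the two results gives a signed, case-insensitive difference, so the abs() step still applies afterwards.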

Just for fun, let's see how simple we can make this. Well maybe not always "simple", but compact and minimalist. There's less code to understand, but some of the steps aren't the way I'd expect a beginner to do it. (e.g. 2 chars in different positions of the same register, and using the AH:AL result of 8-bit DIV as a string.)

For starters we'll take both characters on the same line as part of the same read system call. (Command line args are also simple, but more simple if you don't check argc and just crash if they aren't supplied.)

I left out printing prompts because that's boring and can be done if you want.

SYS__exit   equ 1
SYS_read   equ 3
SYS_write  equ 4

section .text          ; optional, already the default section
global _start 
_start:
   sub  esp, 32        ; reserve some stack space.  I only ended up using 4 bytes

   mov  eax, SYS_read
   xor  ebx, ebx       ; ebx=0 = stdin
   mov  ecx, esp
   mov  edx, 4         ; long enough to get 2 chars and a newline with room to spare
   int  0x80           ; read(STDIN_FILENO, stackbuf, 4)

   movzx eax, word [ecx]  ; load both chars into AL and AH
   or    eax, 0x2020      ; force them both to lower case, operating on AL and AH at once
    ;;; if you want to check for non-alphabetic chars, add code here
   sub   al, ah           ; subtract ASCII codes
   jns  .done_abs         ; jump if result wasn't negative
    neg   al               ; AL = abs(al)
   .done_abs:

   movzx eax, al          ; clear AH
   mov   dl, 10
   div   dl               ; AL = diff/10 (leading digit)  AH = diff%10 (least significant digit)
   add   eax, `00\n`      ; 0..9 integer digits to ASCII codes and append a newline
   mov   [ecx], eax       ; store ASCII chars back into the same stack buffer
   
   mov   eax, SYS_write
   mov   ebx, 1           ; stdout
   ;mov  ecx, esp         ; still set
   mov   edx, 3           ; 2-digit number + newline, maybe including leading zero
   int   0x80             ; write(1, stackbuf, 3)

   mov   eax, SYS__exit
   xor   ebx, ebx
   int   0x80            ; _exit(0)

Fun fact: despite using partial registers, this mostly avoids partial-register stalls. (Except for adding to EAX after div dl writes AX.) If you were actually optimizing for performance, you'd use a multiplicative inverse. (https://codereview.stackexchange.com/questions/142842/integer-to-ascii-algorithm-x86-assembly)
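To give a rough idea of what a multiplicative inverse looks like at this scale, here is a sketch for small inputs only. It assumes n is in 0..99 (the difference here is at most 25), where multiplying by 205 and shifting right by 11 gives the same quotient as dividing by 10. The routine name `_divmod10` and its register interface are hypothetical; this is not the code from the linked review.

```nasm
section .text

; Hypothetical helper, for illustration only.
; parameters:
;   eax - n, assumed to be in 0..99
; returns:
;   eax - n / 10 (leading digit)
;   edx - n % 10 (least significant digit)
; clobbers:
;   ecx
_divmod10:
    mov     edx, eax            ; keep a copy of n
    imul    eax, eax, 205       ; eax = n * 205  (205/2048 is just over 1/10)
    shr     eax, 11             ; eax = n / 10 for this small range

    lea     ecx, [eax + eax*4]  ; ecx = 5 * quotient
    add     ecx, ecx            ; ecx = 10 * quotient
    sub     edx, ecx            ; edx = n - 10 * quotient = remainder
    ret
```

For a one-off division of a two-digit number the plain div above is perfectly fine; this only matters in code that converts many numbers.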

Test cases:

$ nasm -felf32 -Worphan-labels char-diff.asm &&
 ld -melf_i386 -o char-diff char-diff.o

$ ./char-diff 
ab
01
$ ./char-diff 
Ax
23
$ ./char-diff 
Aa
00
$ ./char-diff 
Xa
23
$ ./char-diff 
xA
23

Yes, that is about as tight as you can get it. I had to pull gdb out just to convince myself that `neg al` would provide the two's complement of the signed result, and of course it did. — David C. Rankin, Jan 08 '21 at 04:59
