Comparing numbers as strings of ASCII chars not working correctly in NASM

Question

I am new to assembly language. I am using NASM under Ubuntu Linux.
I found the following example in a book, but it does not work correctly.

section .text
    global main
main:
    mov ecx, [num1]
    cmp ecx, [num2]
    jg check_third_num
    mov ecx, [num2]
check_third_num:
    cmp ecx, [num3]
    jg _exit
    mov ecx, [num3]
_exit:
    mov [largest], ecx
    mov ecx, msg
    mov edx, len
    mov ebx, 1
    mov eax, 4
    int 0x80
    mov ecx, largest
    mov edx, 2
    mov ebx, 1
    mov eax, 4
    int 0x80
    mov eax, 1
    int 80h
section .data
    msg db "The largest digit is:", 0xA, 0xD
    len equ $- msg
    num1 dd '17'
    num2 dd '52'
    num3 dd '31'
segment .bss
    largest rest 2

It should find the largest number, but the output is:

The largest digit is:
17

Jester · Accepted Answer · 2014-07-23T00:05:35.107

The code only works for single-digit numbers. I assume the original code in the book used such numbers, but you have changed it.

The issue is that the numbers are stored as strings in this example, to avoid the need for a binary-to-text conversion. It works for single digits because the ASCII codes for the digits are consecutive and have the same ordering as the numeric values. For example, dd '2' allocates 4 bytes of storage containing 0x32 0x00 0x00 0x00 (0x32 is the ASCII code for '2'). The code then uses these bytes as a 32-bit binary number, which the processor interprets as 0x00000032 because x86 is a little-endian architecture. I hope you see how this will work for all single digits.

For multiple digits (up to 4), the endianness makes the CPU consider the digits from right to left, so the numbers in your example are interpreted as 71, 25 and 13, respectively (*). Since 71 is the largest of those, the program prints the entry for that number, which is the string 17.

*Actually the numbers will be 0x00003731, 0x00003235 and 0x00003133.
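
As a rough illustration of the values in the footnote, here is a minimal C sketch (C rather than NASM, so the byte layout is easy to inspect). The helper name load_le32 and the array names are assumptions made for this example; the function combines four bytes the way a little-endian 32-bit load such as mov ecx, [num1] would.

```c
#include <stdint.h>

/* Hypothetical helper: combine 4 bytes the way a little-endian x86
   `mov ecx, [num1]` would, independent of the host's byte order. */
uint32_t load_le32(const unsigned char b[4])
{
    return  (uint32_t)b[0]
         | ((uint32_t)b[1] << 8)
         | ((uint32_t)b[2] << 16)
         | ((uint32_t)b[3] << 24);
}

/* The three constants from the question, as NASM lays them out in memory. */
const unsigned char num1[4] = { '1', '7', 0, 0 };  /* dd '17' */
const unsigned char num2[4] = { '5', '2', 0, 0 };  /* dd '52' */
const unsigned char num3[4] = { '3', '1', 0, 0 };  /* dd '31' */
```

With these definitions, load_le32(num1) is 0x00003731, which is larger than both 0x00003235 and 0x00003133, and that is why the program picks '17'.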

Thank you very much. This was very helpful. I did not change the digit sizes of the numbers. The original numbers were '47', '22' and '31'. Thus, the code found the correct result by chance. The book does not mention anything about that. Maybe you should write such a book :) One more thing: is it guaranteed that always 4 bytes of data is moved to a 32 bit register from a memory address? Thank you. — user2972185, Jul 23 '14 at 02:40
Each byte is 8 bits, so yes, 32 bits are 4 bytes. You can of course load smaller data into smaller registers or use sign/zero extension (`MOVSX`, `MOVZX`). — Jester, Jul 23 '14 at 08:51

score 0 · Answer 2 · answered Jul 27 '22 at 22:27

Typically for multi-digit numbers, you would convert the input string to an integer and then compare the integer values in 32-bit binary registers. (You can keep the original digit-strings around so you can print those instead of having to convert your number back to a string of base-10 digits.)

In the special case where your digit-strings are all the same length (including leading '0's if any), you can treat the whole sequence of ASCII codes as a big-endian number.

Strings are stored in printing order, most-significant digit first (at the lowest address in memory, earlier in the string). But x86 is little-endian, so the lowest-address byte gets treated as the least significant if we just loaded and compared, like you're doing.

dd '17' is the same as db '1', '7', 0, 0, which is also the same as db 0x31, 0x37, 0, 0 (check an ASCII table), which is the same as 0x00003731 (on x86 which is little-endian).
dd '52' is dd 0x00003235, which as you can see is smaller than 0x00003731.

But if we reversed the bytes of each digit-string, integer compare on the resulting value would compare the strings in lexical order. (This trick is useful in general for memcmp with small fixed sizes, BTW.) So we essentially want to treat 4-byte digit-strings as big-endian integers.

x86 has an instruction for that, bswap eax. Or to just swap bytes of a 16-bit register, rol ax, 8.

After a bswap, '17' becomes 0x31370000 and '52' becomes 0x35320000.

    mov  eax, [num1]
    mov  ecx, [num2]
    mov  edx, [num3]    ; load ASCII strings (padded with 0 bytes to dword)
    bswap eax
    bswap ecx
    bswap edx           ; byte reverse them to integers that compare in the right order
    cmp  eax, ecx       ; then compare registers instead of memory
    jg   check_third_num

   ...                  ; end up with the largest in ECX

    bswap  ecx          ; put it back into printing order
    mov [largest], ecx  ; and store it somewhere.
   ...                  ; and make a write() system call

Instead of branching, we could use cmp eax,ecx / cmovg ecx, eax to do ECX=EAX if EAX>ECX (signed). Then one more cmp/cmov would take the max of this and the final number.
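
The same idea can be sketched in C, under the assumption that every digit-string is exactly 4 bytes and zero-padded like the dd constants in the question. The names load_be32 and largest_of_three are hypothetical. Reading the bytes most-significant first gives the value a register would hold after mov followed by bswap. The sketch compares unsigned values, which agrees with the signed jg / cmovg here because ASCII digits never set the top bit.

```c
#include <stdint.h>

/* Hypothetical helper: read 4 bytes most-significant first, i.e. the
   value a register holds after `mov` followed by `bswap` on x86. */
uint32_t load_be32(const unsigned char b[4])
{
    return ((uint32_t)b[0] << 24)
         | ((uint32_t)b[1] << 16)
         | ((uint32_t)b[2] << 8)
         |  (uint32_t)b[3];
}

/* Return whichever of three equal-length, zero-padded digit-strings is
   largest, comparing the byte-swapped keys as unsigned integers. */
const unsigned char *largest_of_three(const unsigned char a[4],
                                      const unsigned char b[4],
                                      const unsigned char c[4])
{
    const unsigned char *best = a;
    if (load_be32(b) > load_be32(best)) best = b;  /* max of first two */
    if (load_be32(c) > load_be32(best)) best = c;  /* then the third */
    return best;
}

/* The three constants from the question, as NASM lays them out in memory. */
const unsigned char num1[4] = { '1', '7', 0, 0 };  /* dd '17' */
const unsigned char num2[4] = { '5', '2', 0, 0 };  /* dd '52' */
const unsigned char num3[4] = { '3', '1', 0, 0 };  /* dd '31' */
```

Here largest_of_three(num1, num2, num3) returns a pointer to num2, whose bytes are still in printing order, so no swap back is needed before printing.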

We could have used movzx eax, word [num1] to load just 2 digits, in case the strings weren't padded to dword length with 00 bytes, e.g. if they were in dw '17' 2-byte words.

It wouldn't actually be a problem to have garbage in the high 2 bytes of each register, though; those become the low 2 bytes after bswap. If the 2 digits we care about are different, they will make the integer values compare in the right relative order. And if the values differ only in that trailing garbage, they might compare greater or less, but it doesn't matter which one you pick as long as you're not going to print the garbage (unless these are just keys for sorting something else). Alternatively, you could just rol ax,8 to endian-swap only the low word of EAX, and use a 16-bit cmp ax,cx to ignore the high 2 bytes of the full registers.

What would be a problem is digit-strings of different lengths. Then the place-values wouldn't line up after byte swapping, if you load from the start of the string.

dd  '123'     ; 0x00333231.  After byte swap: 0x31323300
dd  '99'      ; 0x00003939.  After byte swap: 0x39390000   !problem
dd  '099'     ; 0x00393930.  After byte swap: 0x30393900   works with '123'

You need the least-significant digit of the digit-string to load into the same place in the register for each input. So after byte-swapping, that digit and all higher digits line up, with binary place-values that match their place-value in the decimal number.

If you had a digit-string without leading zeros like '99' that you wanted to use with 3-digit numbers, you could potentially load and left-shift before bswap (or right-shift after), shifting by 8*length_difference bits. i.e. byte-shifting 0x00003939 to 0x00393900.

But then you need to know the length of each digit-string, or do a load that ends at the end of it. (Leading garbage is a problem, though, unlike trailing garbage.)
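
A small C sketch of that alignment, with the hypothetical name aligned_key, might look like this. It assumes the caller knows each string's length, that strings are at most 4 digits, and that none of them has leading '0' characters: missing leading digits become 0x00 bytes, which sort below '0' (0x30), so mixing padded and unpadded inputs would give wrong results.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical helper: build a comparison key from a digit-string of
   `len` bytes, right-aligned in a field of `width` digits
   (len <= width <= 4). Assumes no leading '0' characters. */
uint32_t aligned_key(const char *s, size_t len, size_t width)
{
    uint32_t key = 0;
    for (size_t i = 0; i < len; i++)
        key = (key << 8) | (unsigned char)s[i];  /* most significant digit first */
    return key << (8 * (4 - width));             /* unused low bytes stay zero */
}
```

With a common width of 3, aligned_key("99", 2, 3) is 0x00393900 and aligned_key("123", 3, 3) is 0x31323300, so the keys compare in numeric order. Using width 2 for "99" reproduces the misaligned 0x39390000 from the table above.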

It is often easier to just convert the strings to integers, unless they're too big to fit in a 32-bit or 64-bit integer. In that case you might compare lengths (not counting leading zeros); the longer number is larger. If the lengths are equal, you're ready to use this digit-string trick, which is basically strcmp or memcmp.

For long strings, that equal-length compare could potentially use SSE2 pcmpeqb / pmovmskb. The resulting bitmap has 1 bits where the bytes are equal, so invert it and bsf the result to find the first non-equal byte, starting from the lowest (which came from the lowest-address input byte, i.e. the most significant digit). bsf is bit-scan-forward, like tzcnt. To find which one is greater if they're not equal, perhaps use pcmpgtb and check that mask bit, or just index the memory and subtract the ASCII codes at the position you know differs (bsf ecx,ecx / movzx eax, byte [num1 + ecx] / cmp al, [num2 + ecx]). Or use sub, and you don't even need to zero-extend the two bytes before subtracting to avoid overflow like memcmp does, because you know they're ASCII codes for decimal digits, 0x30 .. 0x39.

This would also work for hex digits if they all use the same case (upper or lower), since 'A'..'F' have higher ASCII codes than '0'..'9', so lexical string compare orders them correctly when they're the same length.
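
The length-then-memcmp comparison for digit-strings too long to convert can be sketched in portable C as follows. The function name compare_digit_strings is an assumption for this example, and the sketch assumes NUL-terminated inputs containing only decimal digits.

```c
#include <string.h>

/* Hypothetical helper: compare two NUL-terminated strings of decimal
   digits by numeric value. Returns <0, 0 or >0 like strcmp. */
int compare_digit_strings(const char *a, const char *b)
{
    while (*a == '0') a++;          /* leading zeros carry no value */
    while (*b == '0') b++;

    size_t la = strlen(a), lb = strlen(b);
    if (la != lb)
        return la < lb ? -1 : 1;    /* more significant digits: larger number */

    return memcmp(a, b, la);        /* same length: lexical order is numeric order */
}
```

For example, compare_digit_strings("99", "123") is negative because "123" is longer, while two 30-digit strings that differ only in the last digit are ordered by the memcmp step.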

score -1 · Answer 3 · answered Jul 23 '14 at 00:08

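You should write num1 dd 17 without the quotes. Using quotes will give very odd results (like Jester mentions). Without the quotes the values are binary integers, so the result then has to be converted to decimal text before it is printed.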

It's a pretty bad code sample your book has.

By the way, experiment with running your code in a debugger.

Lasse Reinhold


Thank you for your help. I wrote the code using nano and compiled it with nasm and gcc. Is there a tool for debugging assembly code under ubuntu-linux? – user2972185 Jul 23 '14 at 02:45
