0

I am learning assembly. I am working on a simple program which should do the following:

  • read a 2 digits number from STDIN
  • decrement the read number
  • send the result of the decrement to STDOUT

Here the program I have so far:

section .data
    max_number_size: equ 2 ; max length of read input

section .bss
    input_buffer: resb max_number_size ; used to store input number

section .text
    global _start
    _start:
        mov eax, 3                  ; sys call read
        mov ebx, 0                  ; from FD 0
        mov ecx, input_buffer       ; indicate the adress of the memory buffer where the bytes will be stored
        mov edx, max_number_size    ; read this quantity of character
        int 80H                     ; store max_number_size to input_buffer from STDIN

    atoi:
        mov eax, 0                  ; Set initial total to 0
        mov ebx, 0                  ; keep track of nbr of char processed
        
        atoi_convert:
            mov esi, [ecx]              ; Get the current character
            test ebx, max_number_size   ; break the loop
            je _stdout


            cmp esi, 48                 ; Anything less than char 0 is invalid (check ASCII table)
            jl _exit_1

            cmp esi, 57                 ; Anything greater than char 9 is invalid (check ASCII table)
            jg _exit_1

            sub esi, 48                 ; Convert from ASCII to decimal (0 starts at 48d in ASCII)
            imul eax, 10                ; Multiply total by 10d
            add eax, esi                ; Add current digit to total

            inc ecx                     ; Get the address of the next character
            inc ebx                     ; keep track of nbr of char processed
            jmp atoi_convert

    _stdout:
        mov ecx, eax
        mov eax, 4
        mov ebx, 1
        mov edx, 32
        int 80h

    _exit_0:
        mov eax, 1
        mov ebx, 0
        int 80H

    _exit_1:
        mov eax, 1
        mov ebx, 1
        int 80H

Notice that, in the code above, there is no decrement yet.

What I am habing trouble to understand is what is actually sent to STDOUT in the _stdout label here.

What I understand is that, using the atoi, the ASCII chars read from stdin are transformed into an actual decimal value (if input is 12, then the value in eax after atoi would be 0000 1100 (binary)).

So, when reaching _stdout, we have 12d (0000 1100b) in eax. In order to send it to STDOUT, I move the value into ecx, configure eax, ebx, edx for the syscall and then, boom, syscall.

However, there is no output at all.

See, for example:

/app # make
nasm -f elf -g -F stabs main.asm
ld -o main.exe main.o -m elf_i386
/app # ./main.exe > stdout
49
/app #
/app # ls -l | grep stdout
-rwxr-xr-x    1 root     root             0 Feb  7 12:05 stdout

The stdout file is empty. Not even a single byte.

What I am doing wrong in this program and what, consequently, am I not understanding correctly ?

Itération 122442
  • 2,644
  • 2
  • 27
  • 73
  • I'm not familiar with Linux syscalls, but the output function is probably expecting a string, so you'll have to do the itoa bit also. – 500 - Internal Server Error Feb 07 '23 at 12:20
  • 2
    Yes, the write syscall expects a pointer. It's not only that you have not converted the value to text, but you even try to pass it as a pointer which will fault. You can do `push eax; mov ecx, esp; mov edx, 1` to at least output the binary result (should give 1 byte with the proper ascii code). Also note that `mov esi, [ecx]` is a 4 byte load. You want `movzx esi, byte [ecx]`. The `test ebx, max_number_size` is wrong too, you want `cmp` not `test` and that should of course be before you even try to load the next letter to avoid bad access. – Jester Feb 07 '23 at 12:29
  • 1
    Furthermore you should use the actual return value from `read` because the user might not have entered the `max_number_size` digits (ie. could be a single digit number). – Jester Feb 07 '23 at 12:31
  • [Add 2 numbers and print the result using Assembly x86](https://stackoverflow.com/a/28524951) / [How do I print an integer in Assembly Level Programming without printf from the c library? (itoa, integer to decimal ASCII string)](https://stackoverflow.com/a/46301894) for correct printing, the reverse of [NASM Assembly convert input to integer?](https://stackoverflow.com/a/49548057) (like your atoi loop, but a little bit more efficient.) – Peter Cordes Feb 07 '23 at 20:24

1 Answers1

2

UPDATE: As @Peter Cordes said, the first code lacks the function of outputting numbers in ASCII. The current code added an ASCII number printing function based on reference link given by @Peter Cordes. Thanks!


How about using debugger like GDB to step execution and see how the registers change?

If you breakpoint before the _stdout's int 0x80 using GDB, you would notice that the ecx is NULL.

(gdb) catch syscall 4
Catchpoint 1 (syscall 'write' [4])
(gdb) r
49 <- user's input

Catchpoint 1 (call to syscall write), 0x08049053 in _exit_0 ()
(gdb) p/x $ecx
$1 = 0x0

Let's find out why. The caller of _stdout is atoi_convert. Let's breakpoint atoi_convert and run step by step (using si command). Don't you notice that it always jump _stdout in je _stdout?
test ebx, max_number_size is actually just doing an and instruction. It's usually for detecting zero. In this case, cmp ebx, max_number_size is fine.

In atoi_convert, mov esi, [ecx] copies a 4-bytes contents of ecx to esi.

(gdb) b atoi_convert
Breakpoint 1 at 0x8049020
(gdb) r
49 <- user's input

Breakpoint 1, 0x08049020 in atoi_convert ()
(gdb) ni <- to run mov esi, [ecx]
0x08049022 in atoi_convert ()
(gdb) p/x $esi
$1 = 0x3934
(gdb) x/wx $ecx
0x804a000 <input_buffer>:       0x00003934

As you see, $esi is 0x3934 when I input 49. But you assume esi is 1-byte (from cmp esi, 48 etc). As @Jester said, you should use movzx esi, byte [ecx] to ensure the content is a 1-byte and you should use read's return value instead of max_number_size.

Moreover, if the user enters only one character on the terminal, a newline character will be also added. I think it would be better to call _stdout instead of _exit_1 if there are unrecognizable characters.

Let's breakpoint before the int 0x80 again.

(gdb) catch syscall write
Catchpoint 1 (syscall 'write' [4])
(gdb) r
49 <- user's input

Catchpoint 1 (call to syscall write), 0x08049053 in _exit_0 ()
(gdb) p $ecx
$1 = 49

The second argument should be a pointer, but it's a value. I need to convert the value to a pointer to an ASCII character. Have a look at this: How do I print an integer in Assembly Level Programming without printf from the c library?

Anyway, the following code will work. I've also added fixes I didn't mention, such as error checking for read. Also, I added an ASCII number printing function based on reference link.

section .data
    max_number_size: equ 2

section .bss
    input_buffer: resb max_number_size

section .text
    global _start
    _start:
        mov eax, 3
        mov ebx, 0
        mov ecx, input_buffer
        mov edx, max_number_size
        int 80H
        cmp eax, -1 ; error check
        jz _exit_1
        mov edi, eax

    atoi:
        mov eax, 0
        mov ebx, 0

        ; when input is completely wrong (like `AA`), don't show output.
        movzx esi, byte [ecx]
        cmp ebx, edi
        je _exit_1
        cmp esi, 48
        jl _exit_1
        cmp esi, 57
        jg _exit_1

        atoi_convert:
            movzx esi, byte [ecx]
            cmp ebx, edi
            je print_uint32

            cmp esi, 48
            jl print_uint32

            cmp esi, 57
            jg print_uint32

            sub esi, 48
            imul eax, 10
            add eax, esi

            inc ecx
            inc ebx
            jmp atoi_convert

    ; This code is referenced from <https://stackoverflow.com/a/46301894>.
    print_uint32:
        xor ebx, ebx
        mov ecx, 0xa
        push ecx
        mov esi, esp
        add esp, 4
    .toascii_digit:
        inc ebx ; Number of digits to output
        xor edx, edx
        div ecx
        add edx, '0'
        dec esi
        mov [esi], dl

        test eax, eax
        jnz .toascii_digit

        mov edx, ebx ; ebx will be overwritten. Move it before that.
        mov eax, 4
        mov ebx, 1
        mov ecx, esi
        int 80h

    _exit_0:
        mov eax, 1
        mov ebx, 0
        int 80H

    _exit_1:
        mov eax, 1
        mov ebx, 1
        int 80H
couyoh
  • 313
  • 3
  • 9
  • 1
    `strace ./a.out` is also very useful as a first step to trace system calls and decode their args. Then you know what to look for in GDB, instead of having to carefully check each register for each `int 80h`, where confirmation bias can be a problem e.g. for beginners mixed up about `write` taking a pointer, not a value. BTW, your version calls `write` to output one byte as a binary integer, not a text string of digits. So that's odd, normally you'd use [How do I print an integer in Assembly Level Programming without printf from the c library?](https://stackoverflow.com/a/46301894) – Peter Cordes Feb 07 '23 at 20:19
  • Thank you for this complete answer (as well as @Jester in the comments). I'm checking it carefully and I have a question about the return of read. I have a breakpoint at int 80h, and one at the next instruction. The first gives eax = 3 (expected). I input one character ("4") and at the next breakpoint, I have eax = 2. But from what you said and your code, it looks like it should be -1 (I input only 1 of the 2 chars expected). Did I miss something ? (EDIT: I just realized that should be itself a new question...) – Itération 122442 Feb 08 '23 at 07:52
  • 1
    @Itération122442 I assume you breakpoint in `read`. If you input something on the terminal, it adds a newline character. When you redirects stdin from 1-byte file, return value will be one. It can be less than the third argument. See return value in `read.2` by `man` command. – couyoh Feb 08 '23 at 10:54