Here is a program that reads decimal ASCII digits one at a time and converts them into an integer. The result is accumulated in the EDI register:

global _start
%macro kernel 4
    mov eax, %1
    mov ebx, %2
    mov ecx, %3
    mov edx, %4
    int 80h
%endmacro
section .bss
    symbol resb 1
section .text
_start:
    mov esi, 10
    xor edi, edi
.loop:
    kernel 3, 0, symbol, 1 ; load 1 char from STDIN into symbol
    test eax, eax          ; nothing loaded - EOF
    jz .quit
    xor ebx, ebx
    mov bl, [symbol]
    sub bl, '0'
    cmp bl, 9
    jg .quit               ; not a number
    mov eax, edi           ; previously accumulated number
    mul esi                ; eax *= 10
    lea edi, [eax + ebx]
    jmp .loop

.quit:
    mov eax, 1
    mov ebx, edi
    int 80h

I compile and run it:

$ nasm -g -f elf32 st3-18a.asm
$ ld -g -m elf_i386 st3-18a.o -o st3-18a
$ ./st3-18a
2[Enter]
Ctrl-d

When I run this code in gdb step by step everything is correct, and the result stored in EDI at the end is 2. But when I run without a debugger, and echo the program return value:

$ ./st3-18a
2[Enter]
Ctrl-d
$ echo $?
238

Why does it output 0xEE? What is wrong?

Sep Roland
user4035
  • You are processing the Enter (line feed) as well. That is ASCII code 10. If you subtract `'0'` you get `-38`, aka `218`. This gets added to the previous digit, which is valued as `20`, giving `238`, which is `0xEE`. Fix: you want an unsigned `ja .quit` instead of the signed `jg .quit`. – Jester Nov 17 '22 at 20:29
  • The sub/cmp range-check trick needs an **unsigned** compare. `ja .quit`. You're using `jg`, so ASCII codes less than `'0'` are accepted. – Peter Cordes Nov 18 '22 at 05:00
  • Strange that this worked in a debugger, like maybe you were able to submit terminal input without a trailing newline? What debugger? – Peter Cordes Nov 18 '22 at 05:07
  • @PeterCordes gdb. breakpoint at `_start` and stepped through with `nexti`. – user4035 Nov 18 '22 at 07:02
  • Oh, your program only uses `read` with a size of `1`, not a buffer it loops over. And it's on the same terminal GDB is using, so GDB gets the leftover newline as terminal input when it reads from the terminal after suspending the program, consuming it so it's not there for the program. I'll see if there's a duplicate about that. – Peter Cordes Nov 18 '22 at 08:08
  • BTW, you can use `starti` to stop before the first user-space instruction (which will be `_start` if it's statically linked), and normally you'd use `stepi` (shortcut `si`) unless you actually want to step over anything. `ni` uses a different mechanism that isn't always 100% guaranteed to stop correctly, but on x86 `si` can use hardware support for single-stepping. – Peter Cordes Nov 18 '22 at 08:08
  • [How to redirect std::cin to a Linux terminal when debugging with GDB?](https://stackoverflow.com/q/24317923) shows how to attach GDB to a running process, so you could debug what's actually happening, with your process not having to share a terminal with GDB. I should maybe just reopen this and post an answer, since I haven't found an exact duplicate for that part. – Peter Cordes Nov 18 '22 at 11:57
  • @PeterCordes Will try it in the future, thanks. Yes, answer please. – user4035 Nov 18 '22 at 14:29

1 Answer

Your range check is buggy: it uses a signed compare (jg) instead of an unsigned one (ja). As a result you only detect non-digit characters when c - '0' is in 10..127, not when the subtraction wraps around (i.e. becomes negative as a signed byte). That misses about half of the byte values you should be excluding, including control codes like newline at the low end of the ASCII range.
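
To make the difference concrete, here is a small C model of the two checks. The function names are hypothetical, and the signed version assumes the usual two's-complement wraparound when a value above 127 is converted to int8_t (implementation-defined before C23, but what mainstream compilers do).

```c
#include <stdbool.h>
#include <stdint.h>

/* Models `sub bl, '0'` / `cmp bl, 9` / `jg .quit`.
   The difference is viewed as a signed byte, so any character below '0'
   produces a negative value and slips through the check.
   Assumption: converting 128..255 to int8_t wraps (two's complement). */
bool accepted_signed(uint8_t c)
{
    int8_t diff = (int8_t)(uint8_t)(c - '0');
    return diff <= 9;
}

/* Models the same sequence with `ja .quit`.
   The difference is viewed as an unsigned byte, so characters below '0'
   wrap to large values (208..255) and are rejected. */
bool accepted_unsigned(uint8_t c)
{
    uint8_t diff = (uint8_t)(c - '0');
    return diff <= 9;
}
```

With these, accepted_signed('\n') is true because 10 - 48 wraps to 218, which is -38 as a signed byte, while accepted_unsigned('\n') is false. Both versions reject ':' (one past '9').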


So why does GDB make it work?

Your program calls read with a size of 1, so after you type 2 and press Return, the first read gets the '2' and leaves the newline unread. That's ASCII 0xa = '\n', so your next read(0, buf, 1) gets it.

Unless GDB gets it first: GDB takes over the terminal to read more commands after your read(0, buf, 1) returns, so GDB gets the leftover terminal input and discards the newline before you single-step to the next read system call. Or the newline might just be discarded when GDB switches the terminal from cooked to raw mode, which it does so it can read single keystrokes without waiting for a line to be "submitted" by the kernel's canonical-mode line editing via the EOL (newline) or EOF (ctrl-d) control characters.

That happens because your program is sharing a terminal with GDB. To avoid it, run the program in another terminal tab or window (i.e. on a different Unix TTY) and attach GDB to it there, e.g. with gdb -p $(pidof st3-18a).

You can also do that with strace -p, or just run strace ./st3-18a, since strace doesn't read interactive input and so doesn't compete with your program for the terminal.
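
The two outcomes can be reproduced with a rough C model of the question's loop. This is only a sketch: exit_status is a hypothetical name, the byte array stands in for what successive read calls would deliver, and the int8_t conversion again assumes two's-complement wraparound.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the loop in the question.
   input/len: the bytes successive 1-byte read() calls would return;
              running off the end models read() returning 0 (EOF).
   use_unsigned: 0 models `jg .quit`, nonzero models `ja .quit`. */
unsigned exit_status(const char *input, size_t len, int use_unsigned)
{
    uint32_t edi = 0;                                     /* xor edi, edi */
    for (size_t i = 0; i < len; i++) {
        /* xor ebx, ebx ; mov bl, [symbol] ; sub bl, '0' */
        uint8_t bl = (uint8_t)((uint8_t)input[i] - '0');
        /* cmp bl, 9 followed by jg (signed) or ja (unsigned) */
        int stop = use_unsigned ? (bl > 9) : ((int8_t)bl > 9);
        if (stop)
            break;
        /* mul esi ; lea edi, [eax + ebx], with EBX holding BL zero-extended */
        edi = edi * 10 + bl;
    }
    return edi & 0xFF;   /* the shell only sees the low 8 bits of the status */
}
```

Run outside GDB, the program sees "2\n": the signed check accepts the newline and the status becomes 2 * 10 + 218 = 238. When the newline is consumed elsewhere, the program sees only "2" and exits with 2. With the unsigned check, "2\n" also gives 2.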


In toy programs that use "cooked" TTY input, it's common to read into a decent-sized buffer and ignore any later characters. That will break if you redirect input from a file, where multiple lines are ready at once, so if you want something robust you can use fgets from libc.
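
As a sketch of the fgets approach (in C rather than asm, for brevity), the function below reads one line and converts its leading digits. The name read_number, the 64-byte buffer, and the return convention are all assumptions for illustration. It does no overflow detection (the unsigned arithmetic simply wraps), and a line longer than the buffer leaves its tail in the stream for the next call.

```c
#include <stdio.h>

/* Hypothetical helper: reads one line from `in` and stores the value of
   its leading decimal digits in *out (0 if the line has none).
   Returns 1 if a line was read, 0 on EOF or read error. */
int read_number(FILE *in, unsigned long *out)
{
    char line[64];                       /* assumed "decent-sized" buffer */
    if (fgets(line, sizeof line, in) == NULL)
        return 0;

    unsigned long value = 0;
    /* Stop at the first non-digit, which includes the trailing '\n'. */
    for (const char *p = line; *p >= '0' && *p <= '9'; p++)
        value = value * 10 + (unsigned long)(*p - '0');

    *out = value;
    return 1;
}
```

Because fgets consumes the newline along with the digits, a following call starts at the next line whether the input comes from a terminal or a redirected file.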

As long as you realize that the I/O is simplistic and not robust, though, do whatever floats your boat when playing around with asm, even if that means making assumptions about lines and tty handling and that the user pressed enter instead of control-d.

Play around with cat running in a terminal, typing a partial line and hitting control-D. You can strace -p $(pidof cat) from another terminal to see its system calls.

See also How do I ignore line breaks in input using NASM Assembly?

Peter Cordes