I'd expect the program below to read some characters (up to 9) from stdin, and place them at a specified location in memory.

What actually happens: when I press Enter after typing fewer than 9 characters, the cursor simply moves to the next line; this keeps happening until I have entered 9 characters. If I enter more than 9, the extra characters are interpreted as a shell command. Why doesn't it terminate when I press Enter?

Using nasm 2.14.02 on Ubuntu.

  global _start
  section .bss
    buf resb 10
  section .text
    ; Read a word from stdin, terminate it with a 0 and place it at the given address.
    ; - $1, rdi: *buf - where to place read bytes
    ; - $2, rsi: max_count, including the NULL terminator
    ; Returns in rax:
    ; - *buf - address of the first byte where the NULL-terminated string was placed
    ; - 0, if input too big
    read_word: ; (rdi: *buf, rsi: max_count) -> *buf, or 0 if input too big
      mov r8, 0      ; current count
      mov r9, rsi    ; max count
      dec r9         ; one char will be occupied by the terminating 0

      ; read a char into the top of the stack, then pop it into rax
      .read_char:
        push rdi       ; save; will be clobbered by syscall
        mov rax, 0     ; syscall id = 0 (read)
        mov rdi, 0     ; syscall $1, fd = 0 (stdin)
        push 0         ; top of the stack will be used to place read byte
        mov rsi, rsp   ; syscall $2, *buf = rsp (addr where to put read byte)
        mov rdx, 1     ; syscall $3, count (how many bytes to read)
        syscall
        pop rax
        pop rdi

      ; if read character is Enter (aka carriage-return, CR) - null-terminate the string and exit
      cmp rax, 0x0d ; Enter
      je .exit_ok

      ; not enter ⇒ place it in the buffer, and read another one
      mov byte [rdi+r8], al ; copy character into output buffer
      inc r8                ; inc number of collected characters
      cmp r8, r9            ; make sure number doesn't exceed maximum
      je .exit_ok           ; if we have the required number of chars, exit
      jb .read_char         ; if it's not greater, read another char

      .exit_ok: ; add a null to the end of the string and return address of buffer (same as input)
        add r8, 1
        mov byte [rdi+r8], 0
        mov rax, rdi
        ret

      .exit_err: ; return 0 (error)
        mov rax, 0
        ret

  _start:
    mov rdi, buf     ; $1 - *buf
    mov rsi, 10      ; $2 - uint count
    call read_word

    mov rax, 60  ; exit syscall
    mov rdi, 0   ; exit code
    syscall

Mihai Rotaru
  • The `syscall` may clobber any and all of rax, rcx, rdx, rsi, rdi, r8, r9, r10, and r11. You are using some of those to hold data, so that data may be lost. – Chris Dodd Mar 21 '21 at 22:26
  • @ChrisDodd cheers; in this particular program I think only `r8` and `r9` are problematic – Mihai Rotaru Mar 21 '21 at 22:46
  • @ChrisDodd: https://stackoverflow.com/a/2538212/634919 says only `rax, rcx, r11` are clobbered. You may be thinking of the function calling conventions? – Nate Eldredge Mar 21 '21 at 23:38
  • @ChrisDodd: The system-call calling convention isn't the same as the function-calling convention. Across ISAs, it's fine to assume that the arg-passing regs are unmodified by a Linux system call, other than the return value. (And for x86-64, RCX and R11 are also clobbered by the `syscall` instruction itself.) – Peter Cordes Mar 22 '21 at 00:36

1 Answer

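First, when the user hits Enter, you will see LF (\n, 0x0a), not CR (\r, 0x0d). Your comparison against 0x0d therefore never matches, so the loop only ends once 9 characters have been collected; this explains why your program doesn't exit when you think it should.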
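
As a rough sketch of that fix, the version below compares against 0x0a, checks the return value of read() so that end-of-file also ends the loop, and keeps its state in callee-saved registers. The register choices, the treatment of read errors, the exit status, and the build commands in the first comment are my own assumptions, not something taken from your code; it also assumes max_count is at least 1.

```nasm
; Assumed build commands: nasm -f elf64 read_word.asm && ld -o read_word read_word.o
global _start

section .bss
buf:    resb 10

section .text

; read_word(rdi: *buf, rsi: max_count including the terminating 0)
; Returns rax = *buf on success, or 0 if the line does not fit.
read_word:
    push rbx
    push r12
    push r13
    mov  rbx, rdi           ; rbx = buffer base
    lea  r12, [rsi-1]       ; r12 = max characters, leaving room for the 0
    xor  r13d, r13d         ; r13 = characters stored so far
    sub  rsp, 8             ; one scratch slot on the stack for the read byte
.read_char:
    xor  eax, eax           ; syscall id 0 (read)
    xor  edi, edi           ; fd 0 (stdin)
    mov  rsi, rsp           ; destination: the scratch slot
    mov  edx, 1             ; read one byte
    syscall
    test rax, rax
    jle  .done              ; 0 = EOF, negative = error: stop (a simplification)
    mov  al, [rsp]
    cmp  al, 0x0a           ; Enter arrives as LF
    je   .done
    cmp  r13, r12
    jae  .too_long          ; no room left for another character
    mov  [rbx+r13], al      ; store the character
    inc  r13
    jmp  .read_char
.done:
    mov  byte [rbx+r13], 0  ; terminate directly after the last character
    mov  rax, rbx
    jmp  .return
.too_long:
    xor  eax, eax           ; return 0 (error)
.return:
    add  rsp, 8
    pop  r13
    pop  r12
    pop  rbx
    ret

_start:
    mov  rdi, buf
    mov  esi, 10
    call read_word
    xor  edi, edi
    test rax, rax
    setz dil                ; exit status 1 if the input was too long, else 0
    mov  eax, 60            ; exit
    syscall
```

Note that in the too-long case the rest of the line is still unread when the program exits.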

As for why the extra characters go to the shell, this is about how the OS does terminal input. It accumulates keystrokes from the terminal into a kernel buffer until Enter is pressed, then makes the whole line available to be read by read(). This allows things like backspace to work transparently without requiring the application to explicitly code it, but it does mean that you can't literally read one keystroke at a time, as you're noticing.

If your program exits while the buffer still contains characters, then those characters will be read by the next program which attempts to read from the device, which in your case will be the shell. Most programs that read stdin avoid this by continuing to read and process data until end-of-file is seen (read() returning 0), which happens for a terminal when the user presses Ctrl-D on an empty line.
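
If you only care about the first part of a line, one option is to consume the remainder yourself before exiting. The sketch below is an illustration under my own assumptions (the routine name discard_line is hypothetical, and a read error is treated like end-of-file): it reads one byte, then throws away everything up to and including the next LF.

```nasm
global _start

section .text

; discard_line: consume stdin up to and including the next LF,
; or until EOF or a read error. Hypothetical helper, not from the question.
discard_line:
    sub  rsp, 8             ; scratch slot for the byte being discarded
.next:
    xor  eax, eax           ; syscall id 0 (read)
    xor  edi, edi           ; fd 0 (stdin)
    mov  rsi, rsp
    mov  edx, 1
    syscall
    test rax, rax
    jle  .done              ; EOF or error: nothing more to discard
    cmp  byte [rsp], 0x0a
    jne  .next              ; keep going until the end of the line
.done:
    add  rsp, 8
    ret

_start:
    sub  rsp, 8             ; scratch slot for the one byte we care about
    xor  eax, eax           ; read
    xor  edi, edi           ; stdin
    mov  rsi, rsp
    mov  edx, 1
    syscall
    test rax, rax
    jle  .exit              ; EOF or error: nothing is left in the line
    cmp  byte [rsp], 0x0a
    je   .exit              ; empty line, already fully consumed
    call discard_line       ; drop the rest so the shell does not see it
.exit:
    mov  eax, 60            ; exit
    xor  edi, edi           ; exit code 0
    syscall
```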

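If you really need to process input character-by-character, you need to set the terminal to non-canonical mode, but many things will be different in this case: for example, read() no longer waits for Enter, and line editing such as backspace is no longer handled for you.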
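
For illustration only, here is roughly what switching to non-canonical mode looks like with raw system calls. The ioctl request numbers, flag value, and struct offsets below are the ones I believe Linux uses on x86-64 (kernel `struct termios`); treat them as assumptions and check them against your system's headers. The sketch saves the current settings, clears ICANON, reads a single key press, and restores the settings before exiting.

```nasm
global _start

%define SYS_READ   0
%define SYS_IOCTL  16
%define SYS_EXIT   60
%define TCGETS     0x5401   ; get terminal settings
%define TCSETS     0x5402   ; set terminal settings
%define ICANON     0x0002   ; canonical (line-buffered) mode bit in c_lflag
%define C_LFLAG    12       ; assumed offset of c_lflag in the kernel termios
%define C_CC       17       ; assumed offset of c_cc[]
%define VTIME      5
%define VMIN       6

section .bss
saved:  resb 64             ; original settings (the struct is smaller than this)
raw:    resb 64             ; modified copy
key:    resb 1

section .text
_start:
    mov  eax, SYS_IOCTL     ; fetch the current settings into 'saved'
    xor  edi, edi           ; fd 0 (stdin)
    mov  esi, TCGETS
    mov  rdx, saved
    syscall
    test rax, rax
    js   .exit              ; stdin is not a terminal: nothing to change

    mov  eax, SYS_IOCTL     ; fetch them again into the copy we will modify
    xor  edi, edi
    mov  esi, TCGETS
    mov  rdx, raw
    syscall

    and  dword [raw + C_LFLAG], ~ICANON   ; turn off line buffering
    mov  byte [raw + C_CC + VMIN], 1      ; read() returns after 1 byte
    mov  byte [raw + C_CC + VTIME], 0     ; with no timeout
    mov  eax, SYS_IOCTL
    xor  edi, edi
    mov  esi, TCSETS
    mov  rdx, raw
    syscall

    mov  eax, SYS_READ      ; returns as soon as one key is pressed
    xor  edi, edi
    mov  rsi, key
    mov  edx, 1
    syscall

    mov  eax, SYS_IOCTL     ; restore the original settings
    xor  edi, edi
    mov  esi, TCSETS
    mov  rdx, saved
    syscall
.exit:
    mov  eax, SYS_EXIT
    xor  edi, edi           ; exit code 0
    syscall
```

Restoring the saved settings matters: otherwise the shell inherits a terminal that is still in non-canonical mode.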

Nate Eldredge