Basic input with x64 assembly code

Question

I am writing a tutorial on basic input and output in assembly. I am using a Linux distribution (Ubuntu) that is 64 bit. For the first part of my tutorial I spoke about basic output and created a simple program like this:

global      _start
section     .text
_start:
    mov         rax,1
    mov         rdi,1
    mov         rsi,message
    mov         rdx,13
    syscall
    mov         rax,60
    xor         rdi,rdi
    syscall

section     .data
    message:    db          "Hello, World", 10

That works great. The system prints the string and exits cleanly. For the next part of my tutorial, I simply want to read one character in from the keyboard. From my understanding of this web site we change the rdi register to be 0 for a sys_read call.

I first subtract 8 from the current rsp and then load that address into the rsi register. (That is where I want to store the char.) When I compile and run my program it appears to work... but the terminal seems to mimic the input I type in again.

Here is the program:

global      _start            
section     .text
_start:
    sub         rsp,8           ; allocate space on the stack to read
    mov         rdi,0           ; set rdi to 0 to indicate a system read
    mov         rsi,[rsp-8]
    mov         rdx,1
    syscall

    mov         rax,1
    mov         rdi,1
    mov         rsi,message
    mov         rdx,13
    syscall
    mov         rax,60
    xor         rdi,rdi
    syscall

section     .data
    message:    db          "Hello, World", 10

and this is what happens in my terminal...

matthew@matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ nasm -felf64 rps.asm && ld rps.o && ./a.out
5
Hello, World
matthew@matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$ 5
5: command not found
matthew@matthew-Precision-WorkStation-690:~/Documents/Programming/RockPaperScissors$

The input 5 is repeated back to the terminal after the program has exited. What is the proper way to read in a single char using NASM and Linux x64?

score 6 · Accepted Answer · answered Apr 19 '18 at 19:12

In the first part of your code you have to set rax to 0, the system call number for SYS_READ (as the other answer briefly mentions).

So check a Linux x64 system call table for the appropriate parameters and try:

_start:
  mov         rax, 0          ; set SYS_READ as SYS_CALL value
  sub         rsp, 8          ; allocate 8-byte space on the stack as read buffer
  mov         rdi, 0          ; set rdi to 0 to indicate a STDIN file descriptor
  lea         rsi, [rsp]      ; set const char *buf to the 8-byte space on stack
  mov         rdx, 1          ; set size_t count to 1 for one char
  syscall
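
For reference, here is a minimal sketch of how that fragment might fit into a complete program. Everything after the first syscall is an assumption added for illustration, not part of the fragment above: it writes the byte back to stdout so you can see that the read really stored something, using the byte count that sys_read returns in rax as the length.

```nasm
global      _start
section     .text
_start:
    mov         rax, 0          ; SYS_READ
    sub         rsp, 8          ; 8-byte read buffer on the stack
    mov         rdi, 0          ; fd 0 = STDIN
    lea         rsi, [rsp]      ; buf = address of the stack buffer
    mov         rdx, 1          ; count = 1 byte
    syscall                     ; rax = bytes read (0 on EOF, negative on error)

    test        rax, rax
    jle         .exit           ; nothing was read: skip the write
    mov         rdx, rax        ; write exactly as many bytes as were read
    mov         rax, 1          ; SYS_WRITE
    mov         rdi, 1          ; fd 1 = STDOUT
    lea         rsi, [rsp]      ; same buffer
    syscall
.exit:
    mov         rax, 60         ; SYS_EXIT
    xor         rdi, rdi        ; status 0
    syscall
```

It should build with the same `nasm -felf64 rps.asm && ld rps.o` command as in the question. Because only one byte is read, the newline you type after the character is still left for the shell.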

Peter Cordes · Answer 2 · 2018-04-26T16:34:46.893

it appears to work... but the terminal seems to mimic the input I type in again.

No, the 5 + newline that bash reads is the one you typed. Your program waited for input but didn't actually read the input, leaving it in the kernel's terminal input buffer for bash to read after your program exited. (And bash does its own echoing of terminal input because it puts the terminal in no-echo mode before reading; the normal mechanism for characters to appear on the command line as you type is for bash to print what it reads.)

How did your program manage to wait for input without reading any? mov rsi, [rsp-8] loads 8 bytes from that address. You should have used lea to set rsi to point to that location instead of loading what was in that buffer. So read fails with -EFAULT instead of reading anything, but interestingly it doesn't check this until after waiting for there to be some terminal input.

I used strace ./foo to trace system calls made by your program:

execve("./foo", ["./foo"], 0x7ffe90b8e850 /* 51 vars */) = 0
read(0, 5
NULL, 1)                        = -1 EFAULT (Bad address)
write(1, "Hello, World\n", 13Hello, World
)          = 13
exit(0)                                 = ?
+++ exited with 0 +++

Normal terminal input/output is mixed with the strace output; I could have used -o foo.trace or whatever. The cleaned-up version of the read system call trace (without the 5\n mixed in) is:

read(0, NULL, 1)                        = -1 EFAULT (Bad address)

So (as expected for _start in a static executable under Linux), the memory below RSP was zeroed. But anything that isn't a pointer to writeable memory would have produced the same result.
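
If you want to see the failure without strace, one possible sketch is to turn the return value of read into the exit status. This is only an illustration: it keeps the buggy `mov` on purpose and assumes, as in the trace above, that the 8 bytes below RSP are zero, so rsi ends up NULL.

```nasm
global      _start
section     .text
_start:
    xor         eax, eax        ; __NR_read = 0
    xor         edi, edi        ; fd 0 = stdin
    mov         rsi, [rsp-8]    ; BUG kept on purpose: loads the 8 bytes stored at rsp-8
    ;lea        rsi, [rsp-8]    ; the fix: rsi = the address rsp-8 itself
    mov         edx, 1          ; count = 1
    syscall                     ; with a bad pointer, rax = -EFAULT (-14)

    neg         eax             ; turn the negative errno into a positive number
    mov         edi, eax        ; use it as the exit status
    mov         eax, 60         ; __NR_exit
    syscall
```

Run it, type a line, and then check `echo $?`. If the read fails the way the trace shows, the status should be 14, which is EFAULT. With the `lea` line swapped in, the read succeeds and the status no longer means anything useful.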

zx485's answer is correct but inefficient (large code-size and an extra instruction). You don't need to worry about efficiency right away, but it's one of the main reasons for doing anything with asm and there's interesting stuff to say about this case.

You don't need to modify RSP; you can use the red-zone (memory below RSP) because you don't need to make any function calls. This is what you were trying to do with rsp-8, I think. (Or else you didn't realize that it was only safe because of special circumstances...)

The read system call's signature is

   ssize_t read(int fd, void *buf, size_t count);

so fd is an integer arg, so it's only looking at edi not rdi. You don't need to write the full rdi, just the regular 32-bit edi. (32-bit operand-size is usually the most efficient thing on x86-64).

But for zero or positive integers, just setting edi also sets rdi anyway. (Anything you write to edi is zero-extended into the full rdi.) And of course zeroing a register is best done with xor same,same; this is probably the best-known x86 peephole optimization trick.
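
A small sketch that checks both claims, in case you want to convince yourself. The program and its label names are made up for illustration. It fills a register with all-ones, overwrites only the 32-bit half, and then compares the full 64-bit register.

```nasm
global      _start
section     .text
_start:
    mov         rdi, -1         ; rdi = 0xFFFFFFFFFFFFFFFF
    mov         edi, 42         ; 32-bit write: the upper half of rdi is zeroed
    cmp         rdi, 42         ; full 64-bit compare
    jne         .fail

    mov         rsi, -1
    xor         esi, esi        ; 2-byte encoding, zeroes all of rsi
    test        rsi, rsi
    jnz         .fail

    xor         edi, edi        ; exit status 0: both checks passed
    jmp         .exit
.fail:
    mov         edi, 1          ; exit status 1: a check failed
.exit:
    mov         eax, 60         ; __NR_exit
    syscall
```

After running it, `echo $?` prints 0, because writing a 32-bit register always zero-extends into the full 64-bit register.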

As the OP later commented, reading only 1 byte will leave the newline unread when the input is 5\n, and that would make bash read it and print an extra prompt. We can bump up the size of the read and the space for the buffer to 2 bytes. (There'd be no downside to using lea rsi, [rsp-8] and leaving a gap; I'm using lea rsi, [rsp-2] to pack the buffer right below argc on the stack, or below the return address if this was a function instead of a process entry point. Mostly to show exactly how much space is needed.)

 ; One read of up to 2 characters
 ; giving the user room to type a digit + newline
_start:
  ;mov      eax, 0          ; set SYS_READ as SYS_CALL value
  xor      eax, eax        ; rax = __NR_read = 0  from unistd_64.h
  lea      rsi, [rsp-2]    ; rsi = buf = rsp-2
  xor      edi, edi        ; edi = fd = 0 (stdin)
  mov      edx, 2          ; rdx = count = 2 char
  syscall                     ; sys_read(0, rsp-2, 2)
 ; total = 16 bytes

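This assembles like so. (The disassembly below predates the change to a 2-byte buffer, so it still shows [rsp-0x1] and a count of 1; the instruction lengths and the 16-byte total are the same either way.)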

+ yasm -felf64 -Worphan-labels -gdwarf2 foo.asm
+ ld -o foo foo.o
ld: warning: cannot find entry symbol _start; defaulting to 0000000000400080

$ objdump -drwC -Mintel    
0000000000400080 <_start>:
  400080:       31 c0                   xor    eax,eax
  400082:       48 8d 74 24 ff          lea    rsi,[rsp-0x1]
  400087:       31 ff                   xor    edi,edi
  400089:       ba 01 00 00 00          mov    edx,0x1
  40008e:       0f 05                   syscall 
  ; next address = ...90

 ; I left out the rest of the program so you can't actually *run* foo
 ; but I used a script that assembles + links, and disassembles the result
 ; The linking step is irrelevant for just looking at the code here.

By comparison, zx485's answer assembles to 31 bytes. Code size is not the most important thing, but when all else is equal, smaller is better for L1i cache density, and sometimes decode efficiency. (And my version has fewer instructions, too.)

0000000000400080 <_start>:
  400080:       48 c7 c0 00 00 00 00    mov    rax,0x0
  400087:       48 83 ec 08             sub    rsp,0x8
  40008b:       48 c7 c7 00 00 00 00    mov    rdi,0x0
  400092:       48 8d 34 24             lea    rsi,[rsp]
  400096:       48 c7 c2 01 00 00 00    mov    rdx,0x1
  40009d:       0f 05                   syscall 
  ; total = 31 bytes

Note how those mov reg,constant instructions use the 7-byte mov r64, sign_extended_imm32 encoding. (NASM optimizes those to 5-byte mov r32, imm32 for a total of 25 bytes, but it can't optimize mov to xor because xor affects flags; you have to do that optimization yourself.)

Also, if you are going to modify RSP to reserve space, you only need mov rsi, rsp not lea. Only use lea reg1, [rsp] (with no displacement) if you're padding your code with longer instructions instead of using a NOP for alignment. For source registers other than rsp or rbp, lea won't be longer but it is still slower than mov. (But by all means use lea to copy-and-add. I'm just saying it's pointless when you can replace it with a mov.)

You could save even more space by using lea edx, [rax+1] instead of mov edx,1 at essentially no performance cost, but that's not something compilers normally do. (Although perhaps they should.)
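
Putting the pieces together, a complete program along these lines might look like the sketch below. It combines the red-zone read from this answer with the write and exit from the question. The `lea rsi, [rel message]` is my substitution for the question's `mov rsi,message`, chosen because it is shorter and position-independent.

```nasm
global      _start
section     .text
_start:
    xor         eax, eax        ; __NR_read = 0
    xor         edi, edi        ; fd 0 = stdin
    lea         rsi, [rsp-2]    ; 2-byte buffer in the red zone
    mov         edx, 2          ; room for one character + newline
    syscall                     ; rax = bytes read, or a negative errno

    mov         eax, 1          ; __NR_write
    mov         edi, 1          ; fd 1 = stdout
    lea         rsi, [rel message]
    mov         edx, 13         ; length of "Hello, World" + newline
    syscall

    mov         eax, 60         ; __NR_exit
    xor         edi, edi        ; status 0
    syscall

section     .data
    message:    db          "Hello, World", 10
```

With input of one character plus Enter, both bytes are consumed and the shell should not see a leftover newline. Longer input still leaves the rest of the line for the shell.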

@Matthew: I had been just going to answer with a more efficient version of ZX's code, but then I got curious how `read()` could block but still not `read` the byte. I didn't know that checking for `EFAULT` didn't happen until the data was available; learning something new while writing an answer is always cool. :) — Peter Cordes, Apr 25 '18 at 22:07
on a side note, I believe that `mov edx,1` needs to be `mov edx,2` otherwise the return gets sent to bash and new prompt line is shown. — Matthew, Apr 26 '18 at 15:23
@Matthew: Indeed, yes. And of course the buffer needs to be size 2 for that, or you overwrite the low byte of `argc` in `_start`, or your return address if this was a function like `main`. Updated (you might only get 1 byte, though). Note that it's possible for the user to type an extremely long line before pressing return (or pressing control-d to submit the line with no newline), so if you really want to remove all pending input before exiting, you should use a loop on `select()`, or somehow set fd 0 to non-blocking mode and keep reading until it returns an error like `-EWOULDBLOCK`. — Peter Cordes, Apr 26 '18 at 16:20
I just played around with it and attempted a buffer overflow attack, but it appears that because I passed a 2 into edx, it only read 2 bytes into the buffer. I did a traditional prologue `push rbp mov rbp,rsp sub rsp,1` and then attempted to write past the memory with multiple 0x41 'A' characters, but it only wrote 2 A's into my memory. (when I was using gdb to inspect it) Why is that? (PS. don't feel obligated to answer, you've already helped so much!) — Matthew, Apr 26 '18 at 18:54
FYI, if you are interested in following the development of the tutorial https://docs.google.com/document/d/1AY6ondu-4g6r94SS47OHG0UQH-AhJ381OnYFqO16YuM/edit?usp=sharing — Matthew, Apr 26 '18 at 19:12
@Matthew: [`read(fd,buf,len)`](http://man7.org/linux/man-pages/man2/read.2.html) will never store data outside `buf[0..len-1]`. Functions / system-calls with explicit-length buffers aren't vulnerable to buffer overflows unless you use them wrong! And BTW, if you're making a stack frame in `_start`, you should push a zero, rather than `rbp`, to NULL-terminate the linked list of stack frames, as recommended in the SysV ABI doc. (`rbp` might be zero on entry to `_start` already because that's what Linux does, but it's clearer to `push 0` in `_start`, because it's not a function.) — Peter Cordes, Apr 26 '18 at 19:37
@Matthew: Also, you generally don't want to misalign RSP. It won't actually matter if you don't have any signal handlers, but if you're going to `sub rsp, XYZ` / `mov rsi, rsp` instead of using the red-zone, you should make XYZ a multiple of 16. (`rsp` is 16-byte aligned on entry to `_start`, vs. `rsp-8` being 16-byte aligned on entry to a function.) — Peter Cordes, Apr 26 '18 at 19:41
Oh yeah, I read about having to keep RSP aligned. I don't fully understand what it means to be aligned yet... in a 64 bit machine the stack is 8 bytes wide right? So why does it have to be a multiple of 16 and not 8? — Matthew, Apr 27 '18 at 13:51
@Matthew: Aligned by 16 means the address is a multiple of 16, the low 4 bits are zero. So a 16-byte SSE2 load can't cross any larger boundary, like a cache line or page. [Why does System V / AMD64 ABI mandate a 16 byte stack alignment?](//stackoverflow.com/q/49391001) (the requirement / guarantee is only before a `call`, but you might as well do so at `_start` even if you're not planning to load/store from it. That's mainly a matter of style, and / or not teaching you something that might break later in another context.) — Peter Cordes, Apr 27 '18 at 17:20
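
The suggestions in the comments above (draining whatever input is still pending before exiting, and reserving stack space in a multiple of 16 with `mov` instead of `lea`) could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread. It uses `poll` with a zero timeout as a stand-in for the `select()` loop mentioned above, and the stack layout (pollfd at `[rsp]`, scratch buffer at `[rsp+16]`) is made up for the example.

```nasm
global      _start
section     .text
_start:
    sub         rsp, 32             ; multiple of 16, so RSP stays aligned
    mov         dword [rsp], 0      ; pollfd.fd = 0 (stdin)
    mov         dword [rsp+4], 1    ; pollfd.events = POLLIN (1), revents = 0

    xor         eax, eax            ; __NR_read = 0
    xor         edi, edi            ; fd 0 = stdin
    lea         rsi, [rsp+16]       ; 16-byte scratch buffer above the pollfd
    mov         edx, 1              ; the one character we care about
    syscall

.drain:
    mov         eax, 7              ; __NR_poll
    mov         rdi, rsp            ; fds = &pollfd (plain mov, no lea needed)
    mov         esi, 1              ; nfds = 1
    xor         edx, edx            ; timeout = 0: don't wait
    syscall
    test        eax, eax
    jle         .done               ; 0 = nothing pending, negative = error
    xor         eax, eax            ; __NR_read
    xor         edi, edi
    lea         rsi, [rsp+16]
    mov         edx, 16
    syscall                         ; discard up to 16 pending bytes
    test        rax, rax
    jg          .drain              ; stop on EOF (0) or error (negative)
.done:
    mov         eax, 60             ; __NR_exit
    xor         edi, edi
    syscall
```

Each read is limited to the 16-byte buffer, so however long the typed line is, nothing gets written outside it.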

prl · score 0 · Answer 3 · answered Apr 19 '18 at 16:27

You need to set eax to the system call number for read.
What value is that? The website I linked did not mention rax. (eax) – Matthew Apr 19 '18 at 16:34
I am sorry, it is the first column! (on the website I linked) – Matthew Apr 19 '18 at 16:34
@Matthew: Linux zeros registers before entry to user-space, so `_start` in a static executable under Linux will have `eax=0` to start with, even though the ABI doesn't guarantee it. (Dynamically linked executables have whatever garbage was left by `ld.so`.) It's only a good idea to depend on zeroed registers for code-golf, e.g. [How many arguments were passed? in 5 bytes of machine code after `_start`](//codegolf.stackexchange.com/a/162070). Normal code should set registers as needed. It just turns out by chance that `__NR_read` is 0, so this wasn't actually the only bug. – Peter Cordes Apr 26 '18 at 16:15
