Is there a potential bug in my `cat` function?

Question

I'm running this program with a Mac computer. It has been 7 months after creating this function and someone is arguing that I'm not properly recreating my cat function. I wanted to know why wouldn't work if I'm calling other functions. My output is correct, but I don't remember why the reasoning of not working.

A while ago, I decided to recreate some of the functions but in NASM. To test my knowledge, I wanted to recreate the cat call that takes in the file descriptor from a source file path by using the open function. Then call my cat(fd) assembly function. The output seems satisfying, but the labeling doesn't seem right in my defence.

Here is my assembly cat function file cat.s:

[bits 64]

global cat
%define SYS_READ 0x2000003
%define SYS_WRITE 0x2000004
%define STDOUT 0x01
%define BUFF_SIZE 0xff

section         .bss
    buffer      resb        BUFF_SIZE       ; unitialised storage space, basically reserving BUFF_SIZE bytes

section         .text

; int               my_cat(int fd);
_my_cat:
    xor     rax, rax
    push    rbp
    mov     rbp, rsp

    .read:
        push    rdi                 ; push rdi stack first before we start reading
        lea     rsi, [rel buffer]
        mov     rdx, BUFF_SIZE
        mov     rax, SYS_READ       ; read
        syscall
        jc      end                 ; jump carry
        cmp     rax, 0
        jle     end

    .write:                         ; write the message indicating end of file write
        mov     rdi, STDOUT         ; output fd
        mov     rdx, rax            ; store the destination of rax
        mov     rax, SYS_WRITE      ; write
        syscall
        pop     rdi                 ; take out our initial rdi stack
        jmp     .read               ; read again

    end:
        mov     rsp, rbp
        pop     rbp
        ret

Then this is the test file I'm running after open() in my hello.s file:

[bits 64]

global my_hello

section .data
    hello_world db 'Hello World!', 0x0a

section .text

%define SYS_WRITE 0x2000004

%define SYS_EXIT 0x2000001

_my_hello:
    mov     rax, SYS_WRITE      ; syscall write
    mov     rdi, 1              ; stdout fd (where will it write?)
    mov     rsi, hello_world    ; string address (where does it start?)
    mov     rdx, 20             ; string length in bytes (how many bytes to write?)
    syscall                     ; system call

    mov     rax, SYS_EXIT       ; exit system call
    xor     rdi, 0              ; 0 can be replaced with rdi
    syscall

main.c file:

extern  int             my_hello(void);
extern  int             my_cat(int fd);

int main(void)
{
    printf("testing my_cat: \n");
    fd = open("src/my_hello.s", O_RDONLY);
    ft_cat(fd);

    return (0);
}

I have the correct expected value, which it should be the content of hello.s file. The only part I'm not understanding is that I'm not using the write function properly because I don't take the 3 parameters (descriptor, the content of the string, nbuff_size). Help me to get learn't. For example: how can I see what is doing behind the scene when it's compiled (like using the lldb for my cat.s?) Moving my registers around doesn't help me to understand more.

You can use `strace` to see system calls. Code looks correct, not sure what your exact problem is. `cat` doesn't need 3 parameters because it fills in the buffer and the count itself. — Jester, Feb 12 '19 at 00:00
Actually from the syscall numbers it seems you are maybe on mac not linux, that also has `strace` equivalent called `dtruss` — Jester, Feb 12 '19 at 00:07
When I try to run `dtruss`, it requires privileges which I don't have access to it. — Zeid Tisnes, Feb 12 '19 at 00:13
instead of `xor rax, rax` just [use `xor eax, eax` which is shorter](https://stackoverflow.com/q/33666617/995714). And `cmp rax, 0` should be changed to [`test rax, rax`](https://stackoverflow.com/q/33721204/995714) — phuclv, Feb 12 '19 at 00:15
So it seems that the problem was that I was using two `syscalls` in my `cat.s` function, which can cause a problem in bigger files. How can I reduce this into one `syscall`? — Zeid Tisnes, Feb 12 '19 at 00:24
That is not a problem. You need one syscall for reading and another for writing. Also you are reusing a 255 byte buffer in a loop, so the size of the source file doesn't matter. — Jester, Feb 12 '19 at 00:26
@phuclv of course `rax` doesn't even need to be zeroed there, it's overwritten anyway and there are more places where 32 bit registers can be used instead of 64. — Jester, Feb 12 '19 at 00:27
So basically I forgot to point out that I'm popping my result later, which in fact it is correct. This post can be closed now or I will delete it in the next 24 hours. — Zeid Tisnes, Feb 12 '19 at 00:37

Peter Cordes · Answer 1 · 2019-02-12T01:18:38.190

Your tiny buffer makes this super inefficient (kernel/user transition ever 255 bytes). At least 8kiB would be much better, maybe 32 to 64k. (Fits in L2 cache size on typical modern x86.)

You might also need to handle read or write returning EINTR (interrupted by signal before any data read), for correctness in case the user hits control-z at the right moment. I'm not sure exactly how the system-call semantics work here, or of that's only for signals with handlers. But if this is supposed to be a function, not a whole program, you can't assume that the caller didn't set up any handlers.

At least you are using the return value of read as the length for write, so short reads (that return early but with non-zero length) are handled correctly.

I guess it's safe to assume/require that the input FD, and stdout, weren't opened with O_NONBLOCK, because you don't handle EWOULDBLOCK / EAGAIN. But that's normal and not a bug in cat(), just part of its contract.

Code quality: as @phuclv pointed out, you're wasting code bytes with xor rax,rax instead of xor eax,eax. And that's pointless anyway, because mov eax, SYS_READ already overwrites the whole RAX.

push rdi is also weird. Use a register, like r8 or something to stash the fd function arg. syscall only clobbers RCX, R11, and the return value in RAX. You don't need to touch user-space stack memory between system calls; that could lead to extra TLB misses or page faults depending on the kernel's Meltdown mitigation strategy. (push/pop is the smallest code-size option, though.)

Does OS X not have a fd-to-fd copy system call for this?

Linux has sendfile(2) that copies between FDs in kernel space, avoiding copying the data to user-space and back. (It was originally designed for zero-copy web / file servers to send data from files to sockets, and in really old kernels (before 2.6.33) out_fd had to be a socket. But that was years ago.) Anyway, arbitrarily large copy sizes without allocating / page-faulting any user-space memory, and saving a copy.

For files specifically, there's also copy_file_range(2), which gives the kernel / filesystem drivers the opportunity to do stuff like NFS server-side copying, or making copy-on-write reflinks with systems that support that. (Sendfile might also be able to do that, but the man page doesn't mention it.)

ssize_t sendfile(int out_fd, int in_fd, off_t *offset, size_t count);

ssize_t copy_file_range(int fd_in, loff_t *off_in,
                           int fd_out, loff_t *off_out,
                           size_t len, unsigned int flags);

The thorough answer is appreciated. Although, it was difficult to understand your response. Looking at my code, I did a few changes like the `test rax, rax`. Also, I like the fact of how much better my code can be improved with tiny modifications. However, I'm not sure how to properly do the changes due to lack of knowledge. I'm Googling around to have a better understanding of what you mentioned above. Your feedback was awesome! — Zeid Tisnes, Feb 12 '19 at 03:22

Is there a potential bug in my `cat` function?

1 Answers1