I am new to assembly language, and I have to implement the read function in x86-64 assembly on macOS. This is what I have so far:

;;;;;;ft_read.s;;;;;;

global _ft_read:
section .text
extern ___error

_ft_read:
    mov rax, 0x2000003 ; store syscall value of read on rax 
    syscall            ; call read and pass to it rdi , rsi, rdx  ==> rax read(rdi, rsi, rdx)
    cmp rax, 103       ; compare rax with 103 by subtracting 103 from rax ==> rax - 103
    jl _ft_read_error  ; if the result of cmp is less than 0 then jump to _ft_read_error
    ret                ; else return the rax value which is btw the return value of syscall

_ft_read_error:
    push rax
    call ___error
    pop rcx
    mov [rax], rcx
    mov rax, -1
    ret

As you can see above, I call read with syscall and then compare its return value, which is stored in rax, with 103. I will explain why I compare it with 103, but first let me explain something else: errno. This is what the macOS manual page says about errno:

When a system call detects an error, it returns an integer value indicating failure (usually -1) and sets the variable errno accordingly. <This allows interpretation of the failure on receiving a -1 and to take action accordingly.> Successful calls never set errno; once set, it remains until another error occurs. It should only be examined after an error. Note that a number of system calls overload the meanings of these error numbers, and that the meanings must be interpreted according to the type and circumstances of the call.

The following is a complete list of the errors and their names as given in <sys/errno.h>.

0 Error 0. Not used.

1 EPERM Operation not permitted. An attempt was made to perform an operation limited to processes with appropriate privileges or to the owner of a file or other resources.

2 ENOENT No such file or directory. A component of a specified pathname did not exist, or the pathname was an empty string.

..................................................I'll skip this part (I wrote this line btw)..................................................

101 ETIME STREAM ioctl() timeout. This error is reserved for future use.

102 EOPNOTSUPP Operation not supported on socket. The attempted operation is not supported for the type of socket referenced; for example, trying to accept a connection on a datagram socket.

As far as I understand, and after a lot of debugging with lldb, I noticed that syscall returns one of the numbers shown in the errno man page. For example, when I pass a bad file descriptor to my ft_read function from main.c like this:

int bad_file_des = -1337; // a file descriptor that doesn't exist, of course; you can change it to -42 if you like
ft_read(bad_file_des, buff, 300);

the syscall returns 9, which is stored in rax, so I check whether rax < 103 (because errno values range from 0 to 102) and, if so, jump to _ft_read_error, because that's what it should do.

Everything works as I planned, but there is a problem that came out of nowhere. When I open an existing file and pass its file descriptor to my ft_read function, as shown in main.c below, the read syscall returns the number of bytes read. This is what the manual says read returns:

On success, the number of bytes read is returned (zero indicates end of file), and the file position is advanced by this number. It is not an error if this number is smaller than the number of bytes requested; this may happen for example because fewer bytes are actually available right now (maybe because we were close to end-of- file, or because we are reading from a pipe, or from a terminal), or because read() was interrupted by a signal. See also NOTES.

On error, -1 is returned, and errno is set appropriately. In this case, it is left unspecified whether the file position (if any) changes.

In my main, which works fine, I pass my ft_read function a valid file descriptor, a buffer to store the data, and 50 as the number of bytes to read, so syscall returns 50 in rax. The comparison then does its job: rax = 50 < 103, so it jumps to _ft_read_error even though there is no error, just because 50 happens to be one of the errno values, which is not what it means in this case.
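
The ambiguity described above can be modelled in plain C. This is only a sketch: `struct raw_result` and both helper functions are hypothetical names invented for illustration, standing in for the contents of rax and the carry flag right after `syscall` returns. It shows that a successful read of 50 bytes cannot be told apart from an error code by value alone, while the flag separates the two cases.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of what user space sees after `syscall` on macOS:
   the value left in rax plus the state of the carry flag. */
struct raw_result {
    int64_t rax;   /* byte count on success, errno code on failure */
    bool    carry; /* set by the kernel only when the call failed */
};

/* The check from the first version of ft_read: guess from the value alone
   (mirrors `cmp rax, 103` followed by `jl`). */
static bool looks_like_error_by_value(struct raw_result r)
{
    return r.rax < 103;
}

/* The check from the second version: trust the carry flag (mirrors `jc`). */
static bool is_error_by_flag(struct raw_result r)
{
    return r.carry;
}
```

With `{50, false}` (a successful 50-byte read) the value-based check wrongly reports an error, while the flag-based check does not; with `{9, true}` (EBADF) both report an error.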

Someone suggested using jc (jump if the carry flag is set) rather than jl (jump if less), as shown in the code below:

;;;;;;ft_read.s;;;;;;

global _ft_read:
section .text
extern ___error

_ft_read:
    mov rax, 0x2000003 ; store syscall value of read on rax 
    syscall            ; call read and pass to it rdi , rsi, rdx  ==> rax read(rdi, rsi, rdx)
                       ; deleted the cmp
    jc _ft_read_error  ; if carry flag is set then jump to _ft_read_error
    ret                ; else return the rax value which is btw the return value of syscall

_ft_read_error:
    push rax
    call ___error
    pop rcx
    mov [rax], rcx
    mov rax, -1
    ret

And guess what, it works perfectly: with my ft_read, errno stays 0 when there is no error, and it holds the appropriate error number when there is an error.

But the problem is that I don't know why the carry flag gets set when there is no cmp. Does syscall set the carry flag when there is an error during the call, or is something else happening in the background? I would like a detailed explanation of the relation between syscall and the carry flag. I am still new to assembly and I want to learn it badly. Thanks in advance.

What is the relation between syscall and the carry flag, and how does syscall set it?

This is the main.c that I compile together with the assembly code above:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <fcntl.h>
#include <errno.h>

ssize_t ft_read(int fildes, void *buf, size_t nbyte);

int     main()
{
    /*-----------------------------------------------------------------------*/
    ///////////////////////////////////////////////////////////////////////////
    /********************************ft_read**********************************/
    int     fd = open("./main.c", O_RDONLY);
    char    *buff = calloc(sizeof(char), 50 + 1);
    int     ret = ft_read(fd, buff, 50);

    printf("ret value = %d,  error value = %d : %s\n", ret, errno, strerror(errno));
    //don't forget to free ur buffer bro, this is just a test main don't be like me.
    return (0);
}
Peter Cordes
Holy semicolon
  • "does syscall set the carry flag when there is an error during the call," Yes. It sets CF on error and clears CF on success. This is an assembly-specific calling convention. Used to be much more common. There is even [the instruction `stc`](https://ulukai.org/ecm/doc/insref.htm#insSTC) and [its opposite `clc`](https://ulukai.org/ecm/doc/insref.htm#insCLC) which were used by near called functions to indicate error or success through the Carry Flag. (Interrupt function handlers need to modify the `iret` stack frame to set or clear the caller's CF on the stack.) – ecm Nov 13 '20 at 11:56
  • Beware that the systemcalls you see in man are wrappers around the real systemcalls done with `syscall`. `errno` is a CRT thing. At least that's how it works under Linux, it may be the same under Darwin. Linux's syscalls return a negative value on error and errno is set from these negative values. You should check the Darwin's syscall **ABI** and not man. – Margaret Bloom Nov 13 '20 at 12:21
  • @MargaretBloom Thanks for the information, I checked the return value of read syscall in Darwin's syscall from this [site](https://john-millikin.com/unix-syscalls#syscalls-by-os) and it is the same in the return value. – Holy semicolon Nov 13 '20 at 12:34

1 Answer

Part of the confusion is that the term "system call" is used for two things that are really different:

  1. The actual request to the kernel to read from a file, as invoked by executing the syscall instruction.

  2. The C function read(), provided by the userspace C library as a way for C programs to conveniently access the functionality of #1.

The man page documents how to use #2, but in assembly you are working with #1. The overall semantics are the same, but the details of how you access them are different.

In particular, the C function (#2) follows the convention that errors are indicated by returning -1 from the function and setting the variable errno. However, this is not a convenient way for #1 to indicate errors. errno is a global (or thread-local) variable located somewhere in the program's memory; the kernel doesn't know where, and it would be awkward to tell it, so the kernel can't easily write this variable directly. It's simpler for the kernel to return error codes some other way, and leave it up to the C library to set the errno variable.
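
The C-level convention (#2) is easy to observe from C itself. The sketch below uses only the standard `read()` wrapper and `errno`; the helper name `errno_from_bad_read` is made up for illustration, and the descriptor -1337 is the same deliberately invalid one used in the question.

```c
#include <errno.h>
#include <unistd.h>

/* Calls the C library's read() on a descriptor that is not open.
   Returns the errno value the wrapper stored, or 0 if the call
   unexpectedly did not fail with -1. */
static int errno_from_bad_read(void)
{
    char buf[16];

    errno = 0;                                /* start from a known state */
    ssize_t n = read(-1337, buf, sizeof buf); /* invalid file descriptor */
    return (n == -1) ? errno : 0;
}
```

On a POSIX system this should return EBADF: the function-level result is -1, and the specific error code travels separately through `errno`.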

The convention that BSD-based operating systems generally follow is that the kernel system call (#1) will set or clear the carry flag according to whether an error occurred. If no error occurred, rax contains the system call's return value (here, the number of bytes read); if an error did occur, eax contains the error code (it's normally a 32-bit value, since errno is an int). So if you are writing in assembly, that is what you should expect to see.

As to how the kernel manages to set/clear the carry flag, when the system call is complete, the kernel executes the sysret instruction to transfer control back to user space. One of the functions of this instruction is to restore the rflags register from r11. The kernel will have saved your process's original rflags when the system call began, so it merely has to set or clear the low-order bit (that's where the carry flag is) in this 64-bit value before or after loading it into r11 in preparation for sysret. Then when your process continues execution with the instruction following your syscall, the carry flag will be in the corresponding state.
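
The flag manipulation described above amounts to changing one bit. The following is an illustrative sketch only, not actual kernel code: `RFLAGS_CF` and `rflags_for_return` are hypothetical names, and the only architectural fact relied on is that the carry flag is bit 0 of rflags.

```c
#include <stdbool.h>
#include <stdint.h>

#define RFLAGS_CF UINT64_C(1) /* the carry flag is bit 0 of rflags */

/* Takes the rflags image saved at system call entry and returns the image
   to restore on the way back to user space: CF set on error, cleared on
   success, every other flag left untouched. */
static uint64_t rflags_for_return(uint64_t saved_rflags, bool error)
{
    return error ? (saved_rflags | RFLAGS_CF)
                 : (saved_rflags & ~RFLAGS_CF);
}
```

For example, a saved image of 0x202 becomes 0x203 when the call failed, and 0x203 becomes 0x202 when it succeeded, regardless of what the carry flag held before the `syscall` instruction.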

The cmp instruction is certainly one of the ways that an x86 CPU can set the carry flag, but it's by no means the only way. And even if it were, it shouldn't surprise you not to see that code in the userspace program, since it's the kernel that determines how it is set.

In order to implement #2, the C library's read() function needs to interface between the kernel's convention (#1) and what the C programmer is expecting (#2), so they have to write some code to check the carry flag and populate errno if needed. Their code for this function could look something like the following:

global read
read:
    mov rax, 0x2000003
    ; fd, buf, count are in rdi, rsi, rdx respectively
    syscall
    jc read_error
    ; no error, return value is in rax which is where the C caller expects it
    ret
read_error:
    ; error occurred, eax contains error code
    mov [errno], eax
    ; C caller expects return value of -1
    mov rax, -1 
    ret
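
For readers more comfortable in C, the same translation step can be written as an ordinary function. This is a model under stated assumptions, not the real library source: `finish_syscall` is a hypothetical name, and its two parameters stand in for the contents of rax and the carry flag immediately after the `syscall` instruction.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <sys/types.h>

/* Converts the kernel's convention (result or error code in rax, carry
   flag as the error indicator) into the C convention (-1 plus errno). */
static ssize_t finish_syscall(uint64_t rax, bool carry)
{
    if (carry) {
        errno = (int)(uint32_t)rax; /* the error code is the low 32 bits (eax) */
        return -1;                  /* what the C caller expects on failure */
    }
    return (ssize_t)rax;            /* success: pass the result through */
}
```

Here `finish_syscall(50, false)` yields 50 and leaves `errno` alone, while `finish_syscall(9, true)` yields -1 and stores 9 in `errno`.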

There is some more info in the question "64-bit syscall documentation for MacOS assembly". I wish I could cite some more authoritative documentation, but I don't know where to find it. What's here seems to be "common knowledge".

Nate Eldredge