
I'm learning assembly.
And as per my usual learning steps with any new language I pick up, I've arrived at networking with assembly.

Sadly, that isn't going well: I've pretty much failed at step 0, which is getting a socket through which communication can begin.

The assembly code should be roughly equal to the following C code:

#include <stdio.h>
#include <sys/socket.h>

int main(){
    int sock;
    sock = socket(AF_INET, SOCK_STREAM, 0);
}

(Let's ignore the fact that it's not closing the socket for now.)

So here's what I did thus far:

  • Checked the manual, which implies that I need to make a socketcall(). That is all well and good, but socketcall() needs an int that describes which socket call it should make, and its man page doesn't help much with that either, as it only says:

    On a some architectures—for example, x86-64 and ARM—there is no socketcall() system call; instead socket(2), accept(2), bind(2), and so on really are implemented as separate system calls.

  • Yet there are no such calls in the original list of syscalls, and as far as I know socket(), accept(), bind(), listen(), etc. are calls from libnet and not from the kernel. This got me utterly confused, so I decided to compile the above C code and inspect it with strace. This yielded the following:

    socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 3
    
  • While that didn't get me any closer to knowing what socket() is, it did explain its arguments, for which I can't seem to find the proper documentation (again). I thought that PF_INET, SOCK_STREAM, and IPPROTO_IP would be defined in <sys/socket.h>, but grepping for them didn't find anything of use. So I decided to just wing it by using gdb with disass main to find the values. This gave the following output:

    Dump of assembler code for function main:
       0x00000000004004FD  <+0>:  push  rbp
       0x00000000004004FE  <+1>:  mov   rbp,rsp
       0x0000000000400501  <+4>:  sub   rsp,0x10
       0x0000000000400505  <+8>:  mov   edx,0x0
       0x000000000040050A <+13>:  mov   esi,0x1
       0x000000000040050F <+18>:  mov   edi,0x2
       0x0000000000400514 <+23>:  call  0x400400 <socket@plt>
       0x0000000000400519 <+28>:  mov   DWORD PTR [rbp-0x4],eax
       0x000000000040051C <+31>:  leave
       0x000000000040051D <+32>:  ret
    End of assembler dump.
    
  • In my experience this would imply that socket() gets its parameters from EDI (PF_INET = 2), ESI (SOCK_STREAM = 1), and EDX (IPPROTO_IP = 0). That would be odd for a syscall (as far as I know, the convention with Linux syscalls is to use EAX/RAX for the call number and other registers for the parameters in increasing order, e.g. RBX, RCX, RDX ...). The fact that this is being CALL-ed and not INT 0x80'd would also imply that this is not in fact a system call, but rather something that’s being called from a shared object. Or something.

  • But then again, passing arguments in registers is very odd for something that's CALL-ed. As far as I know, arguments for called functions should normally be PUSH-ed onto the stack, as the compiler can't know which registers the callee would try to use.

  • This behavior becomes even more curious when checking the produced binary with ldd:

    linux-vdso.so.1 (0x00007fff4a7fc000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f56b0c61000)
    /lib64/ld-linux-x86-64.so.2 (0x00007f56b1037000)
    
  • There don't appear to be any networking libraries linked.

And that's the point where I ran out of ideas.

So I'm asking for the following:

  • Documentation that describes the x86-64 Linux kernel's actual system calls and their associated numbers. (Preferably as a header file for C.)
  • The header files that define PF_INET, SOCK_STREAM, and IPPROTO_IP, as it really bugs me that I wasn't able to find them on my own system.
  • Maybe a tutorial for networking in assembly on x86-64 Linux. (For x86-32 it's easy to find material, but for some reason I came up empty for the 64-bit stuff.)
Peter Mortensen
Wolfer
  • You aren't finding the defines in the header you include with grep because it in turn includes a less portable header file which provides the actual value (probably several such layers). Most processors with a moderate to large number of registers are traditionally programmed via ABIs that put the first few arguments in specified registers for efficiency, before resorting to the stack for the remainder. – Chris Stratton Nov 12 '14 at 16:53
  • 'networking with assembly' - did you really pick that as a learning exercise? Is someone holding you at gunpoint? – Martin James Nov 12 '14 at 17:33

3 Answers


The 64-bit calling convention does use registers to pass arguments, both in user space and to system calls. As you have seen, the user-space convention is rdi, rsi, rdx, rcx, r8, r9. For system calls, r10 is used instead of rcx, which is clobbered by the syscall instruction. See Wikipedia or the ABI documentation for more details.

The definitions of the various constants are hidden in header files, which are nevertheless easily found via a file system search assuming you have the necessary development packages installed. You should look in /usr/include/x86_64-linux-gnu/bits/socket.h and /usr/include/linux/in.h.

As for a system call list, it's trivial to google one, such as this. You can also always look in the kernel source of course.

Jester
  • The call numbers are defined in `asm/unistd_64.h`. e.g. `/usr/include/asm/unistd_64.h` on Arch Linux, or wherever else your distro puts it. The man pages for system calls document any difference between the libc wrapper and the kernel interface, e.g. how `getpriority` (https://man7.org/linux/man-pages/man2/getpriority.2.html#NOTES) encodes "nice" values to avoid conflict with `-errno` codes, and how `brk` works. – Peter Cordes Aug 01 '23 at 19:30
  • As you say, given the man pages, call numbers, and calling-convention ([What are the calling conventions for UNIX & Linux system calls (and user-space functions) on i386 and x86-64](https://stackoverflow.com/q/2535989)), you don't need a table of registers for each system call; there's a consistent pattern. The tables are presumably auto-generated from headers and the call-number table. – Peter Cordes Aug 01 '23 at 19:31

socket.asm

; Socket

; Compile with: nasm -f elf socket.asm

; Link with (64 bit systems require elf_i386 option): ld -m elf_i386 socket.o -o socket

; Run with: ./socket
 
%include    'functions.asm'
 
SECTION .text
global  _start
 
_start:
 
    xor     eax, eax            ; init eax 0
    xor     ebx, ebx            ; init ebx 0
    xor     edi, edi            ; init edi 0
    xor     esi, esi            ; init esi 0
 
_socket:
 
    push    byte 6              ; push 6 onto the stack (IPPROTO_TCP)
    push    byte 1              ; push 1 onto the stack (SOCK_STREAM)
    push    byte 2              ; push 2 onto the stack (PF_INET)
    mov     ecx, esp            ; move address of arguments into ecx
    mov     ebx, 1              ; invoke subroutine SOCKET (1)
    mov     eax, 102            ; invoke SYS_SOCKETCALL (kernel opcode 102)
    int     80h                 ; call the kernel
 
    call    iprintLF            ; call our integer printing function (print the file descriptor in EAX, or a negative errno value on error)
 
_exit:
 
    call    quit                ; call our quit function

more docs...

Krishna
Eyni Kave

This is for a 32-bit x86 system. Porting it to x86-64 takes more than renaming registers (for example 'eax' to 'rax' or 'esp' to 'rsp'): the 64-bit syscall interface has different call numbers and takes its arguments in registers rather than through socketcall(). For the x86-64 syscall numbers, see https://chromium.googlesource.com/chromiumos/docs/+/master/constants/syscalls.md

[bits 32]

global _start
section .data
   msg: db "Socket Failed To Create!",0xa,0
   len: equ $-msg

   msg1: db "Socket Created",0xa,0
   len1: equ $-msg1

   msg2: db "Recv Or Send Failed",0xa,0
   len2: equ $-msg2

   msg3: db "Shutdown Socket Failed",0xa,0
   len3: equ $-msg3

   DATASIZE:        equ 5

   SOCK_STREAM:     equ 1
   AF_INET:         equ 2
   INADDR_ANY:      equ 0
   MSG_WAITALL:     equ 0x100
   MSG_DONTWAIT:    equ 0x40
   SHUT_RDWR:       equ 2

    SYS_SOCKET:     equ 1       ; sys_socket(2)
    SYS_BIND:       equ 2       ; sys_bind(2)
    SYS_CONNECT:    equ 3       ; sys_connect(2)
    SYS_LISTEN:     equ 4       ; sys_listen(2)
    SYS_ACCEPT:     equ 5       ; sys_accept(2)
    SYS_GETSOCKNAME:equ 6       ; sys_getsockname(2)
    SYS_GETPEERNAME:equ 7       ; sys_getpeername(2)
    SYS_SOCKETPAIR: equ 8       ; sys_socketpair(2)
    SYS_SEND:       equ 9       ; sys_send(2)
    SYS_RECV:       equ 10      ; sys_recv(2)
    SYS_SENDTO:     equ 11      ; sys_sendto(2)
    SYS_RECVFROM:   equ 12      ; sys_recvfrom(2)
    SYS_SHUTDOWN:   equ 13      ; sys_shutdown(2)
    SYS_SETSOCKOPT: equ 14      ; sys_setsockopt(2)
    SYS_GETSOCKOPT: equ 15      ; sys_getsockopt(2)
    SYS_SENDMSG:    equ 16      ; sys_sendmsg(2)
    SYS_RECVMSG:    equ 17      ; sys_recvmsg(2)
    SYS_ACCEPT4:    equ 18      ; sys_accept4(2)
    SYS_RECVMMSG:   equ 19      ; sys_recvmmsg(2)
    SYS_SENDMMSG:   equ 20      ; sys_sendmmsg(2)

struc sockaddr_in, -0x30
    .sin_family:    resb 2  ;2bytes
    .sin_port:      resb 2  ;2bytes
    .sin_addr:      resb 4  ;4bytes
    .sin_zero:      resb 8  ;8bytes
endstruc

struc socket, -0x40
    .socketfd       resb 4
    .connectionfd   resb 4
    .count          resb 4
    .data           resb DATASIZE
endstruc

section .text

_start:
    push ebp
    mov ebp, esp
    sub esp, 0x400 ;1024byte

    xor edx, edx    ;or use cdq
    ;
    ; int socket(int domain, int type, int protocol);
    ; domain: The domain argument specifies a communication domain
    ; 
    push edx                        ; Push protocol
    push dword SOCK_STREAM          ; Push type
    push dword AF_INET              ; Push domain
    mov ecx, esp                    ; ECX points to args
    mov ebx, SYS_SOCKET             ;
    mov eax, 0x66                   ; socketcall()
    int 0x80

    cmp eax, 0
    jl .socket_failed
    mov [ebp + socket.socketfd], eax

    ;
    ; fill struct sockaddr_in serv_addr;
    ;
    mov word [ebp + sockaddr_in.sin_family], AF_INET
    mov word [ebp + sockaddr_in.sin_port], 0x3905    ; port 1337 in network byte order, i.e. htons(1337)
    mov dword [ebp + sockaddr_in.sin_addr], INADDR_ANY

    push dword [ebp + sockaddr_in.sin_addr]
    push word  [ebp + sockaddr_in.sin_port]
    push word  [ebp + sockaddr_in.sin_family]

    mov ecx, esp

    ;
    ; int bind(int sockfd, const struct sockaddr *addr, socklen_t addrlen);
    ;
    push byte 0x10                  ; sizeof(struct sockaddr)
    push ecx                        ; pointer struct sockaddr
    push dword [ebp + socket.socketfd]
    mov ecx, esp                    ; ECX points to args
    mov ebx, SYS_BIND               ;
    mov eax, 0x66
    int 0x80

    cmp eax, 0
    jne .socket_failed


    ;
    ;   int listen(int sockfd, int backlog);
    ;
    push dword 0x10
    push dword [ebp + socket.socketfd]
    mov ecx, esp
    mov ebx, SYS_LISTEN
    mov eax, 0x66
    int 0x80
    cmp eax, 0
    jne .socket_failed



    ;
    ; int accept(int sockfd, struct sockaddr *addr, socklen_t *addrlen);
    ;
    xor ebx, ebx
    push ebx
    push ebx
    push dword [ebp + socket.socketfd]
    mov ecx, esp
    mov ebx, SYS_ACCEPT
    mov eax, 0x66
    int 0x80
    cmp eax, 0                      ; the kernel returns a negative errno value on failure, not -1
    jl .socket_failed
    mov [ebp + socket.connectionfd], eax

    mov dword [ebp + socket.count], 0
.again:
    lea edi, [ebp + socket.data]
    mov ecx, DATASIZE
    mov eax, 0
    rep stosb                       ; zero DATASIZE bytes (stosd would write 4*DATASIZE bytes and overrun the buffer)

    lea eax, [ebp + socket.data]
    ;
    ; ssize_t recv(int sockfd, void *buf, size_t len, int flags);
    ;
    push dword MSG_WAITALL
    push dword DATASIZE
    push eax
    push dword [ebp + socket.connectionfd]
    mov ecx, esp
    mov ebx, SYS_RECV
    mov eax, 0x66
    int 0x80
    cmp eax, 0
    jle .recv_or_send_failed

    mov edx, eax
    lea ecx, [ebp + socket.data]
    call printk

    inc dword [ebp + socket.count]
    cmp dword [ebp + socket.count], 5
    jle  .again
.break:
    ;
    ; int shutdown(int sockfd, int how);
    ;
    push dword SHUT_RDWR
    push dword [ebp + socket.socketfd]
    mov ecx, esp
    mov ebx, SYS_SHUTDOWN
    mov eax, 0x66
    int 0x80
    cmp eax, 0
    jne .shutdown_failed

    ;
    ; int close(int fd)
    ;
    mov ebx, [ebp + socket.connectionfd]
    mov eax, 0x06
    int 0x80
    cmp eax, 0
    jne .shutdown_failed

    jmp .success

.shutdown_failed:
    mov edx, len3
    mov ecx, msg3
    call printk
    jmp .end
.recv_or_send_failed:
    mov edx, len2
    mov ecx, msg2
    call printk
    jmp .end
.socket_failed:

    mov edx, len
    mov ecx, msg
    call printk
    jmp .end

.success:
    mov edx, len1
    mov ecx, msg1
    call printk
    jmp .end
.end:
   leave

   mov     ebx,0               ;first syscall argument: exit code
   mov     eax,1               ;system call number (sys_exit)
   int     0x80                ;call kernel

   ret

   
; EDX: message length
; ECX: pointer to message to write
printk:
    pusha
    mov     ebx,1               ;first argument: file handle (stdout)
    mov     eax,4               ;system call number (sys_write)
    int     0x80                ;call kernel
    popa
    ret
Mahdi Mohammadi
  • Porting to x86-64 takes more than changing register names and call numbers. As Jester's answer says, the `syscall` 64-bit system-calling convention uses different arg registers than `int 0x80` 32-bit calls. ([What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code?](https://stackoverflow.com/q/46087730)). Also, struct layouts may be different for any that involve pointers or `long`. – Peter Cordes May 31 '21 at 21:49
  • `push` in 64-bit mode can only push 2 or 8 bytes, not 4, so you can't construct your struct on the stack exactly the same way in a 64-bit version. (NASM unfortunately allows `push dword` without complaint, I guess treating it as a suggestion for the immediate encoding even without `strict dword`, but the instruction is still a push qword.) – Peter Cordes May 31 '21 at 21:50