
I came across arguably the smallest HTTP server in Docker (written in assembly), and I would love to see it in action!

I think they took the code from https://gist.github.com/DGivney/5917914 :

```nasm
section .text
global _start

_start:
  xor eax, eax              ; init eax 0
  xor ebx, ebx              ; init ebx 0
  xor esi, esi              ; init esi 0
  jmp _socket               ; jmp to _socket

_socket_call:
  mov al, 0x66              ; invoke SYS_SOCKETCALL (syscall number 102)
  inc byte bl               ; increment bl (1=socket, 2=bind, 4=listen, 5=accept)
  mov ecx, esp              ; move address of arguments struct into ecx
  int 0x80                  ; call SYS_SOCKETCALL
  jmp esi                   ; esi is loaded with a return address before each jump to _socket_call

_socket:
  push byte 6               ; push 6 onto the stack (IPPROTO_TCP)
  push byte 1               ; push 1 onto the stack (SOCK_STREAM)
  push byte 2               ; push 2 onto the stack (PF_INET)
  mov esi, _bind            ; move address of _bind into esi
  jmp _socket_call          ; jmp to _socket_call

_bind:
  mov edi, eax              ; move return value of SYS_SOCKET into edi (file descriptor for new socket, or -errno on error)
  xor edx, edx              ; init edx 0
  push dword edx            ; end struct on stack (arguments get pushed in reverse order)
  push word 0x6022          ; move 24610 dec onto stack
  push word bx              ; move 1 dec onto stack AF_FILE
  mov ecx, esp              ; move address of stack pointer into ecx
  push byte 0x10            ; move 16 dec onto stack (address length)
  push ecx                  ; push the address of arguments onto stack
  push edi                  ; push the file descriptor onto stack

  mov esi, _listen          ; move address of _listen into esi
  jmp _socket_call          ; jmp to _socket_call

_listen:
  inc bl                    ; bl = 3 (_socket_call increments it to 4 = listen)
  push byte 0x01            ; move 1 onto stack (max queue length argument)
  push edi                  ; push the file descriptor onto stack
  mov esi, _accept          ; move address of _accept into esi
  jmp _socket_call          ; jmp to _socket_call

_accept:
  push edx                  ; push 0 dec onto stack (address length argument)
  push edx                  ; push 0 dec onto stack (address argument)
  push edi                  ; push the file descriptor onto stack
  mov esi, _fork            ; move address of _fork into esi
  jmp _socket_call          ; jmp to _socket_call

_fork:
  mov esi, eax              ; move return value of SYS_ACCEPT into esi (file descriptor for accepted socket, or -errno on error)
  mov al, 0x02              ; invoke SYS_FORK (syscall number 2)
  int 0x80                  ; call SYS_FORK
  test eax, eax             ; if return value of SYS_FORK in eax is zero we are in the child process
  jz _write                 ; jmp in child process to _write

  xor eax, eax              ; init eax 0
  xor ebx, ebx              ; init ebx 0
  mov bl, 0x02              ; move 2 dec in ebx lower bits
  jmp _listen               ; jmp in parent process to _listen

_write:
  mov ebx, esi              ; move file descriptor into ebx (accepted socket id)
  push edx                  ; push 0 dec onto stack then push a bunch of ascii (http headers & response body)
  push dword 0x0a0d3e31     ; [\n][\r]>1
  push dword 0x682f3c21     ; h/<!
  push dword 0x6f6c6c65     ; ello
  push dword 0x683e3148     ; H<1h
  push dword 0x3c0a0d0a     ; >[\n][\r][\n]
  push dword 0x0d6c6d74     ; [\r]lmt
  push dword 0x682f7478     ; h/tx
  push dword 0x6574203a     ; et :
  push dword 0x65707954     ; epyT
  push dword 0x2d746e65     ; -tne
  push dword 0x746e6f43     ; tnoC
  push dword 0x0a4b4f20     ; [\n]KO
  push dword 0x30303220     ; 002
  push dword 0x302e312f     ; 0.1/
  push dword 0x50545448     ; PTTH
  mov al, 0x04              ; invoke SYS_WRITE (syscall number 4)
  mov ecx, esp              ; move address of the response bytes into ecx
  mov dl, 64                ; move 64 dec into edx lower bits (length in bytes to write)
  int 0x80                  ; call SYS_WRITE

_close:
  mov al, 6                 ; invoke SYS_CLOSE (syscall number 6)
  mov ebx, esi              ; move esi into ebx (accepted socket file descriptor)
  int 0x80                  ; call SYS_CLOSE
  mov al, 6                 ; invoke SYS_CLOSE (syscall number 6)
  mov ebx, edi              ; move edi into ebx (new socket file descriptor)
  int 0x80                  ; call SYS_CLOSE

_exit:
  mov eax, 0x01             ; invoke SYS_EXIT (syscall number 1)
  xor ebx, ebx              ; 0 errors
  int 0x80                  ; call SYS_EXIT
```
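The 15 `push dword` constants in `_write` are simply the response text, four bytes at a time. The C sketch below models the stack rather than running the assembly (`pushed`, `stack_image` and `starts_with` are hypothetical names): because the stack grows down, the last dword pushed ends up first in memory, and x86 stores each dword little-endian, which is why the comments read backwards. Under that model, `mov dl, 64` writes 60 bytes of text followed by the 4 zero bytes from `push edx`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* The dwords pushed in _write, in push order (first pushed first).
 * The leading 0 stands for `push edx` (edx is 0 at that point). */
static const uint32_t pushed[] = {
    0x00000000, 0x0a0d3e31, 0x682f3c21, 0x6f6c6c65, 0x683e3148,
    0x3c0a0d0a, 0x0d6c6d74, 0x682f7478, 0x6574203a, 0x65707954,
    0x2d746e65, 0x746e6f43, 0x0a4b4f20, 0x30303220, 0x302e312f,
    0x50545448,
};

enum { N_PUSHED = sizeof pushed / sizeof pushed[0] }; /* 16 dwords = 64 bytes */

/* Rebuild the bytes starting at ESP after the last push: the LAST value
 * pushed sits at the lowest address, each dword stored low byte first. */
static void stack_image(uint8_t out[4 * N_PUSHED])
{
    size_t pos = 0;
    for (size_t i = N_PUSHED; i-- > 0;) {
        uint32_t v = pushed[i];
        for (int b = 0; b < 4; b++)
            out[pos++] = (uint8_t)(v >> (8 * b));
    }
}

/* Compare the start of the image with a C string. */
static int starts_with(const uint8_t *img, const char *text)
{
    return memcmp(img, text, strlen(text)) == 0;
}
```

Filling a 64-byte buffer with `stack_image` gives `HTTP/1.0 200 OK\nContent-Type: text/html\r\n\r\n<H1>hello!</h1>\r\n` (note the bare `\n` after the status line) followed by four NUL bytes, all of which are sent to the client.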

I can assemble and link the code without any errors.

But when I run it, nothing seems to happen.

What do I need to do in order to see the output from the assembly HTTP server in my browser?

Potherca
  • Does building a (minimal) Dockerfile work better than `docker import` here? Does the container actually start up? What address is the server listening on? – David Maze May 08 '21 at 09:50
  • To create a minimal dockerfile, I would need to build a binary from the assembly code, so this _seems_ easier? The container does start up. I have no idea what address or port the server is listening to... – Potherca May 08 '21 at 10:53
  • I don't see why you would need docker. First, correct that shi**y code by replacing `push word bx` with `push word 2`, then assemble it (`nasm -felf32 httpd.asm -o httpd.o`) and link it (`ld -melf_i386 httpd.o -o httpd`). Then run it with `./httpd`. It will respond on port `8800` (as you can easily see from the bind call, remembering that the port is in big-endian order). The server will respond but badly. Chromium will keep loading the page. Also "functional" is definitely not what this server is. And why would one want to use `mov esi, ret_addr /jmp` instead of `call`? – Margaret Bloom May 08 '21 at 13:41
  • @MargaretBloom Fair point. For the question I guess the docker part is irrelevant. The code was provided as-is by the linked repo. I've updated my question. Your suggestion works, I get a response in my browser now. If you're willing to type your comment as an answer, I'd accept it! – Potherca May 08 '21 at 13:56
  • @MargaretBloom: Yeah, last time I looked at this I noticed some possible optimizations to save more code-size (at least in the C version, which used a craptastic hand-written asm `_start` and custom definitions for syscall wrapper functions, instead of just inlining the syscalls with inline asm). Didn't get around to making a pull request yet, though. It's funny to see `inc byte bl` to create a `1` in code-golf - `inc ebx` is smaller. (Presumably EBX was xor-zeroed in all code paths leading to this? I hope?) – Peter Cordes May 08 '21 at 14:11
  • @MargaretBloom: But anyway, I think the reason for not using `call` there is that they're trying to pass around a struct they created via push, and want to allow `mov ecx, esp` instead of `lea ecx, [esp+4]`? Or they're not *that* good at asm and didn't think of that :P – Peter Cordes May 08 '21 at 14:13
  • @MargaretBloom: [How does this C program without libc work?](https://stackoverflow.com/q/66851039) was the previous question about the same repo. Apparently I did at least push my work-in-progress branch to github, and linked it in my answer. – Peter Cordes May 08 '21 at 14:15
  • @Potherca Sorry, I have been too harsh in my comment :) I don't think I can add any value to this question by simply answering how to reach the server, but maybe somebody will write a more interesting answer :) – Margaret Bloom May 08 '21 at 14:20
  • @PeterCordes How is code size measured? Just the code? The whole ELF? Maybe just hoisting all those structures in the data section (and using a lower alignment) could improve the "code size". I don't know why but that code just hit my nerves :P – Margaret Bloom May 08 '21 at 14:24
  • For me the end metric would be the size of the docker image... – Potherca May 08 '21 at 14:27
  • @MargaretBloom Don't worry about it. I couldn't even figure out where in the code the port was assigned, let alone which port number it was! – Potherca May 08 '21 at 14:28
  • @MargaretBloom: size of the entire ELF image. They use a `build.sh` with `gcc -Os` and `strip` and stuff, and the final binary has no `.data` section / segment, so IDK how much more space that would take. But yeah, it would probably pay for itself with the amount of push instructions in `_write`, each costing 1 extra byte beyond their immediate payload. The amount of `mov esi, label` / `jmp _socket_call` is pretty silly; probably not paying for itself vs. `call`. And `_socket_call` could have gone before `_start` to avoid jumping over it. – Peter Cordes May 08 '21 at 14:33
  • @MargaretBloom: Oh, this is a different mini httpd. Different source from https://github.com/Francesco149/nolibc-httpd - that one is for x86-64, and the pure-asm source in that repo is basically hand-tweaked disassembly output. And it uses an ELF64 version of http://muppetlabs.com/~breadbox/software/tiny/teensy.html - having `nasm -fbin` emit ELF program headers, instead of relying on `ld`. This one is intended to be assembled + linked normally into a 32-bit static executable. So never mind that previous SO Q&A, that was a different toy httpd. – Peter Cordes May 08 '21 at 14:46
  • @Potherca Out of fun, I made a smaller (and I think easier to use) version of httpd [here](https://github.com/margaretbloom/nash-f). – Margaret Bloom May 08 '21 at 20:19
  • @MargaretBloom IT.IS.AWESOME! – Potherca May 09 '21 at 11:06
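The bind bug and the port number can both be read off the bytes that `_bind` pushes. This C sketch (hypothetical helpers `bind_prefix`, `decoded_family`, `decoded_port`) models the first 8 bytes of the `sockaddr` at ECX, assuming EBX is 1 at `_bind` because `_socket_call` incremented it once for the `socket` sub-call. `push word 0x6022` stores the bytes 0x22, 0x60; `sin_port` is big-endian, so that reads as 0x2260 = 8800. The family field is host-order, so `push word bx` yields 1 (AF_UNIX) where 2 (AF_INET) was needed. Those two port bytes are the `"` and `` ` `` characters strace shows as `sun_path` in the answer.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/socket.h> /* AF_UNIX, AF_INET */

/* Bytes at ECX after _bind's three pushes (lowest address first):
 *   push word bx      -> sa_family
 *   push word 0x6022  -> sin_port
 *   push dword edx    -> sin_addr = 0.0.0.0 (INADDR_ANY)
 * Each value is stored little-endian, as on any x86 CPU. */
static void bind_prefix(uint16_t family_word, uint8_t out[8])
{
    out[0] = (uint8_t)(family_word & 0xff);
    out[1] = (uint8_t)(family_word >> 8);
    out[2] = 0x22;            /* low byte of 0x6022 comes first */
    out[3] = 0x60;
    out[4] = out[5] = out[6] = out[7] = 0;
}

/* sa_family is a host-order (little-endian on x86) 16-bit field. */
static unsigned decoded_family(const uint8_t p[8]) { return p[0] | (p[1] << 8); }

/* sin_port is in network byte order, i.e. big-endian. */
static unsigned decoded_port(const uint8_t p[8]) { return (p[2] << 8) | p[3]; }
```

With `bind_prefix(1, b)` the family decodes to `AF_UNIX`; with `bind_prefix(2, b)`, which is what `push word 2` produces, it decodes to `AF_INET`. The port is 8800 either way.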

1 Answer


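Seems to barely sort of work for me, although it is buggy, as Margaret Bloom noticed. (It listens on a random port because its bind system call fails: it passes the wrong value for sa_family, 1 (AF_UNIX) instead of 2 (AF_INET). Calling listen on a socket that was never bound makes the kernel pick an ephemeral port for it.)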
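The "random port" behaviour can be reproduced without the assembly. A minimal sketch, assuming Linux and a hypothetical helper `autobound_port`: it creates a TCP socket, skips bind entirely (like the failed bind above), calls listen, and then asks the kernel with getsockname() which port it auto-assigned. This is the same information lsof reports from outside the process.

```c
#define _GNU_SOURCE /* expose POSIX/Linux declarations even under strict -std modes */
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns the port the kernel picked for an unbound listening socket, or -1 on error. */
static int autobound_port(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    if (fd == -1)
        return -1;
    /* No successful bind() first, just like after the EAFNOSUPPORT failure. */
    if (listen(fd, 1) == -1) {
        close(fd);
        return -1;
    }
    struct sockaddr_in addr;
    socklen_t len = sizeof addr;
    if (getsockname(fd, (struct sockaddr *)&addr, &len) == -1) {
        close(fd);
        return -1;
    }
    close(fd);
    return ntohs(addr.sin_port); /* sin_port is in network byte order */
}
```

Each call typically returns a different port from the kernel's ephemeral range, which is why the server ends up somewhere unpredictable like 36047.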

After building / linking with `nasm -felf32` / `ld -melf_i386`, I ran it under `strace` to see what it did.

```
$ strace ./httpd
execve("./httpd", ["./httpd"], 0x7ffde685ac10 /* 54 vars */) = 0
[ Process PID=615796 runs in 32 bit mode. ]
socket(AF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
bind(3, {sa_family=AF_UNIX, sun_path="\"`"}, 16) = -1 EAFNOSUPPORT (Address family not supported by protocol)
syscall_0xffffffffffffff66(0x4, 0xffd53c58, 0, 0x8049043, 0x3, 0) = -1 ENOSYS (Function not implemented)
syscall_0xffffffffffffff66(0x5, 0xffd53c4c, 0, 0x804904d, 0x3, 0) = -1 ENOSYS (Function not implemented)
syscall_0xffffffffffffff02(0x5, 0xffd53c4c, 0, 0xffffffda, 0x3, 0) = -1 ENOSYS (Function not implemented)
listen(3, 1)                            = 0
accept(3, NULL, NULL
```

The `mov al, callnum` trick to save bytes assumes the upper bytes of EAX are still 0. If they aren't (they are all-ones after a -errno return, such as bind's -EAFNOSUPPORT), the next few syscalls are made with invalid call numbers like 0xffffff66. But the failed fork sends it down the parent branch, which xor-zeroes EAX and EBX, so eventually it does `listen(3,1)` and `accept`, and it is listening somewhere. I found its PID with `ps`, then used `lsof` to find out what port it was listening on:

```
$ lsof -p 615796
COMMAND    PID  USER   FD   TYPE   DEVICE SIZE/OFF  NODE NAME
httpd   615796 peter  cwd    DIR     0,55      940     1 /tmp
httpd   615796 peter  rtd    DIR     0,27      158   256 /
httpd   615796 peter  txt    REG     0,55     5412 56241 /tmp/httpd
httpd   615796 peter    0u   CHR   136,20      0t0    23 /dev/pts/20
httpd   615796 peter    1u   CHR   136,20      0t0    23 /dev/pts/20
httpd   615796 peter    2u   CHR   136,20      0t0    23 /dev/pts/20
httpd   615796 peter    3u  IPv4 86480691      0t0   TCP *:36047 (LISTEN)
```
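The effect of `mov al, callnum` on a register that still holds a negative return value is plain bit arithmetic. In this sketch, `mov_al` is a hypothetical C model of the instruction (replace bits 0..7 of EAX, keep bits 8..31); -97 and -38 are the Linux values of EAFNOSUPPORT and ENOSYS behind the failures in the strace output above, which prints the resulting call numbers sign-extended to 64 bits (`syscall_0xffffffffffffff66`).

```c
#include <assert.h>
#include <stdint.h>

/* Model of `mov al, imm8`: only the low byte of EAX is replaced,
 * bits 8..31 keep whatever the previous syscall returned. */
static uint32_t mov_al(uint32_t eax, uint8_t imm8)
{
    return (eax & 0xffffff00u) | imm8;
}
```

Starting from EAX = 0, `mov_al(0, 0x66)` is the intended 102 (socketcall); starting from `(uint32_t)-97` after the failed bind, it becomes 0xffffff66, and after an ENOSYS the fork number 2 becomes 0xffffff02, matching the three bogus syscalls. A full `mov eax, imm32` (5 bytes) or `push imm8` / `pop eax` (3 bytes) would not depend on the previous value, at some cost in size over the 2-byte `mov al, imm8`.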

Connecting to that port with nc (netcat) gets it to dump its fixed string payload and keep the connection open:

```
$ nc localhost 36047
HTTP/1.0 200 OK
Content-Type: text/html

<h1>PwN3d!</h1>
       CONTROL-C
$
```

Pointing Chromium at http://localhost:36047/ also loaded a page, but since the connection stays open it's still spinning waiting for more data.

The strace output after a few connections has grown to:

```
accept(3, NULL, NULL)                   = 4
fork()                                  = 615904
listen(3, 1)                            = 0
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=615904, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
accept(3, NULL, NULL)                   = 5
fork()                                  = 615986
listen(3, 1)                            = 0
accept(3, NULL, NULL)                   = ? ERESTARTSYS (To be restarted if SA_RESTART is set)
--- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=615986, si_uid=1000, si_status=0, si_utime=0, si_stime=0} ---
accept(3, NULL, NULL
```
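The climbing `accept` results (4, then 5) are consistent with the parent never closing its copy of the accepted socket: after `fork`, its branch jumps straight back to `_listen`. The child's `close` alone therefore cannot end the connection, which would explain why nc and Chromium keep waiting for more data after an HTTP/1.0 reply with no Content-Length. A minimal sketch of that effect, assuming Linux; to stay self-contained it swaps TCP for a `socketpair` (`sv[0]` plays the accepted socket, `sv[1]` the client), and `client_sees_eof` is a hypothetical helper.

```c
#define _GNU_SOURCE /* MSG_DONTWAIT and POSIX declarations under strict -std modes */
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the client end sees EOF once the child has exited,
 * 0 if the connection is still open, -1 on error. */
static int client_sees_eof(int parent_closes_its_copy)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    pid_t pid = fork();
    if (pid == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child, like _write/_close: send the reply, close, exit. */
        static const char reply[] = "HTTP/1.0 200 OK\r\n\r\nhi";
        close(sv[1]);
        ssize_t n = write(sv[0], reply, sizeof reply - 1);
        close(sv[0]);
        _exit(n == (ssize_t)(sizeof reply - 1) ? 0 : 1);
    }

    /* Parent: the assembly never closes the accepted socket here. */
    if (parent_closes_its_copy)
        close(sv[0]);
    waitpid(pid, NULL, 0);

    /* Drain the reply without blocking, then check for end-of-stream. */
    char buf[64];
    ssize_t n;
    while ((n = recv(sv[1], buf, sizeof buf, MSG_DONTWAIT)) > 0)
        ;
    int result;
    if (n == 0)
        result = 1;                          /* orderly EOF: client is done */
    else if (errno == EAGAIN || errno == EWOULDBLOCK)
        result = 0;                          /* still open: client keeps waiting */
    else
        result = -1;

    if (!parent_closes_its_copy)
        close(sv[0]);
    close(sv[1]);
    return result;
}
```

With `client_sees_eof(0)`, mirroring the assembly, the client still has an open connection after the child has exited; with `client_sees_eof(1)` it gets EOF. In the assembly the equivalent fix would be a `close` of ESI in the parent branch of `_fork`.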

BTW, another minimal HTTPD, one that builds to an even smaller binary thanks to its build scripts, is https://github.com/Francesco149/nolibc-httpd. There's a C version (which calls handwritten asm wrappers for system calls, instead of just using inline asm). See the Stack Overflow question "How does this C program without libc work?" for more about it.

As I mentioned there, I got the C version down from 1.2k to 992 bytes, by switching to clang -Oz and using inline asm for system calls, so the whole thing could be a leaf function. (Not needing to save/restore registers around anything.) https://github.com/pcordes/nolibc-httpd/commit/ad3a80b89b98379304f1525339fa71700bf1a15d
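For readers unfamiliar with the "inline asm for system calls" style, here is a minimal sketch of one such wrapper for x86-64 Linux (the architecture nolibc-httpd targets). It is not the code from the linked commit; `raw_write` is a hypothetical name. Because the constraints tell the compiler exactly which registers the `syscall` instruction reads and clobbers, no out-of-line wrapper or register save/restore is needed, which is what lets a caller stay a leaf function.

```c
#define _GNU_SOURCE /* POSIX declarations under strict -std modes */
#include <assert.h>
#include <string.h>
#include <unistd.h>

#if !defined(__x86_64__) || !defined(__linux__)
#error "this sketch assumes x86-64 Linux"
#endif

/* write(2) via the `syscall` instruction, no libc wrapper involved.
 * x86-64 Linux ABI: call number in RAX (1 = write), args in RDI, RSI, RDX;
 * the instruction clobbers RCX and R11; RAX returns the result or -errno. */
static inline long raw_write(int fd, const void *buf, unsigned long len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}
```

Note that errors come back as a negative value in RAX (for example `raw_write(-1, "x", 1)` returns -EBADF) instead of setting `errno`.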

Peter Cordes
  • So basically, it is just really poorly written code? – Potherca May 08 '21 at 15:21
  • @Potherca: Yes, 100%. I don't think the i386 kernel ABI could have changed for the `bind` system call any time between 2013 and now, so I'm pretty confident this was simply always broken. There are other signs of clunkiness / poor design, like the `mov esi, symbol` / `jmp` which doesn't actually save space vs. using `call`, even taking other factors into account. And especially the use of `inc bl` (2 bytes) instead of `inc ebx` (1 byte) is a pretty clear sign of inexperience with golfing for code-size, at least. – Peter Cordes May 08 '21 at 15:25
  • I think that for my purposes it might be wiser to simply move to using nolibc-httpd. Well, at least I learned all sorts of things today! – Potherca May 08 '21 at 15:30