
I'm trying to learn assembly -- x86 in a Linux environment. The most useful tutorial I can find is Writing A Useful Program With NASM. The task I'm setting myself is simple: read a file and write it to stdout.

This is what I have:

section  .text              ; declaring our .text segment
  global  _start            ; telling where program execution should start

_start:                     ; this is where code starts getting exec'ed

  ; get the filename in ebx
    pop   ebx               ; argc
    pop   ebx               ; argv[0]
    pop   ebx               ; the first real arg, a filename

  ; open the file
    mov   eax,  5           ; open(
    mov   ecx,  0           ;   read-only mode
    int   80h               ; );

  ; read the file
    mov     eax,  3         ; read(
    mov     ebx,  eax       ;   file_descriptor,
    mov     ecx,  buf       ;   *buf,
    mov     edx,  bufsize   ;   *bufsize
    int     80h             ; );

  ; write to STDOUT
    mov     eax,  4         ; write(
    mov     ebx,  1         ;   STDOUT,
  ; mov     ecx,  buf       ;   *buf
    int     80h             ; );

  ; exit
    mov   eax,  1           ; exit(
    mov   ebx,  0           ;   0
    int   80h               ; );

A crucial problem here is that the tutorial never mentions how to create a buffer, the bufsize variable, or indeed variables at all.

How do I do this?

(An aside: after at least an hour of searching, I'm vaguely appalled at the low quality of resources for learning assembly. How on earth does any computer run when the only documentation is the hearsay traded on the 'net?)

Peter Cordes
jameshfisher
  • I've tried to work this out by looking for the equivalent in C, but literally everything uses the `stdio.h` package instead of simply `open`, `read` and `write`. I've even looked at `stdio.h`, but the function bodies are not defined anywhere. – jameshfisher Jul 27 '10 at 20:45
  • `open`, `read`, and `write` are system calls. The bodies are in the Linux kernel. – Borealid Jul 27 '10 at 20:47
  • I meant that I was looking for the bodies of `fopen`, `fread`, and `fwrite`, to see how `open`, `read`, and `write` are used. – jameshfisher Jul 27 '10 at 20:54
  • The bodies of the wrapper functions are in the source of glibc. – Borealid Jul 27 '10 at 21:05
  • re: learning resources: https://stackoverflow.com/tags/x86/info has a bunch of links. Good quality tutorials are rare, but actual *documentation* is usually very good, especially Intel and AMD's manuals, and the NASM documentation (https://nasm.us/). For OS interfaces / calling conventions, Linux has a `syscall(2)` man page, and see the x86-64 System V ABI doc as a PDF. – Peter Cordes Oct 06 '20 at 01:11

3 Answers


Ohh, this is going to be fun.

Assembly language doesn't have variables. Those are a higher-level language construct. In assembly language, if you want variables, you make them yourself. Uphill. Both ways. In the snow.

If you want a buffer, you're going to have to either use some region of your stack as the buffer (after adjusting the stack pointer to reserve the space), or use some region on the heap. If your heap is too small, you'll have to make a system call (another INT 80h) to beg the operating system for more (via brk; sbrk is a libc wrapper around it).

Another alternative is to learn about the ELF format and create a global variable in the appropriate section (I think it's .data).

The end result of any of these methods is a memory location you can use. But your only real "variables" like you're used to from the now-wonderful-seeming world of C are your registers. And there aren't very many of them.
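As a rough sketch of the stack option, assuming 32-bit Linux and the same `int 80h` interface used in the question: the name `BUFSIZE` is made up for illustration, and the example reads from stdin rather than a file to keep it short.

```nasm
; Sketch only: a 1024-byte buffer carved out of the stack.
; BUFSIZE is an illustrative name, not something from the question.
BUFSIZE equ 1024                ; assemble-time constant, takes no memory

section .text
global  _start

_start:
    sub   esp, BUFSIZE          ; reserve BUFSIZE bytes; esp now points at the buffer

    mov   eax, 3                ; read(
    mov   ebx, 0                ;   fd 0 (stdin),
    mov   ecx, esp              ;   address of the stack buffer,
    mov   edx, BUFSIZE          ;   maximum number of bytes
    int   80h                   ; ); eax = bytes read, or a negative errno

    test  eax, eax
    jle   done                  ; nothing read, or an error: skip the write

    mov   edx, eax              ; write exactly as many bytes as were read
    mov   eax, 4                ; write(
    mov   ebx, 1                ;   fd 1 (stdout),
    mov   ecx, esp              ;   same buffer
    int   80h                   ; );

done:
    add   esp, BUFSIZE          ; release the buffer
    mov   eax, 1                ; exit(
    xor   ebx, ebx              ;   0
    int   80h                   ; );
```

The buffer only exists between the `sub` and the `add`, which is the closest thing assembly has to a local variable.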

The assembler might help you out with useful macros. Read the assembler documentation; I don't remember them off the top of my head.

Life is tough down there at the ASM level.

Borealid
  • I understand that `buf` is just a pointer to memory that I need to create, and that I need to request `bufsize` memory from the operating system. However, I don't know how to do either of those things, and I can't find out. – jameshfisher Jul 27 '10 at 20:54
  • eegg : `malloc` is not actually a system call. You need to set up a heap. This is done, as I mentioned, with the `brk` and `sbrk` system calls. See `man 2 brk`. You need to look up the system call number corresponding to `brk` (see `/usr/include/sys/syscall.h`), then do a `mov eax, #` and `int 80h` to call it as per your above syscall pattern. Once you've done this, you have a heap ending at the address you specify! Neato! – Borealid Jul 27 '10 at 21:00
  • Okay, apparently `brk` is a system call, but `sbrk` isn't. `sbrk` looks decidedly more useful. Using just `brk` would be possible, if I knew where in memory the current program break is. Despite searching, I don't know how to find that. Looking for the source for `sbrk` (which presumably wraps `brk`), I can only find [this](http://www.netmite.com/android/mydroid/cupcake/bionic/libc/unistd/sbrk.c), which I can make neither head nor tail of. – jameshfisher Jul 27 '10 at 21:31
  • @eegg What `sbrk` does is call `brk(0)` and record its return value (which is the current program break). Does that help? – Borealid Jul 27 '10 at 22:27
  • Why mess around with `brk` when you could just `mmap(MAP_ANONYMOUS)` to get however many pages you want? Or honestly for a toy program in need of a large buffer, a static buffer in the BSS is definitely the easiest choice. `section .bss` / `mybuf: resb 1024*1024*1024` reserves 1GiB. (And no you don't want it in `.data`, that would actually put zeros in the executable file.) I'd love to +1 this answer for the good part explaining labels vs. high-level-language variables, though. But static variables do sort of map to label + storage; it's automatic storage that maps to regs. – Peter Cordes Aug 13 '18 at 11:42
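The `brk` approach discussed in these comments might look roughly like the following. This is a sketch under assumptions: 32-bit Linux, where `brk` is system call number 45, and where the raw system call returns the resulting program break (so passing 0 simply reports the current one). `BUFSIZE` and the labels are illustrative names.

```nasm
; Sketch only: grow the heap by BUFSIZE bytes with the raw brk system call.
BUFSIZE equ 4096                ; illustrative size

section .text
global  _start

_start:
    mov   eax, 45               ; brk(
    xor   ebx, ebx              ;   0: cannot be honoured, so the kernel
    int   80h                   ; ); just returns the current program break
    mov   esi, eax              ; esi = old break = start of the new buffer

    lea   ebx, [esi + BUFSIZE]  ; requested new break
    mov   eax, 45               ; brk(new_break)
    int   80h
    cmp   eax, ebx              ; on failure the old break comes back instead
    jb    fail

    mov   byte [esi], 'A'       ; esi .. esi+BUFSIZE-1 is now usable memory

    xor   ebx, ebx              ; exit status 0
    jmp   exit
fail:
    mov   ebx, 1                ; exit status 1
exit:
    mov   eax, 1                ; exit(ebx)
    int   80h
```

For a fixed-size buffer in a small program, the static `.bss` reservation mentioned in the last comment is much less work.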

You must declare your buffer in the .bss section and bufsize in the .data section:

section .data
   bufsize dw      1024

section .bss
   buf     resb    1024
guyllo
  • What if you don't know the size of the file, how do you do that? – Lance Dec 27 '14 at 06:59
  • The size is an assemble-time constant, you don't need to store it in memory. (And in a `word` makes little sense in 32-bit code). Use `buf: resb 1024` / `bufsize equ $-buf` to have the assembler calculate a size for you that you can use as an immediate instead of having to load from memory. – Peter Cordes Aug 13 '18 at 11:46
  • If you declared a buffer of 1024 bytes, shouldn't the buffer size be given by `db 1024`, rather than `dw 1024`? – Nikola Petrovic Nov 06 '21 at 10:26
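To make the difference raised in these comments concrete, here is a small sketch contrasting a size stored in memory with an assemble-time constant. The name `bufsize_mem` is invented for this example. Note that the directive sets the width of the stored value, not the length of the buffer: a byte can hold at most 255, so `db` could not store 1024.

```nasm
; Sketch only: three ways "bufsize" can end up in edx.
section .data
    bufsize_mem dw 1024             ; a 16-bit value stored in memory (illustrative name)

section .bss
    buf         resb 1024           ; 1024 reserved, zero-initialised bytes
    bufsize     equ  $ - buf        ; assemble-time constant 1024, takes no memory

section .text
global  _start

_start:
    mov   edx, bufsize_mem          ; loads the ADDRESS of the word, not 1024
    movzx edx, word [bufsize_mem]   ; loads the stored value 1024 from memory
    mov   edx, bufsize              ; immediate 1024, no memory access at all

    mov   eax, 1                    ; exit(
    xor   ebx, ebx                  ;   0
    int   80h                       ; );
```

With the `dw` declaration from the answer, the question's `mov edx, bufsize` would pass the address of the word as the byte count, which is why the `equ` form is the simpler fit here.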

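After the call to open, the file descriptor is in eax. You rightly move eax to ebx, where the call to read will look for it. Unfortunately, by that point you have already overwritten eax with 3, the system call number for read, so ebx ends up holding 3 instead of the descriptor. Swap the two instructions: copy eax into ebx first, then load 3 into eax.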
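Putting the pieces from this thread together, a corrected version might look something like the sketch below. It assumes 32-bit Linux and the `int 80h` interface from the question, uses the `.bss` buffer and `equ` size suggested in the comments, and adds error checks and a `fail` label that are not in the original code.

```nasm
; Sketch only: print up to the first 1024 bytes of the file named in argv[1].
section .bss
    buf     resb 1024           ; static buffer, costs nothing in the executable
    bufsize equ  $ - buf        ; assemble-time constant (1024)

section .text
global  _start

_start:
    pop   ebx                   ; argc
    pop   ebx                   ; argv[0]
    pop   ebx                   ; argv[1]: the filename (NULL if none was given)

    mov   eax, 5                ; open(
    mov   ecx, 0                ;   read-only mode
    int   80h                   ; ); eax = file descriptor, or a negative errno
    test  eax, eax
    js    fail

    mov   ebx, eax              ; save the descriptor BEFORE reusing eax
    mov   eax, 3                ; read(
    mov   ecx, buf              ;   buffer address,
    mov   edx, bufsize          ;   buffer size
    int   80h                   ; ); eax = bytes read, or a negative errno
    test  eax, eax
    js    fail

    mov   edx, eax              ; write only the bytes actually read
    mov   eax, 4                ; write(
    mov   ebx, 1                ;   STDOUT,
    mov   ecx, buf              ;   buffer address
    int   80h                   ; );

    mov   eax, 1                ; exit(
    xor   ebx, ebx              ;   0
    int   80h                   ; );

fail:
    mov   eax, 1                ; exit(
    mov   ebx, 1                ;   1
    int   80h                   ; );
```

Assuming the file is saved as `cat.asm` (a hypothetical name), it should build on a typical Linux system with `nasm -f elf32 cat.asm` followed by `ld -m elf_i386 -o cat cat.o`. Files longer than the buffer would need the read and write steps wrapped in a loop that runs until read returns 0.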