I have written the following basic program to add two numbers, 1 + 2:

.globl main

main:

    # put 1 (1 byte int/char) into accumulator register
    mov     $1,     %eax

    # add 2 (1 byte int/char), storing result in accumulator
    add     $2,     %eax

    # move the result of the accumulator into Data register (input/output)
    mov     %eax,   %edx

    ret

When compiled and run, this returns the expected output:

$ gcc d.s -o d2.out && ./d2.out; echo $?
3

I have a few questions about this program:

  • Is this more-or-less an OK program, or am I misusing any of the operations, etc.?
  • Does an assembly file always have to have one globl function such as main, or can I, for example, remove the main / .globl main parts and just "run the code line-by-line"?
  • Finally, what is the best resource for looking up the ops codes? I tend to Google these and get differing results. It would be nice to have a standard resource like the Python docs, where I can bookmark one page and look everything up there.
Waqar
David542
  • 1.) Well done, but `mov %eax,%edx` is unnecessary, at least in this case. 2.) You could compile it to a flat binary (not an object file), load it in C code, make it executable and run it from there. Or you could use a debugger to run the code line by line. 3.) I haven't found a resource – JCWasmx86 Aug 01 '20 at 19:51
  • The return value goes into eax, not edx. It doesn't make sense to set up edx with any value here. – fuz Aug 01 '20 at 19:51
  • @JCWasmx86 -- I see. I must've mixed up eax and edx. `%eax` is the register that is read from by the OS with the return code of the program after it returns control? – David542 Aug 01 '20 at 20:04
  • EAX is returned by `main` to the C runtime. The C runtime passes the value to the OS `_exit` function, either in EDI or on the stack, depending on your system. The shell calls `wait` to retrieve the value so it can print it. (Probably more than you wanted to know.) – prl Aug 01 '20 at 20:13
  • @prl: nitpicks - 32-bit Linux's system call ABI passes args in EBX, ECX, EDX, ... in that order. The C runtime actually uses an `exit_group(2)` syscall (EAX=231/syscall on x86-64), not `_exit` (EAX=60/syscall). (The libc wrapper function called `_exit(2)` [actually uses `exit_group`](https://stackoverflow.com/questions/46903180/syscall-implementation-of-exit/46903734)). And this is only after doing cleanup like flushing stdio buffers, same as the `exit(3)` library function. (glibc's `_exit` amusingly falls back to `exit` on error (including ENOSYS) from `exit_group`) – Peter Cordes Aug 01 '20 at 21:05

2 Answers

The mov to EDX is pointless: the return-value register is AL / AX / EAX / RAX / RDX:RAX for widths from 1 byte up to 16 bytes on x86-64. EDX or RDX is only involved for wide return values that do not fit in RAX. (Or in 32-bit mode, 64-bit values are returned in the EDX:EAX register pair because there is no RAX.)

This is true for all standard x86 32-bit and x86-64 calling conventions, including the i386 and x86-64 System V ABIs used on GNU/Linux.
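As a minimal sketch of what that means for the program in the question (same file layout and build command assumed), dropping the mov to EDX leaves the result in EAX, which is where the caller of main looks for it:

```asm
    .globl main
main:
    mov     $1, %eax        # EAX = 1 (a 32-bit write, not a 1-byte one)
    add     $2, %eax        # EAX = 1 + 2 = 3
    ret                     # main's return value is whatever is in EAX
```

Built and run the same way as in the question (gcc d.s -o d2.out && ./d2.out; echo $?), this should still print 3, since EDX never took part in returning the value.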


If you're writing a main, or any function that you want to call from another file, it needs to be a .globl symbol. (Unless you .include "foo.s" instead of building separately and linking.) That's what makes it visible in the symbol table so the linker can resolve references to it, e.g. from the call main in the already-compiled code for _start, in crt0.o or something similar, which you can see gcc linking if you run gcc -v foo.S. (That was an over-simplification: glibc's _start actually passes main's address as an arg to __libc_start_main, which is in libc.so.6, so some code from libc proper runs before main. See Linux x86 Program Start Up or - How the heck do we get to main()?)

If you're making a static executable without CRT (defining _start instead of main and making your own exit_group system call), you can just throw instructions in a file and let the linker (ld) choose the top of the .text section as the ELF entry point if it doesn't find a _start symbol. (Use readelf -a a.out to see info like that.)

If you only plan to run the program under GDB to single-step a couple instructions you're curious about, you can even leave out the exit-cleanly part. (For this, use GDB's starti command to run with a temp breakpoint before the first user-space instruction, so you don't have to set a breakpoint manually by absolute address (because there's no symbol).)

$ cat > foo.S
mov $1 + 2, %edi       # do the math at assemble time
mov $231, %eax         # __NR_exit_group
syscall

$ gcc -static -no-pie -nostdlib foo.S      # like as + ld manually
/usr/bin/ld: warning: cannot find entry symbol _start; defaulting to 0000000000401000

$ ./a.out ; echo $?
3

$ strace ./a.out
execve("./a.out", ["./a.out"], 0x7ffe0706a3c0 /* 54 vars */) = 0
exit_group(3)                           = ?
+++ exited with 3 +++
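For comparison, here is a sketch of the same program with an explicit _start symbol, so ld has a named entry point and has no reason to print the "cannot find entry symbol" warning. The build command in the note below is an assumption based on the one above.

```asm
    .globl _start
_start:
    mov     $1 + 2, %edi    # first syscall arg: the exit status, computed at assemble time
    mov     $231, %eax      # __NR_exit_group on x86-64
    syscall                 # exit_group(3) does not return
```

Building with gcc -static -nostdlib foo.S and running ./a.out; echo $? should print 3, as before.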

If your system is 32-bit, so the assembler (as) defaults to 32-bit mode, use the 32-bit int $0x80 ABI, which takes its arguments in different registers.
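A rough sketch of the 32-bit variant, assuming the i386 Linux system call numbering, where exit_group is call number 252, and the int $0x80 convention of passing the first argument in EBX:

```asm
    .globl _start
_start:
    mov     $1 + 2, %ebx    # first arg of the int $0x80 ABI goes in EBX: exit status 3
    mov     $252, %eax      # __NR_exit_group on i386 (assumed from the i386 syscall table)
    int     $0x80           # enter the kernel; does not return
```

On an x86-64 machine this would need a 32-bit build, e.g. gcc -m32 -static -nostdlib foo.S, and a kernel with 32-bit emulation enabled.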

Finally, what is the best resource for looking up the ops codes?

I usually leave a browser tab open to https://www.felixcloutier.com/x86/, which is an HTML scrape of Intel's vol.2 manual. The original PDF has some intro chapters on how to read the entries, so check it out if you find any of the notation confusing. There are older scrapes of Intel's manuals that leave out SIMD instructions, which makes them useless for me, but they may be what you want as a beginner.

Other resources are linked from the x86 tag wiki, including http://ref.x86asm.net/coder64.html which is organized by opcode, not by mnemonic, and has quick-reference columns to remind you whether an instruction reads or modifies FLAGS, and if so which, and stuff like that.

Peter Cordes
  • thanks. What does `no-pie` do? When I run with that I get `gcc: error: unrecognized command line option ‘-no-pie’`. Though removing that allows me to link it and it returns the proper result. – David542 Aug 01 '20 at 21:03
  • @David542: Apparently you have a pretty old GCC. See [32-bit absolute addresses no longer allowed in x86-64 Linux?](https://stackoverflow.com/q/43367427) and [What's the difference between "statically linked" and "not a dynamic executable" from Linux ldd?](https://stackoverflow.com/q/61553723) - it existed for a while before distros started enabling it by default. It makes a traditional position-dependent executable, so addresses you see in `objdump -d`, and GDB disassembly before starting the program, will match addresses you see *after* starting the program, fixed at link time, not runtime – Peter Cordes Aug 01 '20 at 21:11
  • @David542: Even on a modern system, `-no-pie` is redundant with `-static`, when you're using `-nostdlib` (don't link CRT's `_start`, and don't link any libraries). – Peter Cordes Aug 01 '20 at 21:15
  • "[...] for the linker to resolve references to it. That's" Missing part of a sentence here. – ecm Aug 02 '20 at 06:53
  • @ecm: Thanks, got distracted by another idea and forgot to come back and finish that one. ADHD :P – Peter Cordes Aug 02 '20 at 12:44

Is this more-or-less an OK program, or am I misusing any of the operations, etc.?

For a start, yes.

However, assembly is all about efficiency, and the final mov before the ret is unnecessary:

mov     %eax,   %edx

Does an assembly file always have to have one globl function such as main

Not necessarily. It can be some other function that you call from your C/C++ code, for example. But if you want to make an executable out of it, you will need main, or _start if you are using ld as your linker.
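As a sketch of the "function called from C" case, assuming the x86-64 System V calling convention (first two integer args in EDI and ESI, result in EAX). The name add_two and its prototype are hypothetical, chosen only for illustration:

```asm
    .globl add_two          # hypothetical name; .globl makes it visible to the linker
add_two:                    # assumed C prototype: int add_two(int a, int b);
    mov     %edi, %eax      # first integer arg arrives in EDI
    add     %esi, %eax      # second integer arg arrives in ESI
    ret                     # the sum is returned in EAX
```

A C file that declares the prototype and calls add_two(1, 2) could then be linked with this file in one step, e.g. gcc main.c add_two.s, with main living on the C side.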

"run the code line-by-line"?

You need a debugger for this, and it will be the most important tool if you want to learn assembly. You will want to look at the registers, see how the values change, what happens to the flags, etc. I gave an answer which explains a little bit on how to set up a debugger and step through your code. You will need the -g flag when assembling with gcc to debug your code.

A basic example:

  1. Compile with -g:
gcc -g file.s -o file
  2. Start gdb in TUI mode:
> gdb --tui ./file
> start           # start the program and break at main
> layout regs     # show registers at the top (you will need this a lot)
> n               # next source line (one instruction per line in hand-written asm); ni steps exactly one instruction
> si              # step one instruction, entering called functions

Pressing Enter in gdb repeats the last command, which saves you from typing n over and over again. Some more commands:

> b 2      # break at line 2
> b func   # break at label func
> b main   # break at main

> print/x  $eax  # print value in eax in hex form, there are other /format specifiers, print/d (decimal), print/s string, print/t (binary)
> x/s $eax    # print string pointed to by eax

> info frame   # look at the current stack frame

These are the most common commands that you will need. You can type help command_name to get more info about a command, and there are various cheat sheets to help you with this.

You can get a GUI as well if you want, though personally I don't like them much. Check out Nemiver, which is pretty good. gdbgui can be set up using pip, but it's not really good for debugging asm because watching the registers is a pain. There is also ddd, which I like most, but its GUI is from the 1970s, so ...

Finally, what is the best resource for looking up the ops codes?

The best resource is the Intel manuals, though they might be a bit too difficult to read if you are just starting out. I would recommend Felix Cloutier's x86 asm reference. There is also a lot of information and reference material in the x86 tag wiki.

You may also want to read Calling Conventions for Linux and look up Linux Syscalls, which you will need quite often. If you are going to program, or just want to learn more about computers, I would highly recommend the book Programming from the Ground Up, which is freely available and uses AT&T-style assembly. It is a bit dated, however, so you will have to google some things. It has an appendix of common x86 instructions, which will be very helpful.

Waqar
  • thanks, so a couple follow-ups here. Is `%eax` the register that the OS reads from at the end? Could you show the most basic example of using a debugger for running a line or two of the asm? – David542 Aug 01 '20 at 20:06
  • And yes, the value in `%eax` is returned to the OS. – Waqar Aug 01 '20 at 20:45
  • GDB `n` is next source-line. `ni` is next *instruction*. This matters for languages other than asm, or maybe if you have macros, or if debug info isn't perfect. Also, in modern GDB, you can use `start` to run the program with a temporary breakpoint at the top of `main`. Or `starti` to stop before the first user-space instruction, at the entry point. – Peter Cordes Aug 01 '20 at 20:55
  • @Waqar thanks for the suggestion. I got the "Programming from the Ground Up" book. Does that cover assembling, linking, and loading? Or just writing assembly? – David542 Aug 01 '20 at 22:19
  • It covers a lot of stuff, assembling, linking, cpu architecture and other things. – Waqar Aug 01 '20 at 22:21