Noob ASM Questions

Question

I'm trying to learn a bit of assembly over here, and I need a bit of help from the pros!

test.s:

.data
helloworld:
    .asciz "printf test! %i\n"
.text
.globl main
main:
    push $0x40
    push $helloworld
    call printf
    mov $0, %eax
    ret

test.working.s:

.data
helloworld:
    .asciz "printf test! %i\n"
.text
.globl main
main:
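    mov $0x40, %esi          # second argument: the value for %i
    mov $helloworld, %edi    # first argument: address of the format string
    mov $0x0, %eax           # variadic call: no vector registers used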
    call printf
    mov $0, %eax
    ret

compile.sh:

rm test
gcc test.s -o test -lc
chmod 777 test
./test

test.s immediately segfaults. I made test.working.s by writing a small C program that prints something with printf and copying what the disassembly window in Eclipse showed.

So, questions!

1. Why does test.s not work?
2. In my C program, main is defined as main(int argc, char ** argv), is it not? Hence shouldn't I need to pop twice at the start if I don't need those arguments?
3. In x86-64, I read somewhere that %rax was the 64-bit register, %eax was the 32-bit register, and %ax was the 16-bit register. So does the register look like this: XX XX EE EE RR RR RR RR (R = 4 bits of RAX, E = 4 bits of EAX, X = 4 bits of AX) on a little-endian system (1 is represented as 0x01000000, I think...)?
4. GCC won't let me type pop %eax or push %eax. It will only let me type the 64-bit versions or the 16-bit versions. How do I push the 32 EAX bits of RAX to the stack then? How do I pop just 32 bits?
5. test.working.s (I imagine this is answered in 1, but if not...) calls printf by changing registers, NOT by pushing stuff onto the stack. I presume this is because it is faster? How do you know when to do this and in what order when calling C functions?
6. Will this also work on Windows x86-64? I understand that the operation of printf may be different, but if I clean up and restore the registers after the printf, I should be OK, right?
7. How are you supposed to clean up and restore the registers? According to http://www.cs.uaf.edu/2005/fall/cs301/support/x86/, I "must save %esp, %ebp, %esi, %edi". Does that mean that, when I write a function, these registers must come back out the way they came in, or that I should save them myself before I call a function? It's probably the former, since %esp is on the list, but just checking!
8. It's pretty clear that I won't be needing x86-64, especially since I'm just starting, so how do I alter compile.sh for just x86?
9. Does .asciz simply mean .ascii + "\0"?
10. I can return large structs (>64 bit) that reside on the stack in C. How is this achieved in assembly?

Cheers for any help!

score 3 · Accepted Answer · edited May 23 '17 at 12:03

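Because in 64-bit mode you should be passing the arguments in registers rather than on the stack (see this answer). Separately, push $0x40 has no size suffix, so the assembler picks the operand size for you; in 64-bit mode that is a 64-bit push, not the 32-bit push a stack-based call would expect. The two pushed values are also never removed, so ret would pop the address of helloworld instead of the real return address.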
On entry to main, the top of the stack holds the return address into whatever called main (e.g. __libc_start_main). In 32-bit code argc and argv sit just above it, at higher addresses; in 64-bit code they arrive in registers (EDI and RSI) instead. Either way there's nothing for you to pop, and you shouldn't pop anything, since ret needs that return address on top of the stack.
The 32-bit value 1 would be written as 0x00000001 (most significant nybble to the left), and would be stored as (low address) 01 00 00 00 (high address) in a little-endian configuration. Since it's typical to write numbers with the most significant digits first rather than according to how they're stored, it would make sense to write your RAX description as RR RR RR RR EE EE XX XX, possibly with bit index markers if it's unclear what the order is.
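As a rough illustration of that layout, here is a small C sketch that models RAX as a 64-bit integer and picks out the parts that EAX, AX and AL alias. It is only a model built from masks, and the helper names are made up for this example; it does not touch the real registers.

```c
#include <stdint.h>

/* Model only: a uint64_t stands in for the contents of RAX.
   The helper names are hypothetical, chosen for this sketch. */

/* EAX aliases the low 32 bits of RAX. */
uint32_t eax_of(uint64_t rax)
{
    return (uint32_t)(rax & 0xFFFFFFFFu);
}

/* AX aliases the low 16 bits of RAX (and of EAX). */
uint16_t ax_of(uint64_t rax)
{
    return (uint16_t)(rax & 0xFFFFu);
}

/* AL aliases the low 8 bits. */
uint8_t al_of(uint64_t rax)
{
    return (uint8_t)(rax & 0xFFu);
}
```

With rax set to 0x1122334455667788, eax_of gives 0x55667788 and ax_of gives 0x7788, which matches the RR RR RR RR EE EE XX XX picture written with the most significant digits first.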
Again, that's the calling convention for 64-bit x86 code, as described in this answer.
Not without some changes, since the 64-bit calling convention used by Windows is slightly different (the registers used for passing arguments are RCX, RDX, R8, R9).
By saving them on the stack, for example. There are callee-saved registers and caller-saved registers.
The callee (the function being called) must save certain registers and restore them before returning in order to comply with the calling convention. For a 64-bit program on a Linux-type system that would be RBX, RBP, R12-R15 (on 64-bit Windows this also includes RSI and RDI).
The caller (the code calling a function) must consider certain registers volatile (i.e. can be changed by the function) and should save and restore them if it needs their values after the function returns. On a Linux-type system these would be RAX, RCX, RDX, RSI, RDI, R8-R11.
Pass -m32 to gcc in compile.sh (gcc -m32 test.s -o test) to assemble and link 32-bit code; the GNU assembler's own option for this is --32. Linking will also need a 32-bit C library to be installed.
Yes.
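To summarise the argument registers mentioned above, here is a small C lookup sketch. On Linux-type systems (the System V AMD64 convention) integer and pointer arguments go in RDI, RSI, RDX, RCX, R8, R9, and on 64-bit Windows in RCX, RDX, R8, R9. The function names are invented for this illustration, and floating-point arguments are not covered.

```c
#include <stddef.h>

/* Hypothetical helpers for this sketch: map the 0-based index of an
   integer or pointer argument to the register that carries it.
   Return NULL when the argument would go on the stack instead. */

const char *sysv_int_arg_reg(size_t index)
{
    static const char *const regs[] = { "rdi", "rsi", "rdx", "rcx", "r8", "r9" };
    return index < sizeof regs / sizeof regs[0] ? regs[index] : NULL;
}

const char *win64_int_arg_reg(size_t index)
{
    static const char *const regs[] = { "rcx", "rdx", "r8", "r9" };
    return index < sizeof regs / sizeof regs[0] ? regs[index] : NULL;
}
```

For printf(format, 0x40) on Linux, index 0 maps to rdi (the format string) and index 1 to rsi (the value), which is what test.working.s loads through EDI and ESI.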

Thanks a bunch! Any ideas about 10? Also, if little endian dictates that the least significant bit is in the first byte, are you sure it's not `01 00 00 00`? Wikipedia says big endian means that the least significant bit is on last byte — AStupidNoob, May 14 '13 at 06:25
That depends on how you view the data. If you had the 32-bit little-endian values 1, 2 in a file and looked at it in a hex editor byte-by-byte you'd see `01 00 00 00 02 00 00 00`. But if you changed the view to dword mode you'd see `00000001 00000002`. — Michael, May 14 '13 at 06:34
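The two views can be reproduced with a short C sketch. It writes 32-bit values into a byte buffer in little-endian order explicitly, so the result does not depend on the host CPU, and then reads them back as whole dwords. The function names are made up for this example.

```c
#include <stdint.h>

/* Store a 32-bit value at dst in little-endian order:
   least significant byte at the lowest address. */
void store_le32(uint8_t *dst, uint32_t value)
{
    dst[0] = (uint8_t)(value & 0xFFu);
    dst[1] = (uint8_t)((value >> 8) & 0xFFu);
    dst[2] = (uint8_t)((value >> 16) & 0xFFu);
    dst[3] = (uint8_t)((value >> 24) & 0xFFu);
}

/* Read the four bytes at src back as one 32-bit value (the "dword view"). */
uint32_t load_le32(const uint8_t *src)
{
    return (uint32_t)src[0]
         | ((uint32_t)src[1] << 8)
         | ((uint32_t)src[2] << 16)
         | ((uint32_t)src[3] << 24);
}
```

Storing 1 and then 2 into an 8-byte buffer leaves the bytes 01 00 00 00 02 00 00 00 in memory, while load_le32 on each group of four returns 0x00000001 and 0x00000002.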
My suggestion for point 10 would be to compile a small C example with the `-S` option and check the resulting assembly code. — Michael, May 14 '13 at 06:39
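For instance, a small file like the following could be compiled with gcc -S to see how a large struct comes back. The struct and function names are hypothetical. On Linux-type x86-64 systems a struct of this size (24 bytes where long is 8 bytes) is typically returned through a hidden pointer that the caller passes as an extra first argument. The second function spells that idea out by hand, though the exact code the compiler emits should be checked in the -S output rather than assumed.

```c
/* Hypothetical example type: three longs, too large to fit in RAX alone. */
struct Big {
    long a;
    long b;
    long c;
};

/* Returns the struct by value, as ordinary C allows. */
struct Big make_big(long seed)
{
    struct Big result = { seed, seed * 2, seed * 3 };
    return result;
}

/* Hand-written equivalent of the hidden-pointer scheme: the caller
   supplies the storage and the function fills it in. */
void make_big_explicit(struct Big *out, long seed)
{
    out->a = seed;
    out->b = seed * 2;
    out->c = seed * 3;
}
```

Assuming the file is saved as big.c, something like gcc -S big.c -o big.s produces the assembly to inspect; comparing the two functions there shows how close the by-value version is to the explicit one.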
I thought the reason you left 2 blank was because these arguments were passed as registers. So even though these libc functions use the register passing calling convention, whatever calls main still uses the old one? — AStupidNoob, May 14 '13 at 07:27
