
I am struggling quite a lot with assembly for macOS (x86_64 architecture). I would like to walk you through my understanding of a hello world program, and I would appreciate your feedback, suggestions, and explanations. Having said that, let's jump into the code.

Hello world program

I never felt the pain of a Hello world before. This is the code that I copied and pasted from the internet:
global _main 
section .text

_main:
   mov rax, 0x2000004
   mov rdi, 1
   mov rsi, str
   mov rdi str.len
   syscall

   mov rax, 0x2000001
   xor rdi, rdi 
   syscall

section .data
str: “Hello world”, 
.len: equ $ - str

So let me embarrass myself:

  1. global _main is basically telling the linker where to start, if I am not mistaken.

  2. Section .text is telling the OS (I guess) that this is the beginning of the actual program.

  3. _main if I am not wrong is a function and this seems to be the notation for functions

  4. mov rax, 0x2000004: I do not understand what this does. I looked up how a syscall works, and it basically needs a file code (I think this is the 1 on the next line), a pointer to a buffer (where exactly is this buffer? I think it points to the first byte of my string), and the length in bytes of the piece of text (in this case .len). My question is: when I need to write something, how does this hexadecimal business work, and what is the actual job of the mov rax instruction?

  5. mov rdi, 1: I am still not getting what is actually happening. We need a 1 to set the output to stdout, but what is the actual function of this instruction? Where is this 1 going, and what is happening behind the scenes?

  6. Then we have this str.len, which I do not quite understand. What is this .len notation? I get that it gives the size of the string, but how can we write it like this?

  7. syscall: this function seems like black magic. I know that the OS is doing some dirty tricks, but I am pretty ignorant of OSes, so I can't work out what this thing is doing.

  8. mov rax, 0x2000001: now we need to exit the program. Again, why do we need to load this hex number into a register? (Yes, I know this is the command to exit, but what is actually happening?)

  9. xor rdi, rdi: this is probably the only bit that I get. We are setting the content of the rdi register to 0 by XORing it with itself.

  10. syscall: this is black magic

  11. str: “Hello World”: I get this :)

  12. .len: I do not understand this dot notation. I think that $ means “address of here”, or at least that is what I found when I looked it up, and I think it is correct.

desertnaut
  • Are you sure this works, as you write 1 and str.len into rdi – JCWasmx86 Nov 05 '21 at 17:40
  • Does that trailing comma in `str: “Hello world”, ` produce a terminating zero? Or does this fail to assemble? Unless maybe you're not even using an actual assembler? – Sep Roland Nov 06 '21 at 00:34
  • You probably meant `mov rdx, str.len`. The 3rd arg for system calls goes in RDX, the first arg is already in RDI. The code inside the kernel that you "call" with `syscall` can only look at the current state of registers when it gets control, so anything you overwrote before that is lost. – Peter Cordes Apr 28 '23 at 10:14

1 Answer

  1. No, that just exports the symbol.
  2. No, that tells the assembler which section to put the following stuff into. .text is a default section for code.
  3. No, that's a label. Function entry points are usually denoted by labels, but not all labels are functions.
  4. On macOS the value 0x2000004 is the code that specifies you want a write system call. The OS looks in rax to determine what the caller wants. All system services have a code. You can imagine the OS doing something like if (rax == 0x2000004) do_write(rdi, rsi, rdx);
  5. rdi is a register. You know the registers, right? Similarly to point #4 above, once the OS has determined that you want a write, it checks rdi for the destination file descriptor.
  6. str.len is just label syntax. The value is defined at the bottom. It should be loaded into rdx, though, not rdi.
  7. It transfers control to the OS, which then looks at the contents of the registers and performs the requested action. The OS is just code, albeit privileged.

As for (12), yes, $ is the current location, which at that point is the end of the string. Subtracting the start of the string therefore gives you the length. The leading dot marks a local label: it instructs the assembler to prefix the name with the nearest previous non-local label, in this case str. So .len there is equivalent to writing str.len.

fuz
Jester
  • Basically you do `write(1, "Hello World", 11); exit(0);`, `syscall` is "just" a fancy call, that goes from the userspace into the kernel and back again – JCWasmx86 Nov 05 '21 at 17:52
  • Thanks very much man! Great explanation for a beginner like me in particular the analogy for understanding the system call :P. This is the best resource I have found so far. Do you know by any chance some at least moderately complete resources for x86_64 architecture? – not_here_to_play Nov 05 '21 at 19:01
  • It won't be the answer for everybody, but the C compiler is the best resource for learning assembly – JCWasmx86 Nov 05 '21 at 19:03
  • @not_here_to_play: Intel and AMD both publish complete manuals for the x86-64 ISA. Of course if you actually mean for writing user-space code under an existing kernel like MacOS, with existing ABIs, that would be a totally separate thing that CPU vendor manuals don't mention at all. – Peter Cordes Nov 06 '21 at 14:24
  • Re: `if (rax == 0x2000004) do_write(rdi, rsi, rdx);` - in MacOS, the high bits are flags that specify which table of system calls, and the low bits are an integer index into a table of call numbers. Presumably an array of function pointers, [like Linux uses](https://stackoverflow.com/a/46087731/224132), although Linux and MacOS differ slightly in their system-call ABI. (Error return indicated by CF=1 on MacOS vs. RAX >= -4095ULL on Linux.) – Peter Cordes Nov 06 '21 at 14:28