
Starting to learn assembly, I was given some Hello World assembly code created during the class on Linux. I would like to get it to work for 64-bit Mac OS X.

code.asm:

SECTION .data       
    hola:   db "Hola!",10   
    tam:    equ $-hola      

SECTION .text       
    global main     

main:               

    mov edx,tam     
    mov ecx,hola        
    mov ebx,1       
    mov eax,4       
    int 0x80        

    mov ebx,0       
    mov eax,1       
    int 0x80        

This is what I do:

nasm -f macho32 -o object.o code.asm
gcc -m32 -o program object.o

Which tells me:

Undefined symbols for architecture i386:
  "_main", referenced from:
      start in crt1.10.6.o
ld: symbol(s) not found for architecture i386

Searching for this error, I found this question: nasm and gcc: 32 bit linking failed (64 bit Mac OS X)

One answer says

The problem you're having is that you're creating a 32-bit Linux(ELF) object file which isn't compatible with the Mac OS X object format. Try switching '-f elf' to '-f macho32'.

But I'm definitely using -f macho32. So what would the problem be then?

Saturn
  • The main assembler on Mac OS X appears to be `as - Mac OS X Mach-O GNU-based assemblers`. I'm not sure of the implications of that. – Jonathan Leffler Sep 09 '13 at 00:08

3 Answers

8

I've been trying to teach myself some entry-level Assembly programming too, and I ran into similar issues. I had originally compiled using nasm with elf, but that didn't work when I tried to use ld to link the object file and create the executable.

I think the answer to your main question, "what would the problem be then?" (to get this to run on 64-bit Mac OS X), is that you are using -f macho32 but expecting the result to run as a 64-bit program. You need to change the command option to -f macho64. Of course, this will not resolve the fact that your assembly code is written for a different architecture and system call interface (more on that in a bit).

I found this handy answer on the right command to compile and link your code in this case (after you rework your assembly code to use the Mac OS X conventions instead of the Linux ones, as duskwuff stated):

nasm -f macho64 main.asm -o main.o && ld -e _main -macosx_version_min 10.8 -arch x86_64 main.o -lSystem

After some searching, here's what I learned...

  1. On 64-bit Mac OS X, it might be better to use the as assembler instead of nasm if you want something more native, but nasm may be the better choice if you want more portable code (learn the differences).
  2. nasm doesn't come with the macho64 output type installed by default
  3. Assembly is a pain in the keister (this aside)

Now that my learning rant is out of the way...

Here is the code which should operate on MacOSX 64 using nasm (if you have updated nasm with macho64, credit to Dustin Schultz):

section .data
hello_world     db      "Hello World!", 0x0a
hello_len       equ     $ - hello_world     ; Length in bytes (13), computed by the assembler

section .text
global start            ; Entry symbol: link with -e start, or rename it to _main to match the ld command above

start:
mov rax, 0x2000004      ; System call write = 4
mov rdi, 1              ; Write to standard out = 1
mov rsi, hello_world    ; The address of hello_world string
mov rdx, hello_len      ; The size to write (13 bytes, not 14)
syscall                 ; Invoke the kernel
mov rax, 0x2000001      ; System call number for exit = 1
mov rdi, 0              ; Exit success = 0
syscall                 ; Invoke the kernel
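
The system call numbers in the listing carry a class prefix: on x86-64 Mac OS X the value placed in rax is (class << 24) | number, and class 2 selects the BSD calls listed in syscalls.master (see the syscall_sw.h link further down). That is why write (4) becomes 0x2000004 and exit (1) becomes 0x2000001. Here is a minimal shell sketch of that arithmetic; bsd_syscall is a hypothetical helper name used only for illustration:

```shell
# Build an x86-64 Mac OS X BSD system call number: (class << 24) | number.
# Class 2 is the BSD/Unix class (SYSCALL_CLASS_UNIX in xnu's syscall_sw.h).
# bsd_syscall is a made-up helper name, not part of any toolchain.
bsd_syscall() {
    printf '0x%x\n' $(( (2 << 24) | $1 ))
}

bsd_syscall 4   # write -> 0x2000004
bsd_syscall 1   # exit  -> 0x2000001
```

The same helper gives the rax value for any other entry in syscalls.master.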

Working code I used with the as assembler native to MacOSX64:

.section __TEXT,__text

.global start

start:
  movl $0x2000004, %eax           # Preparing syscall 4 (write)
  movl $1, %edi                   # stdout file descriptor = 1
  movq str@GOTPCREL(%rip), %rsi   # The string to print
  movq $13, %rdx                  # The size to write: 12 characters plus the newline
  syscall

  movl $0, %edi                   # Exit status 0 (the first argument goes in %edi, not %ebx)
  movl $0x2000001, %eax           # exit 0
  syscall

.section __DATA,__data
str:
  .asciz "Hello World!\n"
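
The size handed to write should be the exact number of bytes in the string. If it is larger, the kernel also copies whatever follows the string in memory. A quick way to count the bytes from the shell, as a sketch (msg_len is a hypothetical helper name; printf appends the trailing newline that the assembly string ends with):

```shell
# Count the bytes that write should be asked to emit for a one-line message.
# printf adds the newline; tr strips the padding some wc implementations print.
# msg_len is a made-up helper name used only for this example.
msg_len() {
    printf '%s\n' "$1" | wc -c | tr -d ' '
}

msg_len 'Hello World!'   # prints 13
```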

Compile command: as -arch x86_64 -o hello_as_64.o hello_as_64.asm

Link Command: ld -o hello_as_64 hello_as_64.o

Execute Command: ./hello_as_64

Some helpful resources I found along my journey:

AS OSX Assembler Reference: https://developer.apple.com/library/mac/documentation/DeveloperTools/Reference/Assembler/Assembler.pdf

Writing 64 Bit Assembly on Mac OSX: http://www.idryman.org/blog/2014/12/02/writing-64-bit-assembly-on-mac-os-x/

Couldn't link object file using ld: Can't link object file using ld - Mac OS X

OSX i386 SysCalls: http://www.opensource.apple.com/source/xnu/xnu-1699.26.8/osfmk/mach/i386/syscall_sw.h

OSX Master System Call Definitions: http://www.opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master

OSX Syscall: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/syscall.2.html

Benjamin Dean
  • If you don't already know both Intel / NASM and AT&T syntax for x86, I'd recommend learning NASM syntax first. It's less convenient to have to use `gcc -S -masm=intel` and `objdump -M intel` to output in that syntax, but the Intel insn ref manual uses Intel/NASM syntax. With 3 and 4 operand instructions, the extra hurdle of reversing their order and keeping track of which one can be an immediate or memory operand can be a mental burden. Also, [AT&T syntax has a bug with one of the x87 instructions](https://www.acrc.bris.ac.uk/acrc/RedHat/rhel-as-en-4/i386-bugs.html). – Peter Cordes Nov 08 '15 at 03:16
  • For disassembly, I'd recommend Agner Fog's `objconv` anyway. (http://agner.org/optimize/. Build it from source if you don't find an OS X binary.) On OS X, make sure you have the latest version of NASM, because there was a nasty bug with data-section labels recently. There are several SO questions. I actually like YASM. It's used by the x264 and x265 video codecs, which have a ton of asm, and clearly are projects that didn't just pick an assembler at random. – Peter Cordes Nov 08 '15 at 03:20
  • Also, for moving constants into registers, prefer `mov edi, 1` to `mov rdi, 1`, in case your assembler thinks you *want* the REX-prefix version of the instruction to do a sign-extending `mov r64, imm32`, or worse a `movabs r64, imm64`. You're fine just doing the 32bit `mov r32, imm32` and letting the upper32 be zeroed as always when you write to the low32. (That's why zeroing a register with `xor eax,eax` is one byte shorter and otherwise equivalent to `xor rax,rax`). See Agner Fog's guide for more stuff like this. – Peter Cordes Nov 08 '15 at 03:25
  • This answer would really benefit from a link to a system call table for OS X, something like what https://filippo.io/linux-syscall-table/ is for Linux. (The numeric constants in Linux are in `#define`s in C header files in the kernel source.) Actually, I should add that to the x86 tag wiki info page. I'll also link to this answer for OS X stuff. – Peter Cordes Nov 08 '15 at 03:28
  • @PeterCordes I think I have added what you suggested. Please let me know if that improves the answer. I appreciate your input greatly, as I'm just learning Assembly (and even in the first hour of writing it with reference materials was banging my head on this stuff). Your comments have provided some great feedback. – Benjamin Dean Nov 08 '15 at 03:56
  • yup, nice addition. Happy to help out. I didn't see a parameter->register mapping rule in any of those links. If you come across that, that's also necessary for using syscalls directly (instead of writing asm that calls libc like any other program would). – Peter Cordes Nov 08 '15 at 04:35
  • Your AT&T version doesn't need to load the address from the GOT. Use `lea str(%rip), %rsi` like a normal person to reference the label directly, unless you want a shared library to be able to override it (symbol interposition). Just like the `lea rsi, [rel hello_world]` you should have done in NASM instead of `mov r64, imm64`. [How to load address of function or label into register in GNU Assembler](https://stackoverflow.com/q/57212012) – Peter Cordes Sep 27 '20 at 17:01
4

You would need to:

  1. Change the label name from main to _main (in both places). Symbol naming works a little bit differently under Mac OS X.

  2. Change the way you pass arguments to the system call. Mac OS X uses a different calling convention for the kernel than Linux does; this code is not portable! I don't know whether there is any official documentation for how it works, but looking at the disassembly in GDB for a standard library function like _exit() may be instructive.

Here's _exit on my system, for instance:

    <_exit+0>:  mov    $0x40001,%eax
    <_exit+5>:  call   0x96f124c2 <_sysenter_trap>
    <_exit+10>: jae    0x96f10086 <_exit+26>
    <_exit+12>: call   0x96f1007d <_exit+17>
    <_exit+17>: pop    %edx
    <_exit+18>: mov    0x15a3bf9f(%edx),%edx
    <_exit+24>: jmp    *%edx
    <_exit+26>: ret
    <_exit+27>: nop

The extra bit set in 0x40001 is... weird, but can be safely ignored here.

The stuff following the call to _sysenter_trap is for error handling.

_sysenter_trap is:

    <_sysenter_trap+0>: pop    %edx
    <_sysenter_trap+1>: mov    %esp,%ecx
    <_sysenter_trap+3>: sysenter
    <_sysenter_trap+5>: nop

All things considered, you're probably better off linking to libSystem (the OS X equivalent of libc) instead of trying to call the kernel directly.

1

I wrote a blog post on this topic: https://cs-flex.hashnode.dev/linux-assembly-on-macos


You have 3 main options:

  1. Using a VM, which I don't recommend.
  2. Renting a Linux server, which is not a bad option if you don't mind paying about $20-30 a month.
  3. (My personal favorite) Using Docker to create a Linux container that shares a folder (volume) with your Mac, and running the assembler there. Even if you haven't used Docker before, I still think this option is the best one.

You can read the details in my blog post (especially if you haven't used Docker before). But in short, all you need are these two files:

# Dockerfile
FROM ubuntu:latest

RUN apt-get update
RUN apt-get install -y gcc
RUN apt-get install -y make

# docker-compose.yml
version: "3"
services:
    linux:
        image: linux-image
        container_name: linux-container
        build:
            context: .
        command: sleep 1000
        volumes:
            - .:/code

You can then run the container and connect to it with:

docker-compose up  # build and run docker container
docker exec -it linux-container bash  # "ssh" into container

After this, all the code in the folder containing the Docker files will be "linked" to the folder /code/ inside the container. You can therefore build and run it inside the Docker container as if you were running Linux.

AlexFreik