
I'm trying to link a single-module assembly language program assembled with yasm and I get the following error from ld:

Undefined symbols for architecture x86_64:
  "start", referenced from:
     implicit entry/start for main executable
     (maybe you meant: _start)
ld: symbol(s) not found for inferred architecture x86_64

I actually get this error on a semi-regular basis, so I imagine it's a fairly common problem, but somehow no one seems to have a satisfactory answer. Before anyone says this is a duplicate of a previous question, yeah, I know. Just as you can look at the huge text-wall of similarly-titled questions and see that this is a duplicate, so can I.

Compiler Error: Undefined symbols for architecture x86_64

Not applicable to my problem. I'm not coding in C++, and the solution given in that question is idiosyncratic to that language.

undefined symbol for architecture x86_64 in compiling C program

Also doesn't fix my problem, as I'm not trying to link multiple object files together.

Error Undefined symbols for architecture x86_64:

Solution has to do with a specific framework in a high-level language.

Compiler Error: Undefined symbols for architecture x86_64

Solution involves fixing a function prototype. Not applicable here for obvious reasons.

... You get the idea. Every past question I can find is solved by some idiosyncratic method that isn't applicable to my situation.

Please help me with this. I am so tired of getting this error time and time again and not being able to do anything about it because it's so poorly documented. IMHO the world desperately needs a GNU Dev Tools equivalent of the MS-DOS error code reference manual.

Additional information:

Operating system: Mac OS X El Capitan

Source listing:

segment .text
global _start

_start:
    mov     eax,1   ; 1 is the syscall number for exit
    mov     ebx,5   ; 5 is the value to return
    int     0x80    ; execute a system call

Hexdump of the object file, showing that the symbol is indeed _start and not start:

00000000  cf fa ed fe 07 00 00 01  03 00 00 00 01 00 00 00  |................|
00000010  02 00 00 00 b0 00 00 00  00 00 00 00 00 00 00 00  |................|
00000020  19 00 00 00 98 00 00 00  00 00 00 00 00 00 00 00  |................|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000040  0c 00 00 00 00 00 00 00  d0 00 00 00 00 00 00 00  |................|
00000050  0c 00 00 00 00 00 00 00  07 00 00 00 07 00 00 00  |................|
00000060  01 00 00 00 00 00 00 00  5f 5f 74 65 78 74 00 00  |........__text..|
00000070  00 00 00 00 00 00 00 00  5f 5f 54 45 58 54 00 00  |........__TEXT..|
00000080  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
00000090  0c 00 00 00 00 00 00 00  d0 00 00 00 00 00 00 00  |................|
000000a0  00 00 00 00 00 00 00 00  00 00 00 80 00 00 00 00  |................|
000000b0  00 00 00 00 00 00 00 00  02 00 00 00 18 00 00 00  |................|
000000c0  dc 00 00 00 01 00 00 00  ec 00 00 00 08 00 00 00  |................|
000000d0  b8 01 00 00 00 bb 05 00  00 00 cd 80 01 00 00 00  |................|
000000e0  0f 01 00 00 00 00 00 00  00 00 00 00 00 5f 73 74  |............._st|
000000f0  61 72 74 00                                       |art.|
000000f4
user628544
  • You don't show how you assemble and link your program on OS/X, but I can tell you that your code's use of `int 0x80` appears to follow the 32-bit Linux ABI calling convention, and even if this assembled and linked it wouldn't run as expected. 32-bit OS/X passes arguments to `int 0x80` on the stack rather than in registers (although EAX will contain the system call number) – Michael Petch Nov 26 '16 at 03:06
  • Okay, so it _is_ different on different OSs. I'm using a book on Intel assembly programming for Linux, but I figured the code would work for MacOS as well, since they both use the x86-64 architecture and are POSIX compliant. – user628544 Nov 26 '16 at 03:10
  • The underlying system calling convention is different between most OSes. The convention can also differ between i386 and x86-64 on the same OS. POSIX doesn't define the underlying method used to make calls into the kernel, that is an implementation detail. – Michael Petch Nov 26 '16 at 03:12
  • It's also worth pointing out that `ld` on my system shows up as being hashed, so it may be a dummy version of `ld` that is not entirely compatible with the real version. – user628544 Nov 26 '16 at 03:13
  • And I will look at that link. Thank you. – user628544 Nov 26 '16 at 03:14
  • Damnit, Ross, you got rid of my witty XKCD reference. Why do you mods have to ruin everything? – user628544 Nov 26 '16 at 14:52
  • Because it's useless fluff that only obfuscates your question. Ironically, it made it less likely that someone with the same problem as you would find this question and the excellent answer by Michael Petch below. – Ross Ridge Nov 26 '16 at 19:48
  • The whole point of this website is to eliminate that XKCD phenomenon. It is a major raison d'être for the creation of Stack Overflow. So using the reference doesn't actually accomplish anything. Plus, as Ross mentions, we don't need a big back-story for every question, because we're more like an encyclopedia than a traditional forum. People come here to get answers, not to read about the users behind those answers. Also, Ross is not a moderator. He's just a regular community member (albeit one who has been around a while and provided a lot of useful answers) helping to keep the site clean. – Cody Gray - on strike Nov 27 '16 at 10:13

1 Answer


32-bit OS/X Code Making System Calls via int 0x80

The code:

segment .text
global _start

_start:
    mov     eax,1   ; 1 is the syscall number for exit
    mov     ebx,5   ; 5 is the value to return
    int     0x80    ; execute a system call

Suggests you are following a 32-bit Linux tutorial. I draw this conclusion because the 32-bit Linux ABI passes arguments to the kernel in registers when using int 0x80. OS/X is different: you pass the arguments on the stack, pushing them right to left. In 32-bit OS/X it would look like:

global start

section .text
start:
    ; sys_write syscall
    ; See: https://opensource.apple.com/source/xnu/xnu-1504.3.12/bsd/kern/syscalls.master
    ; 4 AUE_NULL ALL { user_ssize_t write(int fd, user_addr_t cbuf, user_size_t nbyte); }
    push    dword msg.len  ; Last argument is length
    push    dword msg      ; 2nd last is pointer to string
    push    dword 1        ; 1st argument is File descriptor (1=STDOUT)
    mov     eax, 4         ; eax = 4 is write system call
    sub     esp, 4         ; 32-bit OS/X code always needs 4 extra bytes allocated on the stack
    int     0x80

    ; sys_exit
    ; 1 AUE_EXIT ALL { void exit(int rval); }
    push    dword 42       ; Return value
    mov     eax, 1         ; eax=1 is exit system call
    sub     esp, 4         ; allocate 4 bytes on stack
    int     0x80

section .rodata

msg:    db      "Hello, world!", 10
.len:   equ     $ - msg

Assemble and link with:

nasm -f macho testexit.asm
ld -macosx_version_min 10.7.0 -o testexit testexit.o
./testexit
echo $?

The YASM parameters should be the same as for NASM. The program should output:

Hello, world!
42

Rules of thumb for system calls in 32-bit OS/X code:

  • Parameters are passed right to left on the stack
  • int 0x80 does not need a 16-byte aligned stack
  • An additional 4 bytes need to be allocated on the stack after the parameters are pushed and before the system call, for example with either of:
    1. sub esp, 4
    2. push eax
  • System call number in the EAX register
  • System call initiated via int 0x80

The OS/X system calls are documented by Apple on their website.
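
As a minimal sketch of these rules (not one of the original listings), here is the exit-only program from the question rewritten for 32-bit OS/X. It uses `push eax` for the extra 4-byte slot, and it names the entry point `start`, since that is the symbol the linker error in the question says ld looks for by default. Alternatively, ld's `-e` option can name a different entry symbol such as `_start`.

```nasm
global start                ; "start" is the entry symbol ld looks for by default

section .text
start:
    ; sys_exit: 1 AUE_EXIT ALL { void exit(int rval); }
    push    dword 5         ; only argument: the exit status
    mov     eax, 1          ; eax = 1 is the exit system call
    push    eax             ; extra 4-byte slot; the value pushed is ignored
    int     0x80
```

It should assemble and link with the same 32-bit commands shown above, and `echo $?` should then report 5.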


64-bit OS/X Code Making System Calls via SYSCALL instruction

64-bit OS/X uses pretty much the same kernel calling convention as 64-bit Linux. The 64-bit Linux System V ABI applies to system calls, in particular section A.2, AMD64 Linux Kernel Conventions. That section gives these rules:

  1. User-level applications use as integer registers for passing the sequence %rdi, %rsi, %rdx, %rcx, %r8 and %r9. The kernel interface uses %rdi, %rsi, %rdx, %r10, %r8 and %r9.
  2. A system-call is done via the syscall instruction. The kernel destroys registers %rcx and %r11.
  3. The number of the syscall has to be passed in register %rax.
  4. System-calls are limited to six arguments, no argument is passed directly on the stack.
  5. Returning from the syscall, register %rax contains the result of the system-call. A value in the range between -4095 and -1 indicates an error, it is -errno.
  6. Only values of class INTEGER or class MEMORY are passed to the kernel.
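
Rule 2 matters in practice: anything kept in RCX or R11 is lost across a syscall. The following is a small sketch, assuming the 64-bit OS/X system call numbers used in the listing further down (0x2000004 for write, 0x2000001 for exit). It keeps its loop counter in RBX instead of RCX and reloads the argument registers before every call, so it does not rely on the kernel leaving them intact.

```nasm
global start
section .text

start:
    mov     ebx, 3              ; loop counter lives in RBX, not RCX
.again:
    mov     eax, 0x2000004      ; write system call
    mov     edi, 1              ; stdout = 1
    lea     rsi, [rel msg]      ; address of the message (RIP relative)
    mov     edx, msg.len        ; length of message
    syscall                     ; kernel destroys RCX and R11 here
    dec     ebx                 ; RBX is still intact after the syscall
    jnz     .again

    mov     eax, 0x2000001      ; exit system call
    xor     edi, edi            ; exit status 0
    syscall

section .rodata

msg:    db      "Hello, world!", 10
.len:   equ     $ - msg
```

Had the counter been kept in ECX, the loop would misbehave because the syscall instruction overwrites RCX.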

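64-bit OS/X uses the same system call numbers as 32-bit OS/X, but each number must have 0x02000000 added to it (exit, number 1, becomes 0x2000001). One difference from rule 5: OS/X reports a failed system call by setting the carry flag and returning the positive errno value in RAX, rather than returning -errno. The code above can be modified to work as a 64-bit OS/X program: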

global start
section .text

start:
    mov     eax, 0x2000004 ; write system call
    mov     edi, 1         ; stdout = 1
    mov     rsi, msg       ; address of the message to print
    ;lea     rsi, [rel msg]; Alternative way using RIP relative addressing
    mov     edx, msg.len   ; length of message
    syscall                ; Use syscall, NOT int 0x80

    mov     eax, 0x2000001 ; exit system call
    mov     edi, 42        ; return 42 when exiting
    syscall

section .rodata

msg:    db      "Hello, world!", 10
.len:   equ     $ - msg

Please note that when you write to a 32-bit register, the CPU automatically zero-extends the value into the full 64-bit register. The code above relies on this by writing to registers like EAX and EDI instead of RAX and RDI. You could have used the 64-bit registers, but the 32-bit forms need no REX prefix and so have a shorter encoding.

Assemble and link with:

nasm -f macho64 testexit64.asm
ld -macosx_version_min 10.7.0 -lSystem -o testexit64 testexit64.o
./testexit64 
echo $?

It should output:

Hello, world!
42
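
Newer versions of ld may warn that PIE is disabled because `mov rsi, msg` uses an absolute address. A hedged variant of the same program is sketched below, using the RIP-relative `lea` that is commented out above. The names `UNIX_SYSCALL_BASE`, `SYS_write` and `SYS_exit` are my own labels for illustration, not anything defined by NASM or Apple.

```nasm
; Illustrative names only: these %defines are not provided by NASM or OS/X
%define UNIX_SYSCALL_BASE 0x2000000
%define SYS_exit          (UNIX_SYSCALL_BASE + 1)
%define SYS_write         (UNIX_SYSCALL_BASE + 4)

default rel                     ; make [label] operands RIP relative by default

global start
section .text

start:
    mov     eax, SYS_write      ; write system call
    mov     edi, 1              ; stdout = 1
    lea     rsi, [msg]          ; RIP relative because of "default rel"
    mov     edx, msg.len        ; length of message
    syscall

    mov     eax, SYS_exit       ; exit system call
    mov     edi, 42             ; return 42 when exiting
    syscall

section .rodata

msg:    db      "Hello, world!", 10
.len:   equ     $ - msg
```

With no absolute address left in the code, the PIE warning should no longer apply. The assemble and link commands are unchanged.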

Note: Some of this information is similar in nature to this OS/X tutorial with some corrections and coding bugs fixed.

Michael Petch
  • does work great! thanks ma', but I get this warning: ld: warning: building for macOS 10.7.0 is deprecated ld: warning: PIE disabled. Absolute addressing (perhaps -mdynamic-no-pic) not allowed in code signed PIE, but used in start from hello.o. To fix this warning, don't compile with -mdynamic-no-pic or link with -Wl,-no_pie – Apurva Singh Nov 13 '22 at 19:18