I am trying to build an x86 program that reads a file into memory. It uses a few different syscalls, and messes with memory and such. There's a lot in there to figure out.

To simplify debugging, I want to add assert statements that print a nice error message when there's a mismatch. This is my first step in learning assembly: I want to print the numbers and strings that end up in different registers after operations, so I can debug without any fancy tools.

Could someone help me write ASSERT and PRINT macros in NASM for Mac x86-64? I have this so far:

%define a rdi
%define b rsi
%define c rdx
%define d r10
%define e r8
%define f r9
%define i rax

%define EXIT 0x2000001
%define EXIT_STATUS 0

%define READ 0x2000003 ; read
%define WRITE 0x2000004 ; write
%define OPEN 0x2000005 ; open(path, oflag)
%define CLOSE 0x2000006 ; CLOSE
%define MMAP 0x2000197 ; mmap(void *addr, size_t len, int prot, int flags, int fildes, off_t offset)

%define PROT_NONE 0x00 ; no permissions
%define PROT_READ 0x01 ; pages can be read
%define PROT_WRITE 0x02 ; pages can be written
%define PROT_EXEC 0x04 ; pages can be executed

%define MAP_SHARED 0x0001 ; share changes
%define MAP_PRIVATE 0x0002 ; changes are private
%define MAP_FIXED 0x0010 ; map addr must be exactly as requested
%define MAP_RENAME 0x0020 ; Sun: rename private pages to file
%define MAP_NORESERVE 0x0040 ; Sun: don't reserve needed swap area
%define MAP_INHERIT 0x0080 ; region is retained after exec
%define MAP_NOEXTEND 0x0100 ; for MAP_FILE, don't change file size
%define MAP_HASSEMAPHORE 0x0200 ; region may contain semaphores

;
; Assert equals.
;

%macro ASSERT 3
  cmp %1, %2
  jne prepare_error
prepare_error:
  push %3
  jmp throw_error
%endmacro

;
; Print to stdout.
;

%macro PRINT 1
  mov c, getLengthOf(%1) ; "rdx" stores the string length
  mov b, %1 ; "rsi" stores the byte string to be used
  mov a, 1 ; "rdi" tells where to write (stdout file descriptor: 1)
  mov i, WRITE ; syscall: write
  syscall
%endmacro

;
; Read file into memory.
;

start:
  ASSERT PROT_READ, 0x01, "Something wrong with PROT_READ"

  mov b, PROT_READ
  mov a, PROT_WRITE
  xor a, b

  mov f, 0
  mov e, -1
  mov d, MAP_PRIVATE
  mov c, a
  mov b, 500000
  mov a, 0
  mov i, MMAP
  syscall
  PRINT "mmap output "
  PRINT i ; check what's returned
  PRINT "\n"
  mov e, i

  mov b, O_RDONLY
  mov a, "Makefile"
  mov i, OPEN
  syscall
  mov a, i

  mov b, e
  mov i, READ
  syscall

;
; Exit status
;

exit:
  mov a, EXIT_STATUS ; exit status
  mov i, EXIT ; syscall: exit
  syscall

throw_error:
  PRINT pop() ; print error or something
  jmp exit
Lance
  • 1
    https://github.com/cirosantilli/x86-assembly-cheat/blob/26b8bf25299a1e84aeef8836287c24d75c3d84ab/x86-64/lib/common_nasm.inc – Lance Mar 22 '19 at 11:57

1 Answer


mov rsi, "abcdefgh" is a mov-immediate of the string contents, not a pointer to it. If you do that, the string bytes exist only as an immediate inside the instruction, not as data at an address you could pass to write.

Your macro will need to switch to .rodata and back to put the string in memory; possibly you could turn it into a sequence of push-immediate onto the stack with NASM macros, but that sounds hard.

Once the string is in memory, you can use the usual msglen equ $ - msg to get the length. (Use NASM macro-local labels for this, so repeated uses of the macro don't create conflicting symbols.)
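As a minimal sketch of that idiom outside any macro (the message text is just an illustration, and the two instructions at the end only show how the symbols would be used):

    section .rodata
    msg:     db  "hello", 10      ; the message bytes plus a newline (6 bytes total)
    msglen   equ $ - msg          ; assemble-time constant: current position - start = 6

    section .text
        lea     rsi, [rel msg]    ; pointer to the bytes in memory
        mov     edx, msglen       ; the length is an immediate, not a memory load

Because msglen is defined with equ, it has to appear right after the string, before anything else is emitted into that section.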


See NASM - Macro local label as parameter to another macro, where I wrote basically this answer a couple of weeks ago. It's not exactly a duplicate, because that question didn't have the bug of using the string as an immediate.

NASM's mechanism for letting macros switch sections and then return to whatever section they expanded in is to have section foo define a macro __?SECT?__ as [SECTION foo]. See the manual and the above linked Q&A.

    ; write(1, string, sizeof(stringarray))
    ; clobbers: RDI, RSI, RDX,   RCX,R11 (by syscall itself)
    ; output: RAX = bytes written, or -errno
%macro PRINT 1
[section .rodata]                ; change section without updating __?SECT?__ macro
;; NASM macro-local labels
    %%str    db  %1          ; put the string in read-only memory
    %%strlen equ $ - %%str   ; current position - string start

__?SECT?__                       ; change back to original section
  mov     edx, %%strlen           ; len
  lea     rsi, [rel %%str]        ; buf = the string.  (RIP-relative for position-independent)
  mov     edi, 1                  ; fd = stdout
  mov     eax, WRITE
  syscall
%endmacro

This doesn't attempt to combine duplicates of the same string. Using it many times with the same message will be inefficient. This doesn't matter for debugging.
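Here is a rough sketch of how a macro like this might be used in a whole program, including one possible way to build the assert from the question on top of it. ASSERT_EQ is a hypothetical name and design, not something NASM provides; the sketch assumes NASM 2.15 or later (older versions spell the macro __SECT__), the macOS call numbers from the question, and start as the entry point as in the question's code. Note that NASM only processes escapes like \n inside backquoted strings, so PRINT "\n" would print a backslash and an n.

    ; demo.asm: sketch only, assemble with  nasm -f macho64 demo.asm
    %define SYS_exit   0x2000001
    %define SYS_write  0x2000004

    %macro PRINT 1
    [section .rodata]
        %%str     db  %1              ; string bytes go in read-only data
        %%strlen  equ $ - %%str
    __?SECT?__                        ; back to the section we expanded in
        mov     edx, %%strlen
        lea     rsi, [rel %%str]
        mov     edi, 1                ; fd = stdout
        mov     eax, SYS_write
        syscall
    %endmacro

    ; Hypothetical helper: ASSERT_EQ reg_or_mem, value, message
    ; The first operand must be a register or memory, because cmp can't
    ; take two immediates. On mismatch: print the message, exit with status 1.
    %macro ASSERT_EQ 3
        cmp     %1, %2
        je      %%ok                  ; macro-local label, unique per expansion
        PRINT   %3
        mov     edi, 1                ; exit status 1
        mov     eax, SYS_exit
        syscall
    %%ok:
    %endmacro

    global start
    section .text
    start:
        mov     eax, 0x01
        ASSERT_EQ eax, 0x01, `unexpected value in eax\n`
        PRINT   `assertion passed\n`
        xor     edi, edi              ; exit status 0
        mov     eax, SYS_exit
        syscall

PRINT clobbers RAX, RDI, RSI, RDX, RCX and R11, so anything you still need in those registers has to be saved around it. The failing path of ASSERT_EQ doesn't bother, since it exits anyway.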

I could have left your %defines for RDI, and let NASM optimize mov rdi, 1 (7 bytes) into mov edi, 1 (5 bytes). But YASM won't do that so it's better to make it explicit if you care about anyone building your code with YASM.

I used a RIP-relative LEA because that's the most efficient way to put a static address into a register in position-independent code. In Linux non-PIE executables, use mov esi, %%str instead (5 bytes, and it can run on any ALU port, which is more ports than LEA can use). But on OS X, the base virtual address where an executable is mapped/loaded is always above 2^32, so a 32-bit absolute address can't work, and you never want mov r64, imm64 with a 64-bit absolute address.
See How to load address of function or label into register
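To make the comparison concrete, here is a small sketch of the three ways to get the address of a label into RSI. The label msg is only an example, and the two alternatives are commented out because only the LEA form is appropriate for macOS:

    section .rodata
    msg:    db  "hi", 10

    section .text
        lea     rsi, [rel msg]    ; 7 bytes, RIP-relative: works in position-independent code
        ; mov   esi, msg          ; 5 bytes, 32-bit absolute: only when static addresses fit in
        ;                         ;   32 bits, e.g. a Linux non-PIE executable
        ; mov   rsi, msg          ; 10 bytes, 64-bit absolute immediate: largest encoding, and
        ;                         ;   the address needs a load-time relocation under ASLR

Putting default rel at the top of the file makes NASM use RIP-relative addressing for [msg] without writing rel each time.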


On Linux, where system-call numbers are small integers, you could use lea eax, [rdi-1 + WRITE] after mov edi, 1 (so RDI = 1) to do eax = SYS_write with a 3-byte instruction vs. 5 for mov.

The standard names for call-number constants are POSIX SYS_foo from sys/syscall.h or Linux __NR_foo from asm/unistd.h. But NASM can't #include C preprocessor #define macros, so you'd need to mechanically convert one of those headers to NASM syntax, e.g. with some script.

Or if manually defining names, just choose %define SYS_write 1
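For instance, a hand-written set of defines for the calls used in the question might look like this on Linux x86-64. These are the Linux numbers, not the macOS ones; on macOS the BSD call number is combined with 0x2000000, as in the question's defines.

    ; Linux x86-64 call numbers, copied by hand from asm/unistd_64.h
    %define SYS_read   0
    %define SYS_write  1
    %define SYS_open   2
    %define SYS_close  3
    %define SYS_mmap   9
    %define SYS_exit   60

        mov     eax, SYS_exit     ; usage: exit(0)
        xor     edi, edi
        syscall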

Peter Cordes
  • 1
    One remark it's possible to manually craft a `MacOS` (including latest `Mojave 10.13.x`) 64bit executable with base virtual address `0x1000` (not lower though). That's not something `ld` would ever do though. – Kamil.S Mar 25 '19 at 11:54
  • @Kamil.S: Neat. Do executable still need to be relocatable for ASLR? Or could you then (in theory) make position-dependent code that uses 32-bit absolute addresses? IIRC, MachO64 object files don't support 32-bit relocations, so the "linker" inputs might need to be ELF64, or flat binaries you made with `org 0x1000` – Peter Cordes Mar 25 '19 at 19:15
  • 1
    Yes, executable does need the `MH_PIE` flag to allow ASLR. Position-dependent code will happily accept 32-bit absolute addresses and it does work. I don't use MachO64 object files in this setup just raw binary format and hand crafted Mach-O header , so I can't really tell on the former. – Kamil.S Mar 25 '19 at 20:38