
As part of porting a Forth compiler, I'm trying to create a binary that allows for self-modifying code. The gory details are at https://github.com/klapauciusisgreat/jonesforth-MacOS-x64

Ideally, I would create a bunch of pages for user definitions and call mprotect on them like so:

#define __NR_exit 0x2000001
#define __NR_open 0x2000005
#define __NR_close 0x2000006
#define __NR_read 0x2000003
#define __NR_write 0x2000004
#define __NR_mprotect 0x200004a

#define PROT_READ 0x01
#define PROT_WRITE 0x02
#define PROT_EXEC 0x04
#define PROT_ALL (PROT_READ | PROT_WRITE | PROT_EXEC)
#define PAGE_SIZE 4096


// https://opensource.apple.com/source/xnu/xnu-201/bsd/sys/errno.h
#define EACCES    13              /* Permission denied */
#define EINVAL    22              /* Invalid argument */
#define ENOTSUP   45              /* Operation not supported */


/* Assembler entry point. */
        .text
        .globl start
start:
        // Use mprotect to allow read/write/execute of the .bss section
        mov $__NR_mprotect, %rax                // mprotect
        lea user_defs_start(%rip), %rdi         // Start address
        and $-PAGE_SIZE,%rdi                    // Align at page boundary
        mov $USER_DEFS_SIZE, %rsi               // Length
        mov $PROT_ALL,%rdx
        syscall
        cmp $EINVAL, %rax
        je 1f
        cmp $EACCES,%rax
        je 2f
        test %rax,%rax
        je 4f                                   // All good, proceed:

        // must be ENOTSUP
        mov $2,%rdi                     // First parameter: stderr
        lea errENOTSUP(%rip),%rsi       // Second parameter: error message
        mov $8,%rdx                     // Third parameter: length of string
        mov $__NR_write,%rax            // Write syscall
        syscall
        jmp 3f

1:
        mov $2,%rdi                     // First parameter: stderr
        lea errEINVAL(%rip),%rsi        // Second parameter: error message
        mov $7,%rdx                     // Third parameter: length of string
        mov $__NR_write,%rax            // Write syscall
        syscall
        jmp 3f

2:
        mov $2,%rdi                     // First parameter: stderr
        lea errEACCES(%rip),%rsi        // Second parameter: error message
        mov $7,%rdx                     // Third parameter: length of string
        mov $__NR_write,%rax            // Write syscall
        syscall

3:
        // Didn't work -- exit with a nonzero status
        mov $1,%rdi
        mov $__NR_exit,%rax     // syscall: exit
        syscall

4:
// All good, let's get started for real:

.
.
.

        .set RETURN_STACK_SIZE,8192
        .set BUFFER_SIZE,4096
        .set USER_DEFS_SIZE,65536*2 // 128 KiB ought to be enough for everybody

        .bss
        .balign PAGE_SIZE               // Page-align so mprotect covers the whole buffer
user_defs_start:
        .space USER_DEFS_SIZE

However, I get an EACCES return value. I suspect that this is because of some security policy Apple has set up, but I can't find good documentation.

Where is the source code for mprotect, and/or what are the ways to mark the data area as writable and executable at the same time?

I found that compiling with

gcc -segprot __DATA rwx rwx

does mark the entire data segment rwx, so it must somehow be possible to do the right thing. But I would prefer to only make the area hosting Forth words executable, not the entire data segment.

I found a similar discussion here, but without any solution.

  • The usual way is to mark it writable but not executable while you perform your changes then mark it read-only executable when you are done. – Jester May 27 '20 at 20:52
  • In Forth, that seems impractical, because the Forth assembler can execute words in the same segment it's compiling a new word in. But it's a good idea to play with this, thanks! – Klapaucius Klapaucius May 27 '20 at 20:57
  • I tried to call mprotect with just PROT_READ|PROT_WRITE and get the same result. Maybe my syscall convention is wrong? – Klapaucius Klapaucius May 27 '20 at 21:13
  • OK, I am an idiot. 1. I need to ```and $-PAGE_SIZE,%rdi``` instead of ```and $~PAGE_SIZE,%rdi```. 2. I traced with gdb, and what was actually returned in %rax was 14 (EFAULT). I had assumed that, per the man page, only EACCES, EINVAL or ENOTSUP would be returned. That helps a bit, but not enough. I probably have to look through the code for mprotect to understand what is going on. – Klapaucius Klapaucius May 27 '20 at 21:13
  • I am an idiot ^2: the syscall number is ```#define __NR_mprotect 0x200004a```, since the decimal value of mprotect is 74. *Now* I get the expected EACCES error when I call mprotect with PROT_EXEC, and no error when using PROT_WRITE. I'll edit the question to reflect what I believe is the right way to call mprotect. I'd be super happy for pointers to solutions. – Klapaucius Klapaucius May 27 '20 at 22:25
  • Does MacOS's linker support anything like GNU `ld`'s `-N` / `--omagic` option to link the `.text` section as read+write+exec? https://www.man7.org/linux/man-pages/man1/ld.1.html. IDK if the MachO64 executable format can do that or not. – Peter Cordes May 27 '20 at 23:39
  • I looked a bit at the Darwin source, and there seems to be some explanation here: https://github.com/apple/darwin-xnu/blob/a449c6a3b8014d9406c2ddbdc81795da24aa7443/osfmk/vm/vm_map.c#L5608 Maybe I need to code-sign my generated binary for this to work? I really wish there were better documentation out there :) – Klapaucius Klapaucius May 27 '20 at 23:39
  • @PeterCordes: probably, I imagine that is how gcc -segprot works behind the curtains. I have figured it out after all, will answer my question. – Klapaucius Klapaucius May 28 '20 at 00:50
  • See https://stackoverflow.com/a/61924409/976724 for an explanation. – Elliott Darfink May 28 '20 at 19:25
  • Is there a reason you need the writable/executable area to be statically allocated? Why not `mmap()` some memory with the desired permissions? – Ken Thomases May 29 '20 at 17:55
  • No compelling reason; I just want to keep the code as small as possible. I also suspect (but have not checked) that when mmap-ing a segment with rwx permissions I would run into another protection scheme on macOS that requires code signing with a special entitlement, and I want the jonesforth port to be as close to the original as possible. – Klapaucius Klapaucius May 29 '20 at 18:04
  • To follow up, mmap worked just fine on Catalina, I did not run into any restrictions on rwx segments that required any extra work like code signing. – Klapaucius Klapaucius Jun 05 '20 at 19:07

1 Answer


The segment in which I want to 'unprotect' exec permission really has two values describing its permissions:

  1. the initial protection settings, which for __DATA I want rw-

  2. the maximum protection (loosest) settings, which I want to be rwx.

So first I need to set the maxprot field to rwx. According to the ld man page, this should be achieved by invoking gcc or ld with the flags -segprot __DATA rwx rw (maximum protection first, then initial protection). However, a recent change Apple made to the linker effectively ignores the maxprot value and sets maxprot = initprot.

Thanks to Darfink, you can use this script to tweak the maxprot bits after the fact. I thought additional code signing with special entitlements was required, but it's not, at least for the __DATA segment. The __TEXT segment may need code signing with the com.apple.security.cs.disable-executable-page-protection entitlement.

Also see here for a few more details.

Looking at the larger picture, I should also point out that rather than unprotecting pieces of an otherwise protected __DATA segment, it may be better to create an entirely new data/code segment just for the self-modifying code, with rwx permissions from the start. This still lets the operating system protect the rest of the data, and requires no non-standard tools.

  • Since this is an old answer, I should point out that on Catalina I ended up just mmap-ing a segment with RWX protection bits. That did work, but it may not keep working. For instance, on Apple Silicon RWX pages are simply not allowed anymore, regardless of whether they are created via ld in binaries, with mprotect, or via mmap. I'm not sure whether it's a hardware limitation or an OS policy; I suspect the latter, in which case there would be less reason to keep Intel macOS different. Plus, in a few years there will be few Intel-based macOS devices left. – Klapaucius Klapaucius Oct 24 '22 at 18:32