Because alignment.
XNU enforces that every segment that maps part of the binary is aligned to the platform's page size. On x86_64 that is 0x1000 bytes; on arm64 it is 0x4000 bytes (even where the hardware would support 0x1000). And if the data for a segment must start at an aligned offset, then something in the file has to fill the gap in between, usually zeroes.
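To make the rounding concrete, here is a minimal sketch of the arithmetic. The helper names and the 0x110-byte example size are made up for illustration; this is not the linker's actual code.

```python
def align_up(size: int, align: int) -> int:
    """Round size up to the next multiple of align (align must be a power of two)."""
    return (size + align - 1) & ~(align - 1)


def padding_needed(size: int, align: int) -> int:
    """Filler bytes needed after `size` bytes of real data to reach the next boundary."""
    return align_up(size, align) - size


# A hypothetical segment holding 0x110 bytes of real data:
print(hex(padding_needed(0x110, 0x4000)))  # arm64-style alignment
print(hex(padding_needed(0x110, 0x1000)))  # x86_64-style alignment
```

Data that already ends on a boundary needs no filler, which is why the waste is always less than one page per segment.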
Now, if your binary is 48KB, then its segments will probably look something like this:
LC 00: LC_SEGMENT_64 Mem: 0x000000000-0x100000000 File: Not Mapped ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64 Mem: 0x100000000-0x100004000 File: 0x0-0x4000 r-x/r-x __TEXT
LC 02: LC_SEGMENT_64 Mem: 0x100004000-0x100008000 File: 0x4000-0x8000 rw-/rw- __DATA_CONST
LC 03: LC_SEGMENT_64 Mem: 0x100008000-0x10000c000 File: 0x8000-0xc000 rw-/rw- __DATA
LC 04: LC_SEGMENT_64 Mem: 0x10000c000-0x100010000 File: 0xc000-0xc110 r--/r-- __LINKEDIT
For an alignment of 0x4000, that is already the minimal layout. But if you're on Intel, you can force the linker to use 0x1000 by passing -Wl,-segalign,0x1000 to the compiler. This should result in a binary that is only about 12KB:
LC 00: LC_SEGMENT_64 Mem: 0x000000000-0x100000000 File: Not Mapped ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64 Mem: 0x100000000-0x100001000 File: 0x0-0x1000 r-x/r-x __TEXT
LC 02: LC_SEGMENT_64 Mem: 0x100001000-0x100002000 File: 0x1000-0x2000 rw-/rw- __DATA_CONST
LC 03: LC_SEGMENT_64 Mem: 0x100002000-0x100003000 File: 0x2000-0x3000 rw-/rw- __DATA
LC 04: LC_SEGMENT_64 Mem: 0x100003000-0x100004000 File: 0x3000-0x3110 r--/r-- __LINKEDIT
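The fields in listings like these are stored in the LC_SEGMENT_64 load commands of the binary. If you don't have a Mach-O inspection tool at hand, a rough sketch along the following lines can pull them out. It assumes a thin (non-fat), little-endian, 64-bit Mach-O, and the function name list_segments is my own invention.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF   # magic of a 64-bit Mach-O header
LC_SEGMENT_64 = 0x19       # load command type for 64-bit segments


def list_segments(data: bytes):
    """Return (name, vmaddr, vmsize, fileoff, filesize) for each LC_SEGMENT_64."""
    # mach_header_64: eight 32-bit fields, 32 bytes in total.
    magic, _cpu, _sub, _ftype, ncmds, _sizeofcmds, _flags, _reserved = \
        struct.unpack_from("<8I", data, 0)
    if magic != MH_MAGIC_64:
        raise ValueError("not a thin little-endian 64-bit Mach-O")

    segments = []
    offset = 32  # load commands start right after the header
    for _ in range(ncmds):
        cmd, cmdsize = struct.unpack_from("<II", data, offset)
        if cmd == LC_SEGMENT_64:
            # segname[16], then vmaddr, vmsize, fileoff, filesize (64-bit each)
            name, vmaddr, vmsize, fileoff, filesize = \
                struct.unpack_from("<16s4Q", data, offset + 8)
            segments.append((name.rstrip(b"\0").decode("ascii"),
                             vmaddr, vmsize, fileoff, filesize))
        offset += cmdsize
    return segments
```

Feeding it the bytes of a binary (for example, open(path, "rb").read()) and printing fileoff and filesize in hex should show the same page-sized steps as the listings.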
If you wanted to further optimise your binary, you'd need to get rid of segments. With imports and linking, the only one you can really get rid of is __DATA_CONST, and you can do that by targeting macOS Mojave (or older) with -mmacosx-version-min=10.14. This will leave you with just over 8KB:
LC 00: LC_SEGMENT_64 Mem: 0x000000000-0x100000000 File: Not Mapped ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64 Mem: 0x100000000-0x100001000 File: 0x0-0x1000 r-x/r-x __TEXT
LC 02: LC_SEGMENT_64 Mem: 0x100001000-0x100002000 File: 0x1000-0x2000 rw-/rw- __DATA
LC 03: LC_SEGMENT_64 Mem: 0x100002000-0x100003000 File: 0x2000-0x20f0 r--/r-- __LINKEDIT
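The file sizes follow directly from the layouts: every file-backed segment before __LINKEDIT takes a whole page, and __LINKEDIT only takes what it actually needs. A small sketch of that arithmetic follows, assuming each padded segment fits into a single page as in the listings. The function name is hypothetical.

```python
def file_size(padded_segments: int, align: int, linkedit_size: int) -> int:
    """File size when each padded segment fills exactly one aligned page
    and __LINKEDIT is stored unpadded at the end of the file."""
    return padded_segments * align + linkedit_size


# (padded segments, alignment, __LINKEDIT size) taken from the three listings
layouts = {
    "default, 0x4000 alignment": (3, 0x4000, 0x110),
    "-segalign 0x1000":          (3, 0x1000, 0x110),
    "without __DATA_CONST":      (2, 0x1000, 0xF0),
}

for label, (count, align, linkedit) in layouts.items():
    print(f"{label}: {file_size(count, align, linkedit)} bytes")
```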
If you were striving for the smallest possible executable, you could further ditch __DATA and possibly even __LINKEDIT, but you'd have to substantially change your code to only emit raw syscalls, not use the dynamic linker, etc.
For any real-world application, I would also say that these zeroes effectively don't matter. Given four mapped segments, they will never use up more than 48KB. And the bigger the binary, the smaller the percentage that the zeroes make up.
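As a back-of-the-envelope check, the sketch below bounds the padding and shows how its share shrinks as the real content grows. It assumes three padded segments (the trailing __LINKEDIT is not padded in the file) that each waste at most one byte less than a page. The function names and payload sizes are illustrative only.

```python
def max_padding(padded_segments: int, align: int) -> int:
    """Upper bound on filler bytes: each padded segment wastes at most align - 1."""
    return padded_segments * (align - 1)


def worst_case_share(payload: int, padded_segments: int = 3, align: int = 0x4000) -> float:
    """Worst-case fraction of the file that is filler, for `payload` bytes of real data."""
    pad = max_padding(padded_segments, align)
    return pad / (payload + pad)


for payload in (48 * 1024, 1024 * 1024, 100 * 1024 * 1024):
    print(f"{payload:>10} bytes of content: at most {worst_case_share(payload):.2%} padding")
```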
As for distribution, there's the obvious answer: xz.
Compressing the above binaries with that yields:
- 776 bytes for the 48KB binary.
- 736 bytes for the 12KB binary.
- 684 bytes for the 8KB binary.
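You can see the same effect without a real binary. The sketch below uses Python's standard lzma module, which writes the .xz container by default, on a synthetic image: four tiny made-up chunks, with all but the last padded to 0x4000 with zeroes. The exact sizes will differ from the figures above, since this is not an actual Mach-O.

```python
import lzma


def pad_segments(chunks, align):
    """Zero-pad every chunk except the last to a multiple of align and concatenate."""
    out = bytearray()
    for chunk in chunks[:-1]:
        out += chunk
        out += b"\0" * (-len(chunk) % align)
    out += chunks[-1]
    return bytes(out)


def xz_size(data: bytes) -> int:
    """Size of data after xz compression with default settings."""
    return len(lzma.compress(data))


# Four hypothetical 64-byte "segments" standing in for real segment contents.
chunks = [bytes(range(i, i + 64)) for i in range(0, 256, 64)]
image = pad_segments(chunks, 0x4000)

print(len(image), "bytes on disk,", xz_size(image), "bytes compressed")
```

The long runs of zeroes cost almost nothing once compressed, so the padded and unpadded variants end up close in size.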