
I've compiled

#include <stdio.h>

int main() {
    printf("Hello world");
    return 0;
}

on a Mac and the result is 48 KB in size. However, when I look at the binary with xxd, most of it looks like this:

...
0000b990: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000b9a0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
0000b9b0: 0000 0000 0000 0000 0000 0000 0000 0000  ................
...

Why is it so?

otool tells me:

 otool -L hello
hello:
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1292.0.0)

So fine, it's linked dynamically against libSystem, which is where printf lives.

Then why all the zeroes?

Peter Cordes
scrrr
  • 14
    Most of it is probably 4k page alignment filled in with zeroes – Michael Petch Dec 24 '20 at 23:13
  • 1
    Possibly related: a linux Q&A about minimal executables with lots of padding explains some of the reasons why linkers will choose to do that: [Minimal executable size now 10x larger after linking than 2 years ago, for tiny programs?](https://stackoverflow.com/q/65037919) – Peter Cordes Dec 24 '20 at 23:31
  • Are there compiler options to turn it off? e.g. to get a smallest possible executable? – scrrr Dec 25 '20 at 08:59
  • 2
    @scrrr you will have to hand write the executable to really push the limit (4k in modern MacOS). https://stackoverflow.com/a/32659692/5329717 You won't get totally rid of zeroes though. – Kamil.S Dec 25 '20 at 13:12

1 Answer


Because alignment.

XNU enforces that every segment that maps part of the binary is aligned to the platform's page size. On x86_64 that is 0x1000 bytes; on arm64 it is 0x4000 bytes (even where the hardware would support 0x1000). And if the data for a segment must start at an aligned file offset, then something in the file has to fill the gap before it, and that is usually zeroes.
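To make the gap arithmetic concrete, here is a minimal sketch in C. The helper names `align_up` and `padding_needed` are made up for illustration and are not part of any Apple API; the only assumption is that the alignment is a power of two, which holds for 0x1000 and 0x4000.

```c
#include <stdint.h>

/* Hypothetical helper: round `offset` up to the next multiple of `align`.
 * Assumes `align` is a power of two (true for 0x1000 and 0x4000). */
static uint64_t align_up(uint64_t offset, uint64_t align) {
    return (offset + align - 1) & ~(align - 1);
}

/* Hypothetical helper: number of filler bytes between `offset` and the
 * next aligned boundary. Returns 0 if `offset` is already aligned. */
static uint64_t padding_needed(uint64_t offset, uint64_t align) {
    return align_up(offset, align) - offset;
}
```

For example, segment data that ends at file offset 0x110 would be followed by `padding_needed(0x110, 0x4000)` bytes of filler before the next segment could start, which is almost the whole 16 KB unit.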

Now, if your binary is 48KB, then its segments will probably look something like this:

LC 00: LC_SEGMENT_64  Mem: 0x000000000-0x100000000  File: Not Mapped    ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64  Mem: 0x100000000-0x100004000  File: 0x0-0x4000    r-x/r-x __TEXT
LC 02: LC_SEGMENT_64  Mem: 0x100004000-0x100008000  File: 0x4000-0x8000 rw-/rw- __DATA_CONST
LC 03: LC_SEGMENT_64  Mem: 0x100008000-0x10000c000  File: 0x8000-0xc000 rw-/rw- __DATA
LC 04: LC_SEGMENT_64  Mem: 0x10000c000-0x100010000  File: 0xc000-0xc110 r--/r-- __LINKEDIT

For an alignment of 0x4000, that is already the minimal layout. But if you're on Intel, you can force the linker to use 0x1000 by passing -Wl,-segalign,0x1000 to the compiler. This should result in a binary that is only about 12KB:

LC 00: LC_SEGMENT_64  Mem: 0x000000000-0x100000000  File: Not Mapped    ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64  Mem: 0x100000000-0x100001000  File: 0x0-0x1000    r-x/r-x __TEXT
LC 02: LC_SEGMENT_64  Mem: 0x100001000-0x100002000  File: 0x1000-0x2000 rw-/rw- __DATA_CONST
LC 03: LC_SEGMENT_64  Mem: 0x100002000-0x100003000  File: 0x2000-0x3000 rw-/rw- __DATA
LC 04: LC_SEGMENT_64  Mem: 0x100003000-0x100004000  File: 0x3000-0x3110 r--/r-- __LINKEDIT

If you wanted to further optimise your binary, you'd need to get rid of segments. With imports and linking, the only one you can really get rid of is __DATA_CONST, and you can do that by targeting macOS Mojave (or older) with -mmacosx-version-min=10.14. This will leave you with just over 8KB:

LC 00: LC_SEGMENT_64  Mem: 0x000000000-0x100000000  File: Not Mapped    ---/--- __PAGEZERO
LC 01: LC_SEGMENT_64  Mem: 0x100000000-0x100001000  File: 0x0-0x1000    r-x/r-x __TEXT
LC 02: LC_SEGMENT_64  Mem: 0x100001000-0x100002000  File: 0x1000-0x2000 rw-/rw- __DATA
LC 03: LC_SEGMENT_64  Mem: 0x100002000-0x100003000  File: 0x2000-0x20f0 r--/r-- __LINKEDIT
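The file sizes in the three layouts above follow from simple arithmetic, sketched here in C. The function name `estimated_file_size` is hypothetical, and the sketch assumes what the listings show: every file-backed segment before __LINKEDIT occupies exactly one alignment unit, and __LINKEDIT comes last and is not padded in the file.

```c
#include <stdint.h>

/* Hypothetical estimate of the on-disk size of a tiny Mach-O binary.
 * Assumption: each of the `padded_segments` segments before __LINKEDIT
 * fills exactly one `segalign`-sized unit in the file, and __LINKEDIT
 * (of `linkedit_size` bytes) is the last thing in the file, unpadded. */
static uint64_t estimated_file_size(uint64_t segalign,
                                    unsigned padded_segments,
                                    uint64_t linkedit_size) {
    return (uint64_t)padded_segments * segalign + linkedit_size;
}
```

With the numbers from the listings, `estimated_file_size(0x4000, 3, 0x110)` gives 0xc110 (about 48KB), `estimated_file_size(0x1000, 3, 0x110)` gives 0x3110 (about 12KB), and `estimated_file_size(0x1000, 2, 0xf0)` gives 0x20f0 (just over 8KB).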

If you were striving for the smallest possible executable, you could further ditch __DATA and possibly even __LINKEDIT, but you'd have to substantially change your code to only emit raw syscalls, not use the dynamic linker, etc.

For any real-world application, I would also say that these zeroes effectively don't matter. Given four mapped segments, they will never use up more than 48KB. And the bigger the binary, the smaller the percentage that the zeroes make up.
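If you want to check what share of a given binary is zero bytes, a rough sketch in C could look like this. The function name `count_zero_bytes` is made up for illustration; it works on a buffer you have already read into memory, and it counts every zero byte, not only alignment padding.

```c
#include <stddef.h>

/* Hypothetical helper: count how many bytes in `buf` are zero.
 * Note: this counts all zero bytes, including ones that are real data,
 * so it only gives an upper bound on the amount of padding. */
static size_t count_zero_bytes(const unsigned char *buf, size_t len) {
    size_t zeroes = 0;
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0) {
            zeroes++;
        }
    }
    return zeroes;
}
```

Dividing the result by the file length gives the fraction of zeroes, which should shrink as the amount of actual code and data grows.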

As for distribution, there's the obvious answer: xz.
Compressing the above binaries with that yields:

  • 776 bytes for the 48KB binary.
  • 736 bytes for the 12KB binary.
  • 684 bytes for the 8KB binary.
Siguza
  • 1
    Side note: `zstd` should do nearly as good a job as `xz` for this, and compress / decompress faster (much faster for large sizes; IDK about startup overhead). Arch GNU / Linux for example has switched over their binary packages to zstd instead of xz. – Peter Cordes Dec 25 '20 at 19:59
  • 1
    You can also have a `__LINKEDIT` with zero physical size (in the executable), and `bind_at_load` to its virtual memory (without a `dyld_stub_binder`). It's not something the linker will ever emit though. – Kamil.S Dec 26 '20 at 12:47