Why do these `const int main=0xc3` (or other number) programs return 252 on OS X?

Question

I heard about the "shortest C program that results in an illegal instruction", const main=6; for x86-64, over on codegolf.SE, and it got me curious what would happen if I put different numbers there.

Now I guess this has to do with what is or isn't a valid x86-64 instruction (durr) but specifically I'd like to know what the different results mean.

const main=0 through 2 give bus error.
const main=3 gives a segfault.
6 and 7 give illegal instruction.

I get various bus errors and segfaults and illegal instructions up until const main=194 which didn't give me an interrupt at all (at least not that got through to my python script that was generating these little programs).

There are a few other numbers that also do not lead to exceptions/interrupts and thus to Unix signals. I checked the return code of a couple and the return code was 252. I don't know why or what that means or how it got there.

204 got me a "trace trap". This is 0xcc which I know is the int3 interrupt - that's fun! (241/0xf1 also gets me this)

Anyway, it keeps going and it's obviously mostly bus errors and segfaults and a few illegal instructions here and there and the occasional... does whatever it does and then returns with 252...

I googled around some opcodes but I don't really know what I am doing or where to look to be honest. I haven't even looked at all my outputs yet just been scrolling through. I understand that a segfault is invalid access to valid memory and a bus error is access to invalid memory and I plan to look at the patterns of the numbers and work out where these are happening and why. But the 252 thing has me a bit stumped.

#!/usr/bin/env python3
import os
import signal
import subprocess
import sys
import time

os.makedirs("testc", exist_ok=True)
try:
    os.chdir("testc")
except OSError:
    print("Could not change directory, exiting.")
    sys.exit(1)

for i in range(0, 65536):
    filename = "test" + str(i) + ".c"
    with open(filename, "w") as f:
        f.write("const main=" + str(i) + ";")
    outname = "test" + str(i)
    # subprocess.run waits for gcc to finish before we try to run its output.
    subprocess.run(["gcc", filename, "-o", outname], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    time.sleep(1)
    # Run the binary directly (no shell) so a fatal signal shows up as a negative returncode.
    err = subprocess.Popen(["./" + outname])
    result = err.wait()
    r = result
    # Signal numbers differ between OSes (SIGBUS is 10 on OS X but 7 on Linux), so use the constants.
    if result == -signal.SIGSEGV:
        r = "segfault"
    if result == -signal.SIGBUS:
        r = "bus error"
    if result == -signal.SIGILL:
        r = "illegal instruction"
    if result == -signal.SIGTRAP:
        r = "trap"
    print("const main=" + str(hex(i)) + " : " + r)

This produces a C program in testc/test20.c like

const main=20;

Then compiles it with gcc and runs it. (And sleeps for 1 second before trying the next number.)

There were no expectations. I just wanted to see what happened.

You do realise that none of those programs are actually *valid,* yes? — paxdiablo, Jan 24 '19 at 01:33
@paxdiablo Yes, he clearly does. He's just curious about the different ways that the program fails. — Barmar, Jan 24 '19 at 01:37
A positive value is the process's exit status, as in `exit(252)`. I guess what you're seeing has something to do with the way the process returns this to the parent. — Barmar, Jan 24 '19 at 01:41
Note that what your invalid programs do depends on what other data is in the memory after your "main". Try running the programs with a debugger and step through the assembly instructions. — hyde, Jan 24 '19 at 06:10
Or to put it another way, your programs are 100% Undefined Behavior, and trying to draw any conclusions at C level is futile. You really need to look at the assembly code. — hyde, Jan 24 '19 at 06:12
Why do you show us your python-stuff if you ask about C? Showing us the created C code would be more useful. — Gerhardh, Jan 24 '19 at 06:40
@hyde ah yeah, I am installing gdb now to have a look (this is your brain on insomnia) - if it is undefined though, why does it appear to be the same all the time? I guess most of what is in memory is zeros with lots of junk here and there so probabilistically you hit a zero more often than not? Obviously I could be wrong though lol - I might look at that too, just malloc some chunks and look and see what is there. — AsksStupidQuestions, Jan 24 '19 at 08:17
OK got why these two specifically behave that way. 0xc2 and 0xc3 seem to be retq. I still don't know why they return 252 though. I assume it's in the return register (is that rax?) but I don't know why. — AsksStupidQuestions, Jan 24 '19 at 08:37
The next one that is out of the pattern is 0x4cd which leads to -8. Having looked at the assembly the non segfault/bus error/sigill ones are fairly predictable (they end in retq or trap) although as the numbers get longer more stuff is going in front so maybe that will change once we get a full byte or two in front? Working out the pattern behind the bus error vs segfault is my next plan. — AsksStupidQuestions, Jan 24 '19 at 08:55
Although, thinking about it, does the fact that it's little-endian mean that it won't make a difference what comes "before" it (because it actually comes after the retq)? — AsksStupidQuestions, Jan 24 '19 at 09:12
Since you get a bus error, are you on BSD (like OS X)? I ask because it depends on the OS (and the calling convention). On x86 systems generally the value in EAX is the return value from a ret instruction. Only the lower byte in AL will be returned to the shell. Since EAX isn't set by your code, it is whatever value was in EAX prior to the C runtime calling main. — Michael Petch, Jan 24 '19 at 11:10
This all falls into the realm of undefined behavior. But if you do `const char main[]={0xb0, 0x01, 0xc3};` the second byte is the return value. So you might be lucky enough to get 1 returned. The sequence of 3 bytes does `mov al, 1` `ret`. If you change the second byte it should alter the return code. — Michael Petch, Jan 24 '19 at 11:23
@MichaelPetch thanks, yeah I've been messing about with the values and trying to make it make something that does something (so far I've just moved stuff to registers and then returned). This is fun though, I feel like I am learning from it even though it is a monumentally silly way to go about it and I should probably just read a book! And yes MacOS. — AsksStupidQuestions, Jan 24 '19 at 12:04
@SWilliams *"if it is undefined though, why does it appear to be the same all the time?"*, well, as said, to know that, you need to look at the generated assembly. Does it behave the same every time? Note that the assembly might change if you change compiler options, or if you re-arrange code, or if you upgrade your compiler toolchain, or according to the phase of the moon, or... so looking at result of one compilation is no guarantee it will be the same after next build. — hyde, Jan 24 '19 at 12:05
*I understand that a segfault is invalid access to valid memory and a bus error is access to invalid memory* No, usually you get a segfault for trying to access an unmapped page. x86-64 Linux doesn't normally give you a bus error for anything (so you're probably on something else, like OS X). A bus error for misaligned SSE loads/stores would make sense. x86-64 Linux will SIGBUS if you turn on x86 alignment checking (the AC flag in EFLAGS): [Debugging SIGBUS on x86 Linux](https://stackoverflow.com/a/2089240). But nobody does that because library funcs like memcpy use unaligned. — Peter Cordes, Jan 24 '19 at 15:02

Peter Cordes · Accepted Answer · 2019-01-24T20:01:10.867

int main = 194 is c2 00 00 00, which decodes as ret 0

Whatever called main must have left 252 in the low byte of RAX. (The calling convention says that RAX is the return-value register, but it's not an arg-passing register so on function entry it holds whatever tmp garbage your caller was using it for.)

See the bottom of the answer for a theory on why you get SIGBUS for 2 but SIGSEGV for 3: I think RAX is a valid pointer on entry to main (by chance of what the dynamic linker had there), 03 00 add eax, [rax] destroys it but 02 00 add al, [rax] doesn't, and then execution either faults on the 00 00 add [rax], al from the next 2 bytes of main, or runs the 00 00 instruction and then falls off the end of a page.

Update from @MichaelPetch: RAX is pointing to main (in the read-only TEXT segment), and stores to read-only pages also SIGBUS. So 00 00 add [rax], al will SIGBUS for that reason if RAX is still pointing there.

(Beware that this answer has some wrong guesses and wasn't fully rewritten every time I got new info from @SWilliams or @MichaelPetch. The bullet points about what kinds of #PF cause which signal are up to date, and I've tried to at least add a correction after things that weren't quite accurate. I think there's some value to the wrong theories, as an illustration of other kinds of things that might have happened, so I'm leaving it all in here.)

Your Python program fails on my Linux machine once it gets to c2 00 00 00 ret imm16, the first one that returns successfully. (On Linux, the .rodata section ends up after .text in the TEXT segment, so there's nothing for main to fall into.)

...
const main=0xc0 : segfault
const main=0xc1 : segfault
Traceback (most recent call last):
  File "./opcode-test.py", line 34, in <module>
    print("const main=" + str(hex(i)) + " : " + r)
TypeError: must be str, not int

Doesn't python have an equivalent of strsignal(3) to map signals to standard text strings like "Illegal instruction"? (Like strerror but for signal codes instead of errno values?)
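Python 3.8 and later do have one: signal.strsignal(). Here is a minimal sketch of a helper that could replace the if-chain in the script; describe is a hypothetical name, and because it also turns the integer exit status into a string it avoids the TypeError above. The description text comes from the platform's C library, so it differs between OS X and Linux.

```python
import signal

def describe(returncode):
    """Turn a subprocess returncode into a readable string (hypothetical helper)."""
    if returncode < 0:
        # subprocess reports "killed by signal N" as returncode -N (POSIX only).
        signum = -returncode
        return f"{signal.Signals(signum).name}: {signal.strsignal(signum)}"
    # Otherwise it is a normal exit status.
    return f"exit status {returncode}"

for rc in (-signal.SIGSEGV, -signal.SIGBUS, -signal.SIGILL, -signal.SIGTRAP, 252):
    print(describe(rc))
```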

Most x86 instructions are multiple bytes long. x86 is little-endian, so you're mostly looking at
?? 00 00 00 90 90 90 ... or for larger integers ?? ?? 00 00 90 90 90 90 ..., assuming your linker fills bytes between functions with 0x90 nop like GNU ld on Linux does.
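A quick way to see these byte patterns is to pack the number the way the compiler stores a 4-byte int on x86-64. This is a minimal sketch, assuming a 32-bit little-endian int (true for x86-64 OS X and Linux); main_bytes is a hypothetical helper, and it shows only the 4 bytes of main itself, not whatever padding follows it.

```python
import struct

def main_bytes(value):
    """The 4 bytes stored for `const main=value;` (32-bit little-endian int)."""
    return struct.pack("<i", value)  # "<i": little-endian, 4-byte signed int

for value in (6, 194, 0xCC, 0xC366):
    print(f"main={value:#x}: {main_bytes(value).hex(' ')}")
```

For example, 194 comes out as c2 00 00 00, the ret 0 discussed at the top of this answer.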

These byte sequences might decode to one or more valid instructions before you hit the NOPs and fall through to whatever CRT function the linker puts after main. If you get there without faulting, and without offsetting the stack pointer, you've entered the function with a valid return address on the stack (main's caller, another CRT function) exactly like if main tail-called it.

Presumably that function returns 252 (or some wider value whose low byte is 252). Returning from main leads to clean process exit, making an exit system call with main's return value.
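Only the low byte of that exit value reaches the parent, which is why garbage left in RAX shows up as 252. A minimal sketch of that truncation, assuming a POSIX system; it uses Python children that call os._exit directly as a stand-in for returning from a C main, and exit_status_of is a hypothetical helper.

```python
import subprocess
import sys

def exit_status_of(code):
    """Start a child that calls _exit(code); return the status the parent sees."""
    child = subprocess.run([sys.executable, "-c", f"import os; os._exit({code})"])
    return child.returncode

# POSIX only hands the low 8 bits of the exit value to the parent.
for code in (252, 0x1FC, 0x12345FC):
    print(f"_exit({code:#x}) -> parent sees {exit_status_of(code)}")
```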

This fall-through tailcall is as if main ended with return next_function(argc, argv);.

Correction (without rewriting the whole answer, sorry)

Since main=194 is the first one that worked, I think you're not actually getting fall-through, probably only C2 ret imm16 and C3 ret are leading to a clean exit. And for c2, it has to be followed by 2 00 bytes, or else it'll break the stack for main's caller.

Or those instructions preceded by a prefix that doesn't do anything, or by a harmless one-byte instruction, e.g. 90 nop / c3 ret or 90 nop / c2 00 00 ret 0. Or 91 xchg eax, ecx, etc. could actually give you a different return value, swapping EAX with another register. (x86 dedicates opcodes 90 .. 97 to xchg-with-EAX because on the original 8086 AX was more "special", without instructions like movsx to sign-extend into other registers, and without 2-operand imul.)

Other harmless one-byte instructions include 99 cdq and 98 cwde, but not push or pop (because changing RSP would make it not point at the return address). Some one-byte flag set/clear instructions are f9 stc, fd std, but not fb sti (that's privileged, unlike the carry flag and direction flag).

Harmless prefixes are the 0x40..4f REX prefixes, 0xf2/f3 REP, and the 0x66 and 0x67 operand-size and address-size prefixes. Segment-override prefixes might also be harmless.

I just tested main=0xc366 and main=0xc367 and yes they both exit cleanly. GDB decodes 66 c3 as retw (operand-size prefix) and 67 c3 as addr32 ret (address size prefix), but both still pop a 64-bit return address, and don't truncate the stack pointer either. (I took out the -no-pie I'd been using, so RIP was outside the low 32 bits along with RSP).
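Going the other way, to pick the number for const main= that produces a given byte sequence, read the bytes as a little-endian int, padded with the 00 bytes that the rest of the int supplies. A minimal sketch (main_value is a hypothetical helper), using the 66 c3 and 67 c3 tests above and Michael Petch's mov al, 1 / ret example from the comments:

```python
def main_value(code):
    """Number to put in `const main=...;` so the int's bytes begin with `code`."""
    if len(code) > 4:
        raise ValueError("a 4-byte int holds at most 4 bytes of machine code")
    # The remaining bytes of the int are 00, so pad before reading it little-endian.
    return int.from_bytes(code.ljust(4, b"\x00"), "little")

examples = {
    "66 c3 (retw)": bytes([0x66, 0xC3]),
    "67 c3 (addr32 ret)": bytes([0x67, 0xC3]),
    "b0 01 c3 (mov al, 1 / ret)": bytes([0xB0, 0x01, 0xC3]),
}
for label, code in examples.items():
    print(f"{label}: const main={main_value(code):#x};")
```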

Note that 00 is the opcode for add [r/m8], r8, so 00 00 decodes as add [rax], al.

To get past those 00 bytes and get to the "nop sled" the linker inserts as padding, you need the opcode (and modrm byte if the opcode uses one) to encode the start of a longer instruction, like 0xb8 mov eax, imm32 which is 5 bytes long, and consumes the next 4 bytes after the 0xb8. In fact there are short-form mov-immediate encodings for every register, so 0xb8 + 0..7 will all get you past the gap. Except for mov esp, imm32, which will lead to a crash once you get to the next function because it stepped on the stack pointer.
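A minimal sketch listing the eight 0xb8+r encodings in the standard x86 register-number order (the same order used by the 0x90+r xchg forms). REGS32 and mov_imm32_forms are illustrative names, not from any library; each of these instructions is 5 bytes, so it swallows the three 00 bytes of main plus one byte after it.

```python
# x86 register numbers 0..7, as used by the "opcode + register" short forms
# (0xb8+r for mov r32, imm32, and likewise 0x90+r for xchg eax, r32).
REGS32 = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

def mov_imm32_forms():
    """Return (opcode, mnemonic, keeps_stack) for each 5-byte mov r32, imm32 encoding."""
    forms = []
    for reg_num, name in enumerate(REGS32):
        # Writing ESP zero-extends into RSP, so mov esp, imm32 destroys the stack pointer.
        forms.append((0xB8 + reg_num, f"mov {name}, imm32", name != "esp"))
    return forms

for opcode, mnemonic, keeps_stack in mov_imm32_forms():
    note = "ok" if keeps_stack else "clobbers RSP"
    print(f"{opcode:#04x}  {mnemonic:<16} {note}")
```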

One of the early ones is 05, the short-form (no modrm) opcode for add eax, imm32. Most original-8086 ALU instructions have a special AX,imm16 / EAX,imm32 short form, instead of the op r/m32, imm32 or imm8 form that uses a ModRM byte to encode the destination operand. (And the bits of the /r field in ModRM as extra opcode bits.)

See Tips for golfing in x86/x64 machine code for more about AL / EAX / RAX short form encodings, and one byte instructions.

For manually decoding x86 machine code, see Intel's manuals, especially the vol.2 manual which details the instruction encoding formats, and has an opcode table at the end. (See links in the x86 tag wiki). For just an opcode map, see http://ref.x86asm.net/coder64.html.

Use a disassembler or debugger to see what's in your executables

But really, use a disassembler like objdump -drwC -Mintel. Or llvm-objdump. Find main in the output, and look at what you get. (Or use GDB, because labels in the middle of an instruction throw off the disassembler.)

Use objdump -rwC -Mintel -D -j .rodata -j .text testc/test194 to get output like this, disassembling the .text and .rodata sections as code:

testc/test194:     file format elf64-x86-64


Disassembly of section .text:

0000000000400540 <__libc_csu_init>:
  400540:       41 57                   push   r15
  400542:       49 89 d7                mov    r15,rdx
  ...
  4005a4:       c3                      ret    
  4005a5:       90                      nop
  4005a6:       66 2e 0f 1f 84 00 00 00 00 00   nop    WORD PTR cs:[rax+rax*1+0x0]

00000000004005b0 <__libc_csu_fini>:
  4005b0:       c3                      ret    

Disassembly of section .rodata:

00000000004005c0 <_IO_stdin_used>:     ;;;; This is actually data!
  4005c0:       01 00                   add    DWORD PTR [rax],eax
  4005c2:       02 00                   add    al,BYTE PTR [rax]

00000000004005c4 <main>:
  4005c4:       c2 00 00                ret    0x0
        ...             ; objdump elided the last 0, not me.  It literally put ...

(I modified your python script to add the -no-pie gcc option, which is why my disassembly has absolute addresses, instead of just small addresses relative to the start of the file = 0. I wondered if that might put main somewhere it could fall through, but it didn't.)

Notice there's only a small gap between .text and .rodata. They're part of the same ELF segment (in the ELF program headers that the OS's program loader looks at), so they're part of the same mapping, no unmapped pages between them. If we're lucky, the intervening bytes are even filled with 0x90 nop instead of 00. Actually, something filled the gap between __libc_csu_init and __libc_csu_fini with long NOPs. Maybe that was from the assembler if they were in the same source file.

main is of course in .rodata because you declared it in C as a read-only global (static storage), like const int main = 6;. If you used const int main __attribute__((section(".text"))) = 123, you could get main in the normal .text section. On my system, it ends up right before __libc_csu_init.

But labels interrupt disassembly; the disassembler thinks it must have been wrong and restarts decoding from the label. So in GDB on testc/test5 (with set disassembly-flavor intel and layout reg, then using the start command to stop at the start of main), I'll get

   0x40053c <main>                 add    eax,0x41000000
   0x400541 <__libc_csu_init+1>    push   rdi
   0x400542 <__libc_csu_init+2>    mov    r15,rdx

But from objdump -drwC -Mintel (disassembling only the .text section is the default for -d, and I used the GNU C attribute to put main there so my program could work the way yours does), I get:

000000000040053c <main>:
  40053c:       05 00 00 00                                         ....

0000000000400540 <__libc_csu_init>:
  400540:       41 57                   push   r15
  400542:       49 89 d7                mov    r15,rdx

Notice that the .... on the same line as the 05 00 00 00 indicates that decoding didn't get to the end of an instruction.

And since main isn't aligned by 16 here, it's right up against the start of __libc_csu_init. So the add eax, imm32 consumes the REX.B prefix (41) from push r15, making it decode as push rdi if reached by falling through from main instead of by a call to the __libc_csu_init label.

The above output was from Linux. Your OS X system would be different

OS X puts most of the CRT startup code in libc, not statically linked into the executable with main.

Or maybe there isn't anything for your main to fall through into

If there was, main=5 would have worked, but you say the first non-crashing result was with main=194, which is an actual ret.

If nothing before c3 ret or c2 00 00 ret 0 returned, then probably there's nothing to fall into after main, or the gap isn't padded with repeated 90 nop to form a "nop sled" that will execute ok if decoding starts anywhere in the middle of it. (e.g. after an earlier instruction consumes the trailing 0 bytes at the end of the dword int main, and some of the padding bytes.)

I understand that a segfault is invalid access to valid memory and a bus error is access to invalid memory

No, that simplified description is backwards. Usually you get a segfault for trying to access an unmapped page, on all Unixes. But you get a bus error for some kinds of invalid access (even on valid addresses).

Solaris on SPARC gives you a bus error for misaligned word loads/stores to valid memory.

On x86-64 Linux, you only get SIGBUS for really weird stuff. See Debugging SIGBUS on x86 Linux. Examples include a non-canonical stack pointer leading to a #SS exception, or reading past the end of an mmapped file that was truncated. You'd also get it if you enable x86 alignment checking (AC flag), but nobody does that because library funcs like memcpy use unaligned loads/stores, and compiler code-gen assumes that unaligned integer loads/stores are safe.

IDK what hardware exceptions *BSD maps to SIGBUS, but I'd assume that regular out-of-bounds access, like NULL-pointer dereference, would SIGSEGV. That's pretty standard.

@MichaelPetch says in comments that on OS X

#PF (page fault hardware exception) from code-fetch causes the kernel to deliver SIGBUS
#PF from a data load/store to an unmapped page results in SIGSEGV.
#PF from a store to a read-only page results in SIGBUS. (And this is what's happening after 02 00 add al, [rax], in the 00 00 add [rax], al that forms the 2nd byte of main. The rest of this answer doesn't take this into account.)

(Of course this is after checking if the page fault was due to a difference between the hardware page table and the logical process memory map, e.g. from lazy mapping, copy-on-write, or pages paged out to disk.)

So if your int main is landing at the very end of a mapped page, with an unmapped page right after it, 05 add eax,imm32 would read one extra byte past the end of the dword holding int main (.long 5 in GAS syntax asm). That would go into the next page and SIGBUS. (Your last comment indicates it does SIGBUS.)
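The page arithmetic can be sketched like this, assuming 4 KiB pages and the address Michael Petch reports for main in the comments (0x100000FFC, the last 4 bytes of its page). crosses_page is a hypothetical helper.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages (x86-64)

def crosses_page(start, length):
    """True if the bytes [start, start + length) touch more than one page."""
    return start // PAGE_SIZE != (start + length - 1) // PAGE_SIZE

MAIN = 0x100000FFC  # main's address as reported in the comments

for length, insn in ((3, "c2 00 00 ret 0"), (4, "the 4-byte int itself"), (5, "05 add eax, imm32")):
    print(f"{insn:<24} {length} bytes, crosses into next page: {crosses_page(MAIN, length)}")
```

Only the 5-byte instruction needs a byte from the following page, so its code fetch is the one that faults.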

A theory for what's going on with the first few values:

You report:

a bus error for main = 02 00 add al, [rax] / 00 00 add [rax], al
but a segfault for main = 03 00 add eax, [rax] / 00 00 add [rax], al.

We know the low byte of RAX is 252, so if RAX holds a valid pointer value, it's 4-byte aligned. So if loading a byte from [rax] works, so does loading a dword.

So probably the memory-source add is succeeding, and modifying AL, the low byte of RAX (byte operand size), probably still leaving RAX a valid pointer. Then if the rest of the page containing main is filled with 00 00 add [rax], al instructions (or just the one inside main itself), those will succeed (without further modifying RAX) until execution falls off into an unmapped page, as long as RAX is still a valid pointer after running whatever main decoded to.

Actually, the memory-destination add itself faults and raises SIGBUS.

03 00 add eax, [rax] writes EAX, and thus truncates RAX to 32-bit. (writing a 32-bit register implicitly zero-extends into the full 64-bit register, unlike writing low 8 or 16 partial registers.) This definitely gives you an invalid pointer, because OS X maps static code/data outside the low 32 bits of virtual address space.
So the following 00 00 add [rax], al will definitely fault from trying to write an out-of-bounds address, causing a #PF that raises SIGSEGV.
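The register arithmetic can be checked with a small model. This is a minimal sketch, assuming (as Michael Petch reports in the comments) that RAX holds main's address, 0x100000FFC, on entry, so the memory operand [rax] is main's own first byte (for main=2) or first dword (for main=3). add_al and add_eax are hypothetical helpers that model only the destination register, not the flags.

```python
MASK8 = 0xFF
MASK32 = 0xFFFFFFFF

def add_al(rax, byte):
    """add al, r/m8: only AL changes; bits 8..63 of RAX are preserved."""
    return (rax & ~MASK8) | ((rax + byte) & MASK8)

def add_eax(rax, dword):
    """add eax, r/m32: the 32-bit result is zero-extended into all of RAX."""
    return (rax + dword) & MASK32

RAX_ON_ENTRY = 0x100000FFC  # assumed: RAX == &main on entry (from the comments)

# main=2: 02 00 add al, [rax] reads main's first byte, 0x02.
print("main=2 ->", hex(add_al(RAX_ON_ENTRY, 0x02)))
# main=3: 03 00 add eax, [rax] reads main's first dword, 0x00000003.
print("main=3 ->", hex(add_eax(RAX_ON_ENTRY, 0x00000003)))
```

With main=2, RAX still points into main's read-only page, so the following 00 00 add [rax], al is a store to read-only memory (SIGBUS on OS X, per the comments). With main=3, RAX becomes 0xfff, in the unmapped page at the bottom of the address space, so the same store raises SIGSEGV.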

There's probably just the one 00 00 from the last two bytes of main before the end of a page. Otherwise 05 add eax, imm32 would segfault from truncating RAX and then running 00 00 add [rax], al. For it to SIGBUS, it must code-fetch into an unmapped page without decoding any memory-access instructions after that.

There are certainly other plausible explanations for what you're seeing, but I think this explains all your observations so far; without more data we can't disprove it. Obviously the easiest thing would be to fire up GDB or whatever other debugger and just start / si and watch what happens.

He already has said he is using MacOS when I made that inquiry about the bus error. His comment: _And yes MacOS_ — Michael Petch, Jan 24 '19 at 16:40
@MichaelPetch: Thanks. I don't have a Mac, so that doesn't help me improve the answer by actually seeing how its linker lays out a binary. :/ I wrote most of it thinking there would be fall-through into the next function, but now I'm thinking maybe it's just `ret` and prefix+`ret` that are returning, like I was getting before I used an attribute to put `main` in the .text section. — Peter Cordes, Jan 24 '19 at 16:48
That was really informative thanks. I don't have enough rep to vote but yeah enjoyed reading that. More to explore now. Should try playing around with some things in a linux vm too and look at differences. — AsksStupidQuestions, Jan 24 '19 at 17:53
In terms of this test code. The reason for the bus errors on MacOS is because the constant `main` is placed at the **end** of the read only page (starting 4 bytes before the 4kb boundary). Anything that doesn't decode into a complete instruction will run into the next page which is not executable. If a complete instruction can be processed and it happens to do a memory access that doesn't have privilege to do so it will segfault. — Michael Petch, Jan 24 '19 at 18:08
@MichaelPetch: So MacOS sends SIGBUS instead of SIGSEGV for a page fault due to code-fetch instead of data load/store? Or does it always SIGBUS for `#PF` on unmapped pages (in cases where it wasn't just lazy mapping)? — Peter Cordes, Jan 24 '19 at 18:14
@MichaelPetch: The OP reports a bus error for `02 00 add al, [rax]`, but a segfault for `03 00 add eax, [rax]`. Possibly RAX was *pointing* near the end of a page, allowing the byte load to succeed? No, but `252 = 256-4`, so it's 4-byte aligned. Maybe it was the `00 00` instruction following that faulted? The OP doesn't mention `05 add eax,imm32` (which would consume a 5th byte after `int main`). — Peter Cordes, Jan 24 '19 at 18:21
That was bus error. Most results are. First 10 segfaults are ```const main=0x3 : segfault const main=0xb : segfault const main=0x13 : segfault const main=0x1b : segfault const main=0x23 : segfault const main=0x2b : segfault const main=0x33 : segfault const main=0x41 : segfault const main=0x43 : segfault const main=0x45 : segfault``` — AsksStupidQuestions, Jan 24 '19 at 18:37
@SWilliams: Then I think MichaelPetch is right that your `main` is ending up at the very end of a page, followed by an unmapped page. I added a section that explains why `02` SIGBUSes while `03` SIGSEGVs. Padding with `00` bytes after `main` until an unmapped page could have explained that, but `05 add eax, imm32` getting SIGBUS instead of SIGSEGV rules that out. Use a debugger and single-step from the start of `main` if you want to see it in action. Debuggers are vital tools for learning asm, and for checking guesswork by veterans :) — Peter Cordes, Jan 24 '19 at 19:03
@MichaelPetch: thanks, I think everything makes sense now with `main` at the end of a page: If RAX was a valid pointer, `add eax, [rax]` destroys it but `add al, [rax]` doesn't. Leading to `00 00 add [rax], al` faulting or falling through. I think you deleted your comment confirming that OS X does SIGBUS for code fetch vs. SIGSEGV for data. I updated my answer with details. — Peter Cordes, Jan 24 '19 at 19:05
Ok, gonna have a go with lldb (tried to use gdb and it was all kinds of hoops to jump through to let it actually work on macOS because of SIP :/) — AsksStupidQuestions, Jan 24 '19 at 19:27
SIGBUS will occur when there is a page fault in a code fetch, but in MacOS you also get a SIGBUS when writing to read only memory. Writing/Reading from/to an unmapped page gives SIGSEGV. When using `const main=2` it SIGBUSes because the value in RAX just so happens to be a pointer to main (0x100000FFC) ;-). So in this sequence `add al,BYTE PTR [rax]` `add BYTE PTR [rax],al` the second instruction actually attempts to overwrite the code (itself) in a read only segment and gives SIGBUS. — Michael Petch, Jan 24 '19 at 19:41
When using `const main=3` you end up with `add eax,DWORD PTR [rax]` (which computes to 0xFFF) followed by `add BYTE PTR [rax],al` which then tries to write to the unmapped page at the beginning of memory so it SIGSEGVs — Michael Petch, Jan 24 '19 at 19:41