In an assembly program, the .text section is loaded at 0x08048000; the .data and .bss sections come after that.

What would happen if I don't put an exit syscall in the .text section? Would it lead to the .data and the .bss section being interpreted as code causing "unpredictable" behavior? When will the program terminate -- probably after every "instruction" is executed?

I can easily write a program without the exit syscall, but I don't know how to test whether .data and .bss get executed, because I guess I would have to know the real machine code that is generated under the hood to understand that.

I think this question is more about "How would OS and CPU handle such a scenario?" than assembly language, but it is still interesting to know for assembly programmers etc.

Nishant
  • Execution would continue into whatever is after your code, yes. That will probably hit an invalid instruction sooner or later, or you will run into unmapped memory. If you are extremely lucky, you might hit a harmless endless loop in which case your program wouldn't terminate. – Jester Apr 05 '18 at 13:52
  • @Jester I'd say more chances of winning the lottery than hitting an endless loop. – Tony Tannous Apr 05 '18 at 21:33
  • @TonyTannous I did say "extremely lucky" :D However, you can make an endless loop in x86 using 2 bytes and assuming random memory contents that's already way better chance than any lottery I know. Unfortunately you are likely to hit some zero bytes instead of random, and that is `add al, [eax]` on x86 which will probably fault. – Jester Apr 05 '18 at 22:10
  • For the record, `00 00` decodes as `add [eax], al`: memory *destination*, not memory source, so EAX (or RAX in 64-bit code) has to be pointing at writeable memory, but repeated execution doesn't change the low byte of the address. – Peter Cordes Apr 06 '18 at 02:58
  • Related: [Nasm segmentation fault on RET in \_start](https://stackoverflow.com/q/19760002) - `_start` isn't a function, there's nothing to return to. You need to make an exit system-call. That Q&A has actual code examples for x86 / x86-64. – Peter Cordes Mar 16 '22 at 17:23

1 Answer

The processor does not know where your code ends. It faithfully executes one instruction after another until execution is redirected elsewhere (e.g. by a jump, call, interrupt, system call, or similar).

If your code ends without jumping elsewhere, the processor continues executing whatever is in memory after your code. It is fairly unpredictable what exactly happens, but eventually, your code typically crashes because it tries to execute an invalid instruction or tries to access memory that it is not allowed to access.

If neither happens and no jump occurs, eventually the processor tries to execute unmapped memory or memory that is marked as “not executable” as code, causing a segmentation violation. On Linux, this raises a SIGSEGV or SIGBUS. When unhandled, these terminate your process and optionally produce core dumps.

If you're curious, run under a debugger and look at disassembly of the faulting instruction.
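As an illustration of ending a program explicitly instead of falling off the end, here is a minimal sketch. It assumes the setup the question describes (32-bit x86 Linux, NASM syntax, linked with `ld`); other architectures and the 64-bit x86 ABI use different registers, system call numbers, and trap instructions. The file name `exit.asm` is hypothetical.

```nasm
; Assumption: 32-bit x86 Linux. Build with, for example:
;   nasm -f elf32 exit.asm -o exit.o
;   ld -m elf_i386 exit.o -o exit
section .text
global _start

_start:
    ; ... program logic would go here ...

    mov eax, 1      ; system call number 1 = exit in the 32-bit int 0x80 ABI
    xor ebx, ebx    ; exit status 0
    int 0x80        ; enter the kernel; exit does not return
```

Without the last three instructions, execution would continue into whatever bytes follow `_start` in memory, as described above.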

fuz
  • At the CPU level, the exception you're talking about is a page-fault (`#PF`), not segment-related. A 32-bit or 64-bit process on x86 Linux runs with CS base=0 / limit = unlimited. The "segmentation" in `SIGSEGV` has basically nothing to do with x86 segments, because that's not what x86 Linux uses for memory protection. – Peter Cordes Apr 06 '18 at 03:00
  • @PeterCordes I would prefer if you removed all mention of CPU-specific details as otherwise readers will be confused if this applies to x86 only or also to other architectures. I had intentionally written the answer without such references to avoid this uncertainty. – fuz Jan 13 '23 at 01:59
  • Maybe I should post a separate answer with x86 details? This is a nice question for a canonical duplicate, but the answer doesn't show how to actually fix the problem for those who don't know. That was the compromise I was trying to strike. (I'd typed the `00 00` part before noticing that the question wasn't x86 specific; maybe I should have taken it out. My comment from 4 years ago is also x86-specific, for better or for worse :/) The `0x08048000` .text address in the question is what `ld` uses for 32-bit x86 Linux non-PIE, but it might also use the same address on others. – Peter Cordes Jan 13 '23 at 02:02
  • Most of the other Q&As with examples of how to `_exit` are about attempts to `ret` from `_start`, not falling off the end, so they don't work as well as a single duplicate for questions like [Assembly-segmentation fault](https://stackoverflow.com/q/28017926) where that appears to be the problem (although that one has some confusing text in the question which implies otherwise, but isn't a MCVE of it.) – Peter Cordes Jan 13 '23 at 02:04
  • This is of course your answer, so it's your call what it says. Let me know if you decide to take out the x86 stuff; if I don't find a better canonical for x86 Q&As that fall off the end of `_start`, I might just add a separate answer here. Maybe also with some mention of the equivalents for ARM and AArch64 Linux if that isn't too cluttered. – Peter Cordes Jan 13 '23 at 06:51
  • @PeterCordes Yes, please do. – fuz Jan 13 '23 at 10:03