74

I think the question says it all. An example covering most standards from C89 to C11 would be helpful. I thought of this one, but I guess it is just undefined behaviour:

#include <stdio.h>

int main( int argc, char* argv[] )
{
  const char *s = NULL;
  printf( "%c\n", s[0] );
  return 0;
}

EDIT:

As some votes requested clarification: I wanted a program with a common programming error (the simplest I could think of was a segfault) that is guaranteed (by the standard) to abort. This is a bit different from the minimal segfault question, which doesn't care about this guarantee.

math
  • 14
    Your code would not produce a segmentation fault on platforms that allow you to successfully dereference a NULL pointer. Not portable. – fuz Sep 24 '13 at 16:06
  • possible duplicate of [Minimal C/C++ program that segfaults?](http://stackoverflow.com/questions/12404759/minimal-c-c-program-that-segfaults) – BlueRaja - Danny Pflughoeft Sep 24 '13 at 19:14
  • 2
    @BlueRaja-DannyPflughoeft It is not a dup since this question specifically asks for the solution to conform to the standard which the dup does not. The dup of the proposed dup is actually a *C++* question which is just silly. – Shafik Yaghmour Sep 24 '13 at 19:28
  • 2
    I am a little baffled as to why people are voting to close this question. I don't see how the question can be unclear when there are several answers that cleave pretty close to each other in content and the readers, based on their votes, don't seem to be confused. The too broad vote is weird considering the answers given, and I explained already why it is not a dup. – Shafik Yaghmour Sep 24 '13 at 21:53
  • 5
    `abort()` is the only thing guaranteed to abort. – OrangeDog Sep 25 '13 at 10:27

11 Answers

115

raise() can be used to raise a segfault:

raise(SIGSEGV);
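
To watch the signal arrive without killing the demonstration, one option is to install a handler first. The following is only a sketch: the names `on_segv`, `segv_seen` and `raise_segv_observed` are made up for illustration. Note the assumption that returning from the handler is harmless: ISO C leaves the behavior undefined when a SIGSEGV handler returns, while POSIX defines it when the signal was generated by raise(), so this is best read as a POSIX-flavored example.

```c
#include <signal.h>

/* Flag written by the handler; volatile sig_atomic_t is the type
   ISO C permits a signal handler to write. */
static volatile sig_atomic_t segv_seen = 0;

/* Illustrative handler: only records that SIGSEGV was delivered. */
static void on_segv(int sig)
{
    (void)sig;
    segv_seen = 1;
}

/* Hypothetical helper: returns 1 if raise(SIGSEGV) invoked the handler,
   0 if it did not, and -1 if the handler could not be installed. */
int raise_segv_observed(void)
{
    void (*previous)(int) = signal(SIGSEGV, on_segv);
    if (previous == SIG_ERR)
        return -1;

    segv_seen = 0;
    raise(SIGSEGV);             /* no invalid memory access happens here */

    signal(SIGSEGV, previous);  /* restore the earlier disposition */
    return segv_seen ? 1 : 0;
}
```

Without the handler, the default action for SIGSEGV applies, which on most Unix-like systems terminates the process.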
devnull
msam
  • 8
    As far as I can tell this is still implementation defined i.e. the standard does not define the exact behavior of this section `7.14.2.1 The raise function` points to `7.14.1.1` which does not talk about anything related to segmentation faults. – Shafik Yaghmour Sep 24 '13 at 16:12
  • 11
    Uhm, this doesn't produce a segmentation fault. It just raises the SIGSEGV signal :-/ – Nikos C. Sep 24 '13 at 16:13
  • 1
    A program throws segmentation fault when it receives a `SIGSEGV` signal. So.. If `raise(SIGSEGV)` sends the current process a `SIGSEGV` it should throw segmentation fault. – Marco Sep 24 '13 at 16:19
  • 1
    however, that may depend on the operating system and may not be standard. I mean.. you can write an OS that doesn't have segmentation faults :P. But this answer is a standard way to tell the OS that a Segfault occurred without relying on undefined behavior. – Marco Sep 24 '13 at 16:27
  • Detail: `SIGSEGV` is `an invalid access to storage`, not necessarily a segmentation fault (C11 7.14). Example: a system that requires even-aligned memory access may raise this signal for an odd address, even if it has no memory segments. In any case, I think this answer best meets OP's goal. – chux - Reinstate Monica Sep 24 '13 at 16:58
  • 2
    @chux: How do you define segmentation fault? – aschepler Sep 24 '13 at 17:15
  • @aschepler "segmentation fault" has grown to be a general purpose term expressing all sorts of invalid memory accessing. I'm used to "segmentation fault" involving invalid paging, mode (writing to read-only memory), or range error (address too high or 0) - something that works/fails on a segment or range of memory - rarely is this recoverable. I tend to think of "bus fault" as occurring on mis-aligned memory access. Some bus faults are recoverable. A faulty memory (parity, ECC failure) is another type of invalid access - sometimes recoverable, though not necessarily a segmentation issue. – chux - Reinstate Monica Sep 24 '13 at 17:32
  • 18
    @Marco Segfaults are detected by the kernel. They happen. Throwing a signal just instructs the system to play as-if. A segfault didn't really happen, but the system treats it as if it did happen. A segfault does not happen just because the SIGSEGV signal is raised. A segfault only happens when memory is accessed that the process isn't allowed to access. No such invalid memory access is happening by calling `raise(SIGSEGV)`. To give you a real life analogy, if in soccer you increase the score of a team by 1 without a goal having been scored does not mean that a goal was scored. – Nikos C. Sep 24 '13 at 18:32
  • 5
    Segfaults are usually detected by the CPU (MMU in particular), not the kernel. In particular, not a single instruction of kernel code is executed to detect them. The CPU will of course jump to kernel code to handle the segfault. `raise(SIGSEGV)` jumps to the kernel to handle `SIGSEGV`. That is rather comparable. – MSalters Sep 25 '13 at 07:39
  • 1
    It has already been said, but this does indeed not equal a true segfault, it merely calls any SIGSEGV handler. In theory (7.14.1.1) this should be the same thing but in practice it isn't. Why? Because the behavior is undefined if and when the handler returns. Many implementations would simply restart the instruction that caused the segfault in the first place, but not if `raise` was used. In that case the handler is simply executed and after the handler of the raised signal returns (7.14.2.1) the program simply continues normally. The restart leads to an infinite loop on real segfaults. – stefanct Oct 27 '15 at 23:23
77

A segmentation fault is implementation-defined behavior. The standard does not define how the implementation should deal with undefined behavior, and in fact the implementation could optimize out undefined behavior and still be compliant. To be clear, implementation-defined behavior is behavior which is not specified by the standard but which the implementation should document. Undefined behavior results from code that is non-portable or erroneous; its behavior is unpredictable and therefore cannot be relied on.

If we look at the C99 draft standard §3.4.3 undefined behavior which comes under the Terms, definitions and symbols section in paragraph 1 it says (emphasis mine going forward):

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

and in paragraph 2 says:

NOTE Possible undefined behavior ranges from ignoring the situation completely with unpredictable results, to behaving during translation or program execution in a documented manner characteristic of the environment (with or without the issuance of a diagnostic message), to terminating a translation or execution (with the issuance of a diagnostic message).

If, on the other hand, you simply want a method defined in the standard that will cause a segmentation fault on most Unix-like systems then raise(SIGSEGV) should accomplish that goal. Although, strictly speaking, SIGSEGV is defined as follows:

SIGSEGV an invalid access to storage

and §7.14 Signal handling <signal.h> says:

An implementation need not generate any of these signals, except as a result of explicit calls to the raise function. Additional signals and pointers to undeclarable functions, with macro definitions beginning, respectively, with the letters SIG and an uppercase letter or with SIG_ and an uppercase letter,219) may also be specified by the implementation. The complete set of signals, their semantics, and their default handling is implementation-defined; all signal numbers shall be positive.

Lightness Races in Orbit
Shafik Yaghmour
  • 1
    Although, msam's answer provides the exact solution, this answer gave me the most insight. And now with the edit, also mentioning the raise-possibility I think it deserves acceptance. Nonetheless thanks to all contributors opening my eyes on this issue. – math Sep 24 '13 at 17:30
  • you said "in fact the implementation could optimize out undefined behavior and still be compliant.". So, double delete is undefined in C++. So, is it possible for a C++ implementation to optimize it out & still be compliant? – Destructor Oct 08 '15 at 04:07
  • 1
    @PravasiMeet once there is undefined behavior the compiler is allowed to do anything. – Shafik Yaghmour Oct 08 '15 at 09:26
  • @ShafikYaghmour: so you mean to say that the thing I wrote in my comment is also possible. – Destructor Oct 08 '15 at 10:39
  • @PravasiMeet can you point me to a specific example, either an SO question or a live demo? As far as I can tell the answer is yes but talking in abstract is always prone to missing important details. My [answer here](http://stackoverflow.com/a/31746063/1708801) provides a perfect example of UB and optimization and I provide a lot of links to articles explaining these concepts in great detail. My [answer here](http://stackoverflow.com/a/32507135/1708801) shows an extreme example of UB and optimizations and demonstrates how surprising the results can be. – Shafik Yaghmour Oct 08 '15 at 13:06
  • @PravasiMeet also see [Does “Undefined Behavior” really permit *anything* to happen?](http://stackoverflow.com/q/32132574/1708801) – Shafik Yaghmour Oct 08 '15 at 13:24
33

The standard only mentions undefined behavior. It knows nothing about memory segmentation. Also note that the code that produces the error is not standard-conformant. Your code cannot invoke undefined behavior and be standard conformant at the same time.

Nonetheless, the shortest way to produce a segmentation fault on architectures that do generate such faults would be:

int main()
{
    *(int*)0 = 0;
}

Why is this sure to produce a segfault? Because access to memory address 0 is always trapped by the system; it can never be a valid access (at least not by userspace code.)

Note of course that not all architectures work the same way. On some of them, the above might not crash at all, but rather produce other kinds of errors. Or the statement could even be perfectly fine, with memory location 0 being accessible. That is one of the reasons why the standard doesn't actually define what happens.
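
To check what a given platform actually does with such a write, one rough approach is to perform it in a child process and inspect how the child died. This sketch assumes a POSIX system (fork, waitpid); the helper name `null_write_signal` is made up for illustration, and the write itself remains undefined behavior as far as ISO C is concerned.

```c
#define _POSIX_C_SOURCE 200809L  /* assumption: POSIX fork/waitpid are available */
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: performs the null write in a child process and
   returns the number of the signal that killed the child, 0 if the
   child exited normally, or -1 if fork or waitpid failed. */
int null_write_signal(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* volatile discourages the compiler from discarding the store. */
        *(volatile int *)0 = 0;
        _exit(0);   /* reached only where address 0 is writable */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

On typical desktop systems the result should compare equal to SIGSEGV; on a platform where address 0 is writable the helper would return 0 instead.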

Nikos C.
  • 10
    I've used embedded systems programmed in C where the memory at address 0 is not only present, it must be written. That is a common location for the table of interrupt vectors, for instance. It still feels really, really, wrong to write something like `((unsigned long *)0)[1] = (unsigned long)main;` though. – RBerteig Sep 24 '13 at 19:03
  • 68000 series CPUs could address memory location zero, reading from it was ok, writing to it could cause unpredictable behavior. – Michael Shopsin Sep 24 '13 at 19:31
  • @RBerteig The conversion to pointer of `0` does not have to be the address zero. On the other hand, the conversion to pointer of `0` has to be an address that is different from the address of any object, and you are only supposed to write to objects in standard C. – Pascal Cuoq Sep 24 '13 at 20:07
  • 4
    Upvoted for “Your code cannot invoke undefined behavior and be standard conformant at the same time”, but `*(volatile int *)0` is IMHO a safer bet. – Pascal Cuoq Sep 24 '13 at 20:09
  • 4
    Embedded systems folks historically have taken a very pragmatic view of standards. What matters most is the specific implementation, and on small CPUs, the implementation is usually the most natural mapping of hardware to language. That is, after all, ingrained in the origins of C. And writing to bare metal is *very* different from a hosted environment with a full library and expected standards compliance and portability. – RBerteig Sep 24 '13 at 20:35
  • 2
    @MichaelShopsin: At least on some 68k systems, writing to address 0 is/was also supported. For example, the Commodore Amiga kernel ("exec") would write 0x48454C50 ("HELP" in ASCII) to address 0 before rebooting if it found itself so badly messed up that it couldn't even bring up an error message (the famous "Guru Meditation" box). The ROM boot code would then check for this magic number, and show the error message at that point. Admittedly, all this was (normally) done in kernel code written in assembly, but at least on the low-end Amigas with no MMU, in principle any program _could_ do it. – Ilmari Karonen Sep 25 '13 at 12:29
  • 1
    @RBerteig: Care however has to be taken because compilers (e.g. gcc) often assume null cannot be dereferenced without terminating the program and optimize on this assumption. So in environments that allow dereferencing null pointer the optimization must be turned off. – Jan Hudec Sep 25 '13 at 12:59
  • What would happen if you tried `((unsigned long *)-1)[1] = (unsigned long)main;`? – penguat Sep 25 '13 at 13:14
  • 1
    @IlmariKaronen MacOS (System 1-9) would sometimes write to address 0 so Apple warned you not to use it as you could cause the system to crash. I don't think the actual use of address 0 was documented, just warnings not to use it. In general MacOS used real memory addresses so you could always read/write to any address. – Michael Shopsin Sep 25 '13 at 13:49
14

A correct program doesn't produce a segfault. And you cannot describe deterministic behaviour of an incorrect program.

A "segmentation fault" is a thing that an x86 CPU does. You get it by attempting to reference memory in an incorrect way. It can also refer to a situation where memory access causes a page fault (i.e. trying to access memory that's not loaded into the page tables) and the OS decides that you had no right to request that memory. To trigger those conditions, you need to program directly for your OS and your hardware. It is nothing that is specified by the C language.

Kerrek SB
  • 6
    A "segmentation fault" is a thing that nearly any CPU can throw. Actually it could be the memory management hardware that has a fit. As a person that works on SPARC systems all day long I can tell you Solaris on SPARC is happy to throw a segfault at you. – paul lanken Sep 24 '13 at 16:22
  • @paullanken: Sure. I was just speaking for an architecture I'm familiar with. I imagine it would be a rather common concept. – Kerrek SB Sep 24 '13 at 17:01
  • 2
    You're describing a page fault, not a segmentation fault. They are very different. – OrangeDog Sep 25 '13 at 08:57
  • @OrangeDog: You're right. Many kinds of fault can occur as a result of a memory reference. The OS may respond to an unresolvable page fault by sending a SIGSEGV signal, but you can also cause other, more severe faults in the CPU (e.g. by getting the segmentation semantics wrong). I've edited the text. – Kerrek SB Sep 25 '13 at 09:10
  • 2
    A segmentation fault is the error whereby you try to access a memory segment that you aren't allowed to. It has nothing to do with x86 CPUs and nothing to do with page faults. – OrangeDog Sep 25 '13 at 10:26
  • 1
    @OrangeDog: Well, x86 CPUs are an example that provide the semantics of segmentation faults. I didn't claim they were the *only* hardware to do so. I agree that segmentation and page faults are unrelated things, but the OS may translate them into the same signal to be delivered to the process, which I think is what the OP is looking for. But please do post your own answer, since you have very good points. – Kerrek SB Sep 25 '13 at 11:05
  • 1
    If an OS ever treated page faults and segmentation faults as the same signal, almost every application would crash within seconds. – OrangeDog Sep 25 '13 at 14:29
  • 1
    @OrangeDog: Only page faults that are not warranted, i.e. not known to the OS to be legit. Like dereferencing a random address. – Kerrek SB Sep 25 '13 at 14:43
  • 1
    No, when you dereference a random address, sometimes this address is in a segment that you're not allowed to access, and you get a segmentation fault. Whether a page fault occurs before the access check or not is irrelevant. – OrangeDog Sep 25 '13 at 16:17
  • @OrangeDog: isn't it that on x86 with flat memory model you can never fall into a bad segment? There's only segment zero. – Kerrek SB Sep 25 '13 at 22:11
9

If we assume we are not raising a signal by calling raise, a segmentation fault is likely to come from undefined behavior. Undefined behavior is undefined, and a compiler is free to refuse to translate the program, so no answer relying on undefined behavior is guaranteed to fail on all implementations. Moreover, a program which invokes undefined behavior is an erroneous program.

But this one is the shortest I can get that segfaults on my system:

main(){main();}

(I compile with gcc and -std=c89 -O0).

And by the way, does this program really invoke undefined behavior?

ouah
  • 4
    C99 6.5.2.2p11 requires support for recursion, but nowhere in the standard is there any mention of any limit on the depth of the call stack (fun fact: the word "stack" is never used in C99). The C committee surely did not intend to require all conforming implementations to provide *unlimited* call stack depth, so we're left with section 4 paragraph 2 "undefined behavior is otherwise indicated ... by the omission of any explicit definition of behavior." In other words: it's undefined, but not *explicitly* undefined. – zwol Nov 07 '15 at 19:42
5
 main;

That's it.

Really.

Essentially, what this does is define main as a variable (an implicit int under C89 rules) rather than a function. To the linker, variables and functions are both just symbols naming an address in memory, so nothing checks that main is actually code, and this code does not throw an error.

However, the problem rests in how the system runs executables. In a nutshell, the runtime startup code linked into the executable (_start in the dump below) prepares the environment and then basically boils down to "call main".

In this particular case, however, main is a variable, so it is placed in a non-executable section of memory called .bss, intended for variables (as opposed to .text for the code). Trying to execute code in .bss violates its specific segmentation, so the system throws a segmentation fault.

To illustrate, here's (part of) an objdump of the resulting file:

# (unimportant)

Disassembly of section .text:

0000000000001020 <_start>:
    1020:   f3 0f 1e fa             endbr64 
    1024:   31 ed                   xor    %ebp,%ebp
    1026:   49 89 d1                mov    %rdx,%r9
    1029:   5e                      pop    %rsi
    102a:   48 89 e2                mov    %rsp,%rdx
    102d:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
    1031:   50                      push   %rax
    1032:   54                      push   %rsp
    1033:   4c 8d 05 56 01 00 00    lea    0x156(%rip),%r8        # 1190 <__libc_csu_fini>
    103a:   48 8d 0d df 00 00 00    lea    0xdf(%rip),%rcx        # 1120 <__libc_csu_init>

    # This is where the program should call main
    1041:   48 8d 3d e4 2f 00 00    lea    0x2fe4(%rip),%rdi      # 402c <main> 
    1048:   ff 15 92 2f 00 00       callq  *0x2f92(%rip)          # 3fe0 <__libc_start_main@GLIBC_2.2.5>
    104e:   f4                      hlt    
    104f:   90                      nop

# (nice things we still don't care about)

Disassembly of section .data:

0000000000004018 <__data_start>:
    ...

0000000000004020 <__dso_handle>:
    4020:   20 40 00                and    %al,0x0(%rax)
    4023:   00 00                   add    %al,(%rax)
    4025:   00 00                   add    %al,(%rax)
    ...

Disassembly of section .bss:

0000000000004028 <__bss_start>:
    4028:   00 00                   add    %al,(%rax)
    ...

# main is in .bss (variables) instead of .text (code)

000000000000402c <main>:
    402c:   00 00                   add    %al,(%rax)
    ...

# aaand that's it! 

PS: This won't work if you compile to a flat executable. Instead, you will cause undefined behaviour.

TheSola10
2

On some platforms, a standard-conforming C program can fail with a segmentation fault if it requests too many resources from the system. For instance, allocating a large object with malloc can appear to succeed, but later, when the object is accessed, it will crash.

Note that such a program is not strictly conforming; programs which meet that definition have to stay within each of the minimum implementation limits.

A standard-conforming C program cannot produce a segmentation fault otherwise, because the only other ways are via undefined behavior.

The SIGSEGV signal can be raised explicitly, but there is no SIGSEGV symbol in the standard C library.

(In this answer, "standard-conforming" means: "Uses only the features described in some version of the ISO C standard, avoiding unspecified, implementation-defined or undefined behavior, but not necessarily confined to the minimum implementation limits.")

Kaz
  • 1
    `SIGSEGV` *is* specified as a macro defined in `signal.h` expanding to a positive `int` in C99 (7.14/3) – stefanct Oct 27 '15 at 23:30
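
This is easy to check at compile time. A minimal sketch, with a made-up helper name `sigsegv_is_positive`: the `#error` fires only if `<signal.h>` fails to provide SIGSEGV as a macro, and the function reports whether its value is positive, as the standard requires of all signal numbers.

```c
#include <signal.h>

/* Translation stops here if <signal.h> does not define SIGSEGV as a macro. */
#ifndef SIGSEGV
#error "SIGSEGV is not defined by <signal.h>"
#endif

/* Hypothetical helper: returns 1 if the signal number is positive,
   0 otherwise. */
int sigsegv_is_positive(void)
{
    return SIGSEGV > 0;
}
```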
2

The simplest form considering the smallest number of characters is:

++*(int*)0;
1

Most of the answers to this question are talking around the key point, which is: The C standard does not include the concept of a segmentation fault. (It does define the signal number SIGSEGV in <signal.h>, and has done so since C89, but it does not define any circumstance where that signal is delivered, other than raise(SIGSEGV), which as discussed in other answers doesn't count.)

Therefore, there is no "strictly conforming" program (i.e. program that uses only constructs whose behavior is fully defined by the C standard, alone) that is guaranteed to cause a segmentation fault.

Segmentation faults are defined by a different standard, POSIX. This program is guaranteed to provoke either a segmentation fault, or the functionally equivalent "bus error" (SIGBUS), on any system that is fully conforming with POSIX.1-2008 including the Memory Protection and Advanced Realtime options, provided that the calls to sysconf, posix_memalign and mprotect succeed. My reading of C99 is that this program has implementation-defined (not undefined!) behavior considering only that standard, and therefore it is conforming but not strictly conforming.

#define _XOPEN_SOURCE 700
#include <sys/mman.h>
#include <unistd.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <errno.h>

int main(void)
{
    size_t pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize == (size_t)-1) {
        fprintf(stderr, "sysconf: %s\n", strerror(errno));
        return 1;
    }
    void *page;
    int err = posix_memalign(&page, pagesize, pagesize);
    if (err || !page) {
        fprintf(stderr, "posix_memalign: %s\n", strerror(err));
        return 1;
    }
    if (mprotect(page, pagesize, PROT_NONE)) {
        fprintf(stderr, "mprotect: %s\n", strerror(errno));
        return 1;
    }
    *(long *)page = 0xDEADBEEF;
    return 0;
}
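
A possible variant, offered only as a sketch: as far as I read POSIX, mprotect is unspecified for memory that was not obtained from mmap, so this version maps the page with mmap directly, and it performs the write in a child process so the parent can report which signal arrived. The helper name `protected_write_signal` is made up, and MAP_ANONYMOUS together with the `_DEFAULT_SOURCE` feature macro are assumptions about the platform (widely available, but not part of POSIX.1-2008).

```c
#define _DEFAULT_SOURCE  /* assumption: exposes MAP_ANONYMOUS on glibc and musl */
#include <signal.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: writes to a PROT_NONE page in a child process.
   Returns the signal that terminated the child, 0 if the child exited
   normally, or -1 if any setup call failed. */
int protected_write_signal(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    if (pagesize <= 0)
        return -1;

    /* Map one page with no access rights from the start. */
    void *page = mmap(NULL, (size_t)pagesize, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    int result = -1;
    pid_t pid = fork();
    if (pid == 0) {
        *(volatile long *)page = 0x5A5A5A5AL;  /* expected to fault */
        _exit(0);                              /* only if the write succeeded */
    }
    if (pid > 0) {
        int status;
        if (waitpid(pid, &status, 0) == pid)
            result = WIFSIGNALED(status) ? WTERMSIG(status) : 0;
    }

    munmap(page, (size_t)pagesize);
    return result;
}
```

The return value can then be compared against SIGSEGV and SIGBUS in the caller.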
zwol
1

It's hard to define a method to segmentation fault a program on undefined platforms. A segmentation fault is a loose term that is not defined for all platforms (e.g. simple small computers).

Considering only the operating systems that support processes, processes can receive notification that a segmentation fault occurred.

Further, limiting operating systems to 'unix like' OSes, a reliable method for a process to receive a SIGSEGV signal is kill(getpid(),SIGSEGV).

As is the case in most cross platform problems, each platform may (and usually does) have a different definition of seg-faulting.

But to be practical, current macOS, Linux and Windows systems will segfault on

*(int*)0 = 0;

Further, it's not bad behaviour to cause a segfault. Some implementations of assert() cause a SIGSEGV signal which might produce a core file. Very useful when you need to autopsy.

What's worse than causing a segfault is hiding it:

try
{
     anyfunc();
}
catch (...) 
{
     printf("?\n");
}

which hides the origin of an error and all you've got to go on is:

?

.

effbiae
  • +1 just for the last point itself. Just as a note: you don't necessarily need to call getpid() because if you pass -1 to kill() it's the same thing; well technically: 'If pid is -1, sig shall be sent to all processes (excluding an unspecified set of system processes) for which the process has permission to send that signal.' But for all the uses I've used it -1 works fine (*but my use cases of course doesn't equate to all use cases*). – Pryftan Jan 20 '18 at 20:54
0

Here's another way I haven't seen mentioned here:

int main() {
    void (*f)(void);
    f();
}

In this case f is an uninitialized function pointer, which causes a segmentation fault when you try to call it.

Tobias Bergkvist