9

I'm aware that you can read past the end of an array - I'm wondering now if you can seg-fault just by performing that reading operation though.

int someints[100];
std::cerr << someints[100] << std::endl; //This is 1 past the end of the array.

Can the second line actually cause a seg-fault, or will it just print gibberish? Also, if I changed that memory, could that cause a seg-fault on that specific line, or would a fault only happen later, when something else tried to use the accidentally changed memory?

John Humphreys
  • 4
    It is more likely to print gibberish than to core dump, but the behaviour is undefined and it could simply reformat the entire disk (and there'd be no cause for complaint - you invoked undefined behaviour). Don't try it. And absolutely do not rely on mild abuse; you don't know which variable did get written on instead. – Jonathan Leffler Aug 31 '11 at 17:44

3 Answers

10

This is undefined behaviour and entirely depends on the virtual memory layout the operating system has arranged for the process. Generally you can either:

  • access some gibberish that belongs to your virtual address space but has a meaningless value, or
  • attempt to access a restricted memory address in which case the memory mapping hardware invokes a page fault and the OS decides whether to spank your process or allocate more memory.

If someints is an array on the stack and is the last variable declared, you will most likely get some gibberish off the top of the stack or (very unlikely) invoke a page fault that could either let the OS resize the stack or kill your process with a SIGSEGV.

Imagine you declare a single int right after your array:

int someints[100];
int on_top_of_stack = 42;
std::cerr << someints[100] << std::endl;

Then most likely the program should print 42, unless the compiler somehow rearranges the order of declarations on the stack.

Blagovest Buyukliev
  • 3
    I just got 42 on my system, but there's no particular reason to expect that `on_top_of_stack` will be allocated immediately after `someints`. For example, if small offsets are cheaper, there might be a good reason to put smaller objects at the beginning of the stack frame. – Keith Thompson Aug 31 '11 at 17:54
  • 2
    If you turn the optimizer on, all bets are off. The order of declarations is unrelated to the order in memory, except of fields within a structure. – Dietrich Epp Aug 31 '11 at 18:14
  • @Keith and Dietrich: this is why I say "most likely" and not "always". – Blagovest Buyukliev Aug 31 '11 at 18:16
  • @Blagovest: You may well be right. Confirming one way or the other would require testing on a variety of compilers. Without having done that testing, I honestly had no particular expectation; it's very likely that `someints` and `on_top_of_stack` would be adjacent, but they could just as well be in the opposite order (and I can imagine valid reasons for doing it that way). In any case, I'm sure we can both agree that it's neither wise nor necessary to make any assumptions. – Keith Thompson Aug 31 '11 at 18:33
  • 1
    @Keith: you can't depend on variable ordering at all (and it's possible that `on_top_of_stack` might not even be allocated anything but a register, especially in an optimized build). Also, see this answer - http://stackoverflow.com/questions/4575697/unexpected-output-from-bubblesort-program-with-msvc-vs-tcc/4577565#4577565 - for an example of when MSVC changed the order of variable allocation simply because the *name* of a variable changed - even in a non-optimized build. I was a bit surprised. – Michael Burr Sep 01 '11 at 00:33
4

Yes, it can segfault if the memory at that address is not accessible to the program. In your case that is unlikely: the array is allocated on the stack and is only 100 ints long (typically 400 bytes), while the stack is significantly larger (e.g. 8 MB per thread on Linux 2.4.X), so the read will most likely just return uninitialized data. But in some cases it may crash. Either way, this code is erroneous, and memory-debugging tools like Valgrind may be able to help you troubleshoot it.

  • Closer. You too used the example of a unix-like machine, where the stack is at the top (the highest address of the process) of virtual memory and grows down. This one-past-the-end access will bump into something else on the stack, and there is always going to be something to bump into on a unix machine. At a minimum, the argument count is at the very top of the stack. – David Hammen Aug 31 '11 at 21:15
2

The second line can cause literally anything to happen and still be correct as far as the language specification is concerned. It could print gibberish, it could crash due to a segmentation fault or something else, it could cause power to go out on the entire eastern seaboard, or it could cause the canonical demons to fly out of your nose...

That's the magic of undefined behaviour.

Carl Norum
  • 1
    +1: This is one of the few correct answers. Accessing `someints[100]` is undefined behavior. (Aside: Forming a pointer to `someints[100]` is not undefined behavior; it is explicitly allowed behavior. Think `std::vector::end()`.) – David Hammen Aug 31 '11 at 21:20