19

A program that accesses an illegal pointer to pointer does not crash with SIGSEGV. This is not a good thing, but I’m wondering how this can be and how the process survived for many days in production. It is bewildering to me.

I have tried this program on Windows, Linux, OpenVMS, and Mac OS, and none of them has ever complained.

#include <stdio.h>
#include <string.h>

void printx(void *rec) { // I know this should have been a **
    char str[1000];
    memcpy(str, rec, 1000);
    printf("%*.s\n", 1000, str);
    printf("Whoa..!! I have not crashed yet :-P");
}

int main(int argc, char **argv) {
    void *x = 0; // you could also say void *x = (void *)10;
    printx(&x);
}
TRiG
pavan.mankala
  • 10
    This is undefined behaviour, so not crashing is a perfectly fine result. Use a proper memory checking tool if you want to debug this sort of thing. – Dave Jul 25 '13 at 07:59
  • It's just that I have passed a pointer to pointer, and when memcpy tries to dereference the pointer in the printx() function and copy 1000 bytes of garbage, it should have crashed. – pavan.mankala Jul 25 '13 at 08:00
  • Memory checkers such as valgrind try to report those kinds of things. Otherwise there is no warranty. – hivert Jul 25 '13 at 08:11
  • See: [Hotel](http://stackoverflow.com/a/6445794/209139). – TRiG Jul 25 '13 at 12:27

5 Answers

29

I am not surprised by the lack of a memory fault. The program is not dereferencing an uninitialized pointer. Instead, it is copying and printing the contents of memory beginning at a pointer variable, and the 996 (or 992) bytes beyond it.

Since the pointer is a stack variable, the program is printing memory that starts near the top of the stack and continues some way down from there. That memory contains the stack frame of main(): possibly some saved register values, a count of program arguments, a pointer to the program arguments, a pointer to a list of environment variables, and a saved return address for main(), which usually points into the C runtime library startup code. In all implementations I have investigated, the stack frames below that hold copies of the environment variables themselves, an array of pointers to them, and an array of pointers to the program arguments. In Unix environments (which you hint you are using) the program argument strings will be below that.

All of this memory is "safe" to print, except some non-printable characters will appear which might mess up a display terminal.

The chief potential problem is whether there is enough stack memory allocated and mapped to prevent a SIGSEGV during access. A segmentation fault could happen if there is too little environment data, or if the implementation puts that data elsewhere so that there are only a few words of stack here. I suggest confirming that by clearing out the environment variables and re-running the program.
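One way to check this on a given system is to print how far the startup data sits from a local variable in main(). The following is only a minimal sketch and not part of the original answer: `byte_distance` and `report_neighbour` are hypothetical helper names, and the pointer-to-integer conversion is implementation-defined, so the numbers are only meaningful on a flat address space.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Absolute distance in bytes between two addresses.
   Assumption: a flat address space, so that converting pointers to
   uintptr_t gives comparable numbers (implementation-defined). */
uintptr_t byte_distance(const void *a, const void *b) {
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    return ua > ub ? ua - ub : ub - ua;
}

/* Print where `p` lies relative to `base` (for example, &x in main). */
void report_neighbour(const char *name, const void *base, const void *p) {
    assert(name != NULL);
    printf("%-10s %p  (%" PRIuPTR " bytes from base)\n",
           name, (void *)p, byte_distance(base, p));
}
```

Called from main() as `report_neighbour("argv", &x, argv)` and `report_neighbour("argv[0]", &x, argv[0])`, this shows whether the argument data lies within reach of a 1000-byte read starting at `&x`. The output differs between systems and between runs.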

This code would not be so harmless if any of the following C runtime conventions did not hold:

  • The architecture uses a stack
  • A local variable (void *x) is allocated on the stack
  • The stack grows toward lower numbered memory
  • Parameters are passed on the stack
  • main() is called with arguments. (Some light-duty environments, like embedded processors, invoke main() without parameters.)

In all mainstream modern implementations, all of these are generally true.
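The stack-direction convention can be probed rather than assumed. This is a rough sketch under stated assumptions, not a portable test: `is_below` and `stack_grows_down_probe` are hypothetical names, the integer comparison of unrelated addresses is implementation-defined, and an optimizing compiler that inlines the call may place the two variables in either order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if `callee_local` has a numerically lower address than
   `caller_local`, else 0. Assumes a flat address space. */
int is_below(const void *caller_local, const void *callee_local) {
    return (uintptr_t)callee_local < (uintptr_t)caller_local;
}

/* Compare the address of a local in this (deeper) call with the address
   of a local in the caller. A result of 1 suggests a stack that grows
   toward lower addresses; it is a hint, not a guarantee. */
int stack_grows_down_probe(const void *caller_local) {
    volatile char callee_local = 0;
    assert(caller_local != NULL);
    return is_below(caller_local, (const void *)&callee_local);
}
```

A caller would declare a local such as `char marker;` and pass `&marker` to `stack_grows_down_probe`. It is best compiled without optimization so the call is not inlined.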

wallyk
  • 1
    @pavan.mankala: You are welcome. Indeed I have dabbled writing and maintaining several compilers, and spent quite a bit of time dealing with the interface to calling `main()`, mostly to streamline a limited memory environment. – wallyk Jul 25 '13 at 08:27
  • +1: dereferencing a null pointer is guaranteed to give a segfault in most operating systems. There's software that depends on this. – Joni Jul 25 '13 at 08:29
  • @Joni: Actually the standard states that the null pointer is invalid for dereferencing. – DevSolar Jul 25 '13 at 09:07
  • @pavan.mankala: Note that this explanation makes lots of assumptions on the environment, which might or might not be true on *your* system. As far as the *language* is concerned, the behaviour is "undefined", full stop, and second-guessing what *actually* happens is purely academic. – DevSolar Jul 25 '13 at 09:10
  • 11
    @DevSolar, what is purely academic is stating that what ever happens is undefined behavior, with no explanation given to actual observed behavior. I would claim that it's good engineering and good computer science to be able to explain what a machine does in any given situation, even when or especially when the language specification leaves the responsibility of some decision to the implementors of the compiler and the operating system. Our programs don't execute in a vacuum isolated from real issues. – Joni Jul 25 '13 at 09:33
  • 2
    @Joni: So you think it's a good explanation saying that pointers are 32 or 64 bit, parameters passed on stack, stacks extend downwards, that there's a pointer to environment variables alongside `argc` / `argv`, return address saved on the stack, yadda yadda, *without even so much as qualifying that statement as "...on Linux and Windows"?* Blissfully assuming that there **is** such a thing as `SIGSEGV` on the target machine, and that an illegal access won't crash the whole OS? For every single one of those statements, I know a system where that assumption won't hold. – DevSolar Jul 25 '13 at 09:44
  • 1
    @DevSolar, Many of those things can be deduced from the question. For example, the OP already mentions SIGSEGV, implying familiarity with a Unix system of some kind, and later mentions two examples: Linux and OS X. As to pointers being 32/64 bit, it doesn't seem a necessary assumption, but tends to be the case for machines where you can run Windows and OS X. As to there actually being a stack and stacks extending downwards, that's how compilers on these platforms *usually* organize memory, again a fairly safe thing to assume. But yes, the answer could be better if it qualified the assumptions. – Joni Jul 25 '13 at 10:09
  • @DevSolar: I have added qualifications to my answer. Is that an improvement? – wallyk Jul 31 '13 at 00:43
17

Illegal memory access is undefined behaviour. This means that your program might crash, but is not guaranteed to, because exact behaviour is undefined.

(A joke among developers, especially when facing coworkers that are careless about such things, is that "invoking undefined behaviour might format your hard drive, it's just not guaranteed to". ;-) )

Update: There's some hot discussion going on here. Yes, system developers should know what actually happens on a given system. But such knowledge is tied to the CPU, the operating system, the compiler etc., and generally of limited usefulness, because even if you make the code work, it would still be of very poor quality. That's why I limited my answer to the most important point, and the actual question asked ("why doesn't this crash"):

The code posted in the question does not have well-defined behaviour, but that just means that you can't really rely on what it does, not that it should crash.
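For contrast, a version with defined behaviour has to be told how large the source object is and must never read past it. The following is a minimal sketch, not the OP's code: `copy_object_bytes` is a hypothetical helper that copies no more bytes than either the source or the destination holds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size bytes, and never more than the source object
   actually holds. Returns the number of bytes copied. */
size_t copy_object_bytes(void *dst, size_t dst_size,
                         const void *src, size_t src_size) {
    assert(dst != NULL && src != NULL);
    size_t n = src_size < dst_size ? src_size : dst_size;
    memcpy(dst, src, n);   /* both ranges stay inside their objects */
    return n;
}
```

With `void *x = 0;` and `char str[1000];`, the call `copy_object_bytes(str, sizeof str, &x, sizeof x)` copies only the bytes of `x` itself (4 or 8 on common platforms) and leaves the rest of `str` untouched.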

DevSolar
  • 1
    No illegal memory access occurs here. Though the program carelessly accesses stack memory in bulk, there is nothing particularly wrong with this as long as there is enough stack data. Your answer is correct as far as it goes, but it does not address the code above. – wallyk Jul 25 '13 at 08:23
  • 1
    @wallyk: I don't know about C11, but the C99 standard does not mention "stack" anywhere. Depending on how your *implementation* handles stack, reading 1000 bytes will either make you head off beyond `x`, `argv` and `argc` into nothingness (illegal), or you're trespassing from `x` into `str`, and using `memcpy()` on overlapping memory areas is *by definition* undefined behaviour. Either way, illegal code. And you, sir, no offense intended, are just the type of coder meant in the "formatting hard drives" line: The type that doesn't find immediate fault with code like this because it *might* work. – DevSolar Jul 25 '13 at 09:04
  • I am pretty sure the C11 and C99 standards address language features, not implementation details. However, stacks are pretty tried and true. Mainframes I used long ago did not have them, but structured languages implemented stacks for their convenience. There is no memory overlap for `memcpy()`: the 1000 byte destination is allocated and its source is elsewhere. (Indeed, I have written disk formatting code, but it was always predictable and intentional.) – wallyk Jul 25 '13 at 09:09
  • 3
    @wallyk: Again, this code might work on machine A and machine B, and together A and B might account for 99% of all systems worldwide, but the code posted by the OP does invoke *undefined* behaviour, which is *defined* as "behaviour not defined by the language standard". Because it's not defined, a cautious developer should never rely on *what actually happens on system X*, because it *will* break some day. I respect your hands-on experience, but I strongly feel that too many aspiring coders are shown too much "under the hood" stuff, with too few "don't touch this" warnings applied. – DevSolar Jul 25 '13 at 09:20
9

If you dereference an invalid pointer, you are invoking undefined behaviour, which means the program can crash, it can work, it could cook some coffee, whatever.

Devolus
4

When you have

int main(int argc, char **argv) {
    void *x = 0; // you could also say void *x = (void *)10;
    printx(&x);
}

You are declaring x as a pointer with value 0, and that pointer lives on the stack since it's a local variable. Now, you are passing the address of x to printx, which means that with

memcpy(str, rec, 1000);

you are copying data from above the top of the stack (in fact, from the stack itself) to the stack (the stack pointer decreases on each push, so str lies at a lower address than x). Because you are copying just 1000 bytes, the source data is likely to be covered by the same page table entry, so you get no segmentation fault. However, ultimately, as already written, we are talking about undefined behavior.
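The page argument can be made concrete with a little arithmetic. This is only a sketch: `pages_spanned` is a hypothetical helper, the 4096-byte page size used in the note below is an assumption that holds on typical x86 systems (POSIX systems report the real value via `sysconf(_SC_PAGESIZE)`), and the function assumes `addr + len` does not wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pages touched by the byte range [addr, addr + len).
   Assumes addr + len does not overflow uintptr_t. */
size_t pages_spanned(uintptr_t addr, size_t len, size_t page_size) {
    assert(page_size != 0);
    if (len == 0) {
        return 0;
    }
    uintptr_t first = addr / page_size;              /* page of first byte */
    uintptr_t last  = (addr + (len - 1)) / page_size; /* page of last byte  */
    return (size_t)(last - first) + 1;
}
```

With 4096-byte pages, `pages_spanned((uintptr_t)&x, 1000, 4096)` can only be 1 or 2, so the read faults only if it reaches a second page that happens to be unmapped.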

João Fernandes
2

It would very probably crash if you wrote to an inaccessible area. But since you are only reading, it can be OK. The behaviour is still undefined, though.

Alexander Mihailov