Is it possible to check if dereferencing arbitrary memory will crash the program a priori in C?

Question

I want to write an interactive interpreted C shell that allows you to address arbitrary memory and perform commands on these memory addresses.

For example (in the running shell):

prompt> 10 bytes starting 0x400000

This instruction would try to access address 0x400000 and show 10 bytes starting there, i.e. the range [0x400000, 0x400009].

and would produce output like:

{0x00, 0x01, 0x02, 0x03, 0x04, <bad>, <bad>, 0x07, 0x08, 0x09}

Where "bad" would indicate an attempt to address "illegal" memory.

I want to know if there is a standard way in C to check if the program is allowed to access the memory I am attempting to access, or if accessing this memory will cause the program to crash (before it actually crashes), and to report to the user that the program is not allowed to access that memory.

I ask this because most questions on this topic tend to be answered by "you can't definitely check if a pointer is valid", but I am sure that there must be some way to check if the pointer is at least "definitely invalid and will crash" or "possibly invalid, but won't crash", and unfortunately I can't find the answer to this question.

Thanks ahead of time.

How are you going to see if you can access the memory unless you've already accessed the memory? And no, you can't check that a pointer is valid; you can check that it's not NULL, but you can't tell that the location it's pointing to is valid memory. It might appear to be a valid address and instead be just a random value. I don't think it's *most questions* that are answered "No, you can't do that", but *all questions*. If you've found one that doesn't say that, then you've already got an answer to your question. — Ken White, Jul 13 '16 at 21:56
I don't care if it is valid or invalid, i care if it will crash the program or not. I don't mind corrupting memory or accessing memory that I haven't assigned yet, all I care about is if it can be accessed or not. — Dmytro, Jul 13 '16 at 21:58
what even. this is a question for your hardware architecture and OS, not C. the language can only compile to machine code that the former entities will either allow or reject at runtime. questions such as whether arbitrary numbered addresses are allowed are precisely none of C's business. besides, the mere act of trying to access a numbered address is not defined behaviour, more like implementation-defined _at best_. for most platforms, pointers are meant to point to objects, not specific addresses, and C regards the latter as just a necessary piece of happenstance. — underscore_d, Jul 13 '16 at 21:58
I addressed that in my first sentence. You can do it like a debugger does, but how to do so would be far too broad a question here. There are books on writing debuggers, and there are already many debuggers available (including many free ones). — Ken White, Jul 13 '16 at 21:59
So the only way to check if memory is legal or not is via system calls (platform dependent), if such calls exist in the first place? I really don't want to believe this is the case, since the program is given the information about which addresses it is mapped to and what is outside its virtual address space. I don't understand why this information wouldn't be available to C. — Dmytro, Jul 13 '16 at 22:00
...and the implementations that _do_ specifically define attempts to access arbitrary locations are typically embedded systems with their own compilers, not general-purpose x86 or similar as it sounds like you're targeting here, and where specific addresses are actually meaningful and - again - defined in the hardware/OS manual as being allowed for access for particular purposes. — underscore_d, Jul 13 '16 at 22:02
@Dmitry Why _would_ that info be available to C? All it is usually responsible for doing is generating instructions to manipulate data objects allocated within a relative stack frame or at arbitrary-but-inconsequential (read: numerical value effectively unobservable) addresses from the freestore. It can only deal with the addresses it's given; it can't deduce possible validity of others. Again, _unless_ you have relevant OS calls to do that for you and/or are using a very specific embedded implementation. — underscore_d, Jul 13 '16 at 22:05
@underscore_d i was curious if it is possible to deduce as many of the addresses it is given to limit operations outside of them. I know that there are segment start addresses that in many cases are certain to be legal to address, but this information cannot be extracted in a standard way. — Dmytro, Jul 13 '16 at 22:09
@Dmitry Any such information would be a trait of the target platform/OS/executable format, not the language. The language is at a layer of abstraction above these things. (buffered by the compiler) — underscore_d, Jul 13 '16 at 22:13
@underscore_d so the way the system reacts to illegal memory addressing is system specific and may or may not be caught, so any such program must be platform-aware to behave correctly/at all? — Dmytro, Jul 13 '16 at 22:14
@Dmitry Absolutely. And that's only _iff_ your program can recover from being caught-in-the-act asking questions it probably shouldn't be asking. It's not that people don't do this - but those who do are usually developing things like OSes or debuggers, rather than general-purpose & crucially portable code, and have built complex implementations around the assumptions of the target system. Perhaps you'd find it educational to take a look at some such projects and see how they deal with questions like this, be it interacting with specific OS/hardware calls or actually implementing those things. — underscore_d, Jul 13 '16 at 22:17
@underscore_d I think you are being perhaps less than helpful here. Dimitry wants to write a "C shell", which we would call a debugger. This makes him one of those people who absolutely, legitimately should care about these things. — jforberg, Jul 13 '16 at 22:21
@jforberg I didn't say he wasn't, nor did I say it wasn't (in appropriate cases) legitimate... though I grant that my focus has shifted more as his intent has become clearer. But through it all, I said it's not the province of average or (easily) portable C. More recently, see the bit where I said it can be done but will probably need more platform-specific study, and suggested he review some similar projects to get some ideas. Is there something "less than helpful" about that? You're welcome to write a better answer or comments & show me how it's done! But it's still not _really_ a C question — underscore_d, Jul 13 '16 at 22:23
...insofar as, yeah, it'll be used to debug C code, but much of the heavy lifting required to do so isn't specific or really known to C, and reconciling the two will almost certainly require some equally heavyweight code. — underscore_d, Jul 13 '16 at 22:29
Perhaps it is possible by spawning a process, having it check the memory, and checking the result (whether it reported back or exploded). Sure, there is process spawning overhead, but the program spends most of its time waiting for input anyway. Then again it won't be the same memory space... awkward... but it's still going to have similar restrictions/virtual mappings (the illegal addresses tend to overlap). And if the process did survive, only then do you try accessing it. — Dmytro, Jul 13 '16 at 22:29
@underscore_d OK, no harm done. I see "C" as a broader topic that definitely could include things such as virtual memory mappings. But you are correct in your point that this is not possible using only standard C, and since he didn't tag his question with an OS that should perhaps be the topic. — jforberg, Jul 13 '16 at 22:33
@jforberg Yeah, I'm sure the details of C implementations and how they interact with their target platforms are just as interesting, and I'm looking forward to doing some embedded stuff with C/C++ in the future. But without some specific info/tags, I don't know if there's much we can say at the moment. — underscore_d, Jul 13 '16 at 22:42

jforberg · Accepted Answer · 2016-07-13T23:00:52.497

I don't think there is any way to do this using just standard C.

However, you can use evil platform-specific tricks to get an idea of how your memory mappings look. On Linux, the file /proc/(pid)/maps lists the memory mappings of process pid, including their read/write/execute permissions. This is how it looks for a simple cat process on my machine:

00400000-0040c000 r-xp 00000000 00:13 1237228                            /usr/bin/cat
0060b000-0060c000 r--p 0000b000 00:13 1237228                            /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 00:13 1237228                            /usr/bin/cat
01864000-01885000 rw-p 00000000 00:00 0                                  [heap]
7fe7a5e0b000-7fe7a6121000 r--p 00000000 00:13 1487092                    /usr/lib/locale/locale-archive
7fe7a6123000-7fe7a62ba000 r-xp 00000000 00:13 1486770                    /usr/lib/libc-2.23.so
7fe7a62ba000-7fe7a64ba000 ---p 00197000 00:13 1486770                    /usr/lib/libc-2.23.so
7fe7a64ba000-7fe7a64be000 r--p 00197000 00:13 1486770                    /usr/lib/libc-2.23.so
7fe7a64be000-7fe7a64c0000 rw-p 0019b000 00:13 1486770                    /usr/lib/libc-2.23.so
7fe7a64c0000-7fe7a64c4000 rw-p 00000000 00:00 0 
7fe7a64cb000-7fe7a64ee000 r-xp 00000000 00:13 1486769                    /usr/lib/ld-2.23.so
7fe7a66cc000-7fe7a66ee000 rw-p 00000000 00:00 0 
7fe7a66ee000-7fe7a66ef000 r--p 00023000 00:13 1486769                    /usr/lib/ld-2.23.so
7fe7a66ef000-7fe7a66f0000 rw-p 00024000 00:13 1486769                    /usr/lib/ld-2.23.so
7fe7a66f0000-7fe7a66f1000 rw-p 00000000 00:00 0 
7fe7a66f5000-7fe7a66f8000 rw-p 00000000 00:00 0 
7ffe398e8000-7ffe39909000 rw-p 00000000 00:00 0                          [stack]
7ffe3999b000-7ffe3999e000 r--p 00000000 00:00 0                          [vvar]
7ffe3999e000-7ffe399a0000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

So from this you can see that the program image itself is mapped near the beginning of virtual memory, the heap is slightly higher up, the stack is mapped to 7ffe398e8000-7ffe39909000 and the C library and dynamic linker are also loaded into memory.

Note that each file is mapped several times. For instance, /usr/bin/cat has a read-only, an executable and a read-write segment. This is to prevent processes from writing to const memory and from executing data.

From the mapping table you could get a fair idea of how your memory is laid out and what operations would be possible on these parts of memory.
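To make the idea concrete, here is a minimal Linux-only sketch that reads /proc/self/maps and checks whether a whole address range is covered by readable mappings. The helper name `range_is_readable` is made up for this illustration. It assumes the entries are listed in ascending address order and that each line fits in the buffer. A readable mapping is also no guarantee: a file mapping that extends past the end of its file can still fault, and the table can change between the check and the access.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper (not a standard or library function).
 * Returns 1 if every byte of [addr, addr + len) lies inside mappings that
 * /proc/self/maps reports with read permission, 0 if some byte does not,
 * and -1 if the maps file cannot be opened. Linux-only. */
static int range_is_readable(uintptr_t addr, size_t len)
{
    if (len == 0)
        return 1;
    if (addr + len < addr)          /* range wraps around the address space */
        return 0;

    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    uintptr_t pos = addr;           /* first byte not yet known to be readable */
    uintptr_t end = addr + len;
    char line[512];

    while (pos < end && fgets(line, sizeof line, f)) {
        uintmax_t lo, hi;
        char perms[5];

        /* Each entry starts with "start-end perms", e.g. "00400000-0040c000 r-xp" */
        if (sscanf(line, "%jx-%jx %4s", &lo, &hi, perms) != 3)
            continue;
        if (hi <= pos)              /* mapping lies entirely below the cursor */
            continue;
        if (lo > pos)               /* gap: the byte at pos is unmapped */
            break;
        if (perms[0] != 'r')        /* mapped, but without read permission */
            break;
        pos = hi < end ? (uintptr_t)hi : end;
    }

    fclose(f);
    return pos >= end;
}
```

In the shell from the question, something like `range_is_readable(0x400000, 10)` could be called before dumping the bytes, printing `<bad>` when it returns 0. Checking byte by byte instead of the whole range would give the per-byte output shown in the question.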

Is this a good idea?

Most likely not, unless you are writing a debugger or similar development tool.

As an aside, the "shell" you are thinking about writing sounds very much like a debugger. Debuggers such as gdb can do the things you talk about, including evaluating C expressions and examining memory.

As a second aside, and because I find this very interesting, here is a small exercise:

As you can see, there is some kernel memory mapped at ffffffffff600000, and the table lists it as readable. If the table is right, we should be able to read that memory even though in general we can't access the kernel's memory. Let's try:

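#include <stdio.h>

int main(void)
{
  /* Address of the [vsyscall] page taken from the maps listing above */
  unsigned long *p = (unsigned long *)0xffffffffff600000UL;

  /* Deliberately runs past the end of the mapping, so it ends in a segfault */
  for (;;)
    printf("0x%lx, ", *p++);
}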

We get

0xf00000060c0c748, 0xccccccccccccc305, 0xcccccccccccccccc, ... Segmentation fault

If you wonder why this memory is readable to a user space process, it is to accelerate certain syscalls such as gettimeofday and allow them to work without having to switch to kernel mode as other syscalls have to. See e.g. this question.

It is essentially a "debugger-like" application. Except instead of the intent of debugging, the intent is to expose C functionality into runtime interactive shell and "play with it" without a particular objective. Eg, in this case, I don't care if I corrupt memory, but I care if my program crashes and doesn't let me do anything about it and gives no information about what part of the code caused the crash(just a boring generic error). Essentially juggling chainsaws. — Dmytro, Jul 13 '16 at 22:18
@Dmitry that sounds interesting. But you can't really write a cross-platform debugger in any real sense, so you will get better answers if you included your chosen OS as a tag in the question. If you just tag it as "C" here on this site, people will assume pure cross-platform and you tend to get answers where people quote the C standard and similar, which is of course proper and correct but not very helpful to you in this case. — jforberg, Jul 13 '16 at 22:24
I think this is the best solution I can get for the time being. Essentially, there is no way to do this in C alone, some platform call or wrapper library is required to deal with it, which is possible but only as long as the program is platform aware. Platforms are not required to expose ways to handle such situations to C, so C has no way of universally checking if an address will crash or not. — Dmytro, Jul 13 '16 at 22:34
@Dmitry I think so. And you can _totally_ make it effectively cross-platform via conditional compilation, but that's different from being portable in the sense of the (any) language itself. A wrapper library is a great idea, certainly beats endless `#ifdef`s! (though it might require at least one!) Then you can have the other layer being portable C code that's effectively agnostic of the underlying machinery that tells it about addresses and whatnot. — underscore_d, Jul 13 '16 at 22:36

R Sahu · Answer 2 · 2016-07-13T22:15:10.437

I want to know if there is a standard way in C to check if the program is allowed to access the memory I am attempting to access, or if accessing this memory will cause the program to crash (before it actually crashes), and to report to the user that the program is not allowed to access that memory.

No, there isn't.

The standard only says that you can't dereference a null pointer. Beyond that, the range of pointer values that are valid is dependent on the platform. What you are hoping to pull off cannot be pulled off in platform independent code.

From footnote 87 in the C99 Standard:

Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime

edited Jul 13 '16 at 22:15

answered Jul 13 '16 at 22:04


Detail: ... whose value compares equal to `NULL` – chux - Reinstate Monica Jul 13 '16 at 22:09
FWIW, and that is my strong conviction: if code must check to see if a pointer is valid, it should be redesigned. It should **never** be necessary to perform such checks. How this can be avoided can not be easily told in a comment, however. – Rudy Velthuis Jul 13 '16 at 22:18
@RudyVelthuis, I agree. The only check that you should see is whether a pointer compares equal to `NULL`. – R Sahu Jul 13 '16 at 22:20
Actually, I generally try to avoid writing code that needs checks for NULL as well, but I know that not everyone agrees with that. – Rudy Velthuis Jul 13 '16 at 22:33

score 2 · Answer 3 · edited May 23 '17 at 11:44

Standard way? No. What you experience as a crash is a result of undefined behavior. The standard doesn't delve into such details.

The Windows API provides an IsBadReadPtr function that seems to be what you are after. The documentation is quite clear that you shouldn't use it.

The thing you overlooked is that you just can't recover from some invalid accesses. If you touch a guard page and catch the error without giving the guard page access handler a chance to run, you have missed your chance: the next time you access the same address, you get an access violation and a core dump, even though in normal execution the access would have been fine. See Raymond Chen's "IsBadXxxPtr should really be called CrashProgramRandomly".

On Unix, you can have the kernel do the dirty work for you by passing the pointer to write(2). If the call fails with errno set to EFAULT, dereferencing that pointer would have crashed your process.

Note that while it looks like you are checking a priori, you're really checking the aftermath. Checking in advance isn't reliable, because the mapping might change between the check and the actual access.
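A minimal sketch of the write(2) probe, assuming Linux or a similar Unix. The helper name `byte_is_readable` is made up for this illustration. It writes into a pipe rather than /dev/null, on the assumption that a pipe write has to copy the data out of the buffer, so the kernel actually touches the address. Creating a fresh pipe per probe is wasteful but keeps the example short.

```c
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper (not a standard or library function).
 * Asks the kernel to read one byte at addr by writing it into a pipe.
 * Returns 1 if the write succeeded, 0 if it failed with EFAULT,
 * and -1 on any other error. */
static int byte_is_readable(const void *addr)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;

    int result;
    ssize_t n = write(fds[1], addr, 1);
    if (n == 1)
        result = 1;                 /* kernel copied the byte: address readable */
    else if (n == -1 && errno == EFAULT)
        result = 0;                 /* kernel could not read from addr */
    else
        result = -1;                /* unrelated failure */

    close(fds[0]);
    close(fds[1]);
    return result;
}
```

The shell from the question could call this once per byte and print `<bad>` whenever it returns 0. The result only describes the moment of the call, so the caveat about mappings changing afterwards still applies.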

If you want to get notified after the failure, write a signal handler for SIGSEGV on Unix. On Windows, deal with the EXCEPTION_ACCESS_VIOLATION SEH exception.

Addendum: What you want to do sounds a bit like what mmbbq did. It injected a Lua interpreter into an external application and allowed calling and dereferencing addresses. If you fudged it up, only the newly started thread was affected and the program itself continued working (for a while at least...). The website isn't online anymore, but maybe you can find a mirror.

So one way would be to just do it and catch the exception from a system that supports them, otherwise report an error and recover(unless the system doesn't like you anyway and kills your process)? — Dmytro, Jul 13 '16 at 22:10
There is no fail-proof way. But for most cases, ye, you can have an access violation handler and postpone the crash. — a3f, Jul 13 '16 at 22:15
@Dmitry If you're interested in Windows specifically, check out the mmbbq project (I edited the answer). — a3f, Jul 13 '16 at 22:30
seems like a cool project, a bit old(some links don't work). Thanks for the link. — Dmytro, Jul 13 '16 at 22:57

score 1 · Answer 4 · answered Jul 13 '16 at 23:58

As already said, we're far off from "standard C" here.

Nonetheless, you can (somewhat) accomplish that by handling segmentation faults. Of course there's a library for that: GNU libsigsegv

Daniel Jour

