Can I tell if a pointer is in the rodata section of an executable? As in, editing that pointer's data would cause a runtime system trap.

Example (using a C character pointer):

void foo(char const * const string) {
    if ( in_rodata( string ) ) {
        puts("It's in rodata!");
    } else {
        puts("That ain't in rodata");
    }
}

Now I was thinking that, maybe, I could simply compare the pointer to the rodata section.

Something along the lines of:

if ( string >= start_of_rodata && string < end_of_rodata ) {
    // it's in rodata!
}

Is this a feasible plan/idea? Does anyone have an idea as to how I could do this?

(Is there any system information that one might need in order to answer this?) I am executing the program on a Linux platform.

  • Obviously it's going to be platform-specific so ISO C is no help, but yes you could arrange for there to be a symbol at the start/end of `.rodata`, e.g. by modifying the default linker script, which would allow your idea to work exactly like that (given C declarations like `extern const char start_of_rodata[];`) – Peter Cordes Dec 08 '20 at 03:54
  • Which platform(s) do you care about doing this on? Given that you said `.rodata`, I guess not Windows? So GNU/Linux? – Peter Cordes Dec 08 '20 at 03:57
  • Of course, I doubt that it could possibly be portable; do all systems even have rodata? I'm not against using Assembly, or modifying the binary, as long as it's not too much. –  Dec 08 '20 at 03:58
  • Some systems have ways to inquire about the memory protections of virtual memory regions, e.g. parsing `/proc/self/maps` on Linux. You can also install a handler for `SIGSEGV` which does a `longjmp`, try writing there, and see if it fires (maybe problematic if other threads may be reading or writing it). Of course, the real question is why you want to do this, and whether there's a better way to address whatever problem you were hoping it would solve. – Nate Eldredge Dec 08 '20 at 03:59
  • @PeterCordes Yes, Linux, updated the post to include that information for others! –  Dec 08 '20 at 04:00
  • No, they don't all have a section literally called `.rodata`. On Windows I think it's called `.rdata` but functions basically the same. But yes almost all modern systems (C implementations) do have read-only memory that can and does hold some static constant data. Some historical exceptions include DOS .com programs: no sections at all, and even .exe programs ran on systems with no memory protection. – Peter Cordes Dec 08 '20 at 04:04
  • @NateEldredge: Trial-write can be made thread-safe (at a high performance cost) by using an atomic-RMW to do something like `or` with `0`. But you'd need inline asm because clang will optimize `atomic_fetch_add(&val, 0)` to mfence and a load, no actual write. [Is the transformation of fetch\_add(0, memory\_order\_relaxed/release) to mfence + mov legal?](https://stackoverflow.com/q/64976395) – Peter Cordes Dec 08 '20 at 04:08
  • @NateEldredge I've thought about it, but at this point, it's more for recreation and curiosity. –  Dec 08 '20 at 04:28
  • @PeterCordes: Interesting, and `volatile` doesn't circumvent it either. One especially funny example: `atomic_fetch_or(&x, 0)` does compile into a `lock or`... of a completely different address! https://godbolt.org/z/Mhec51 And `atomic_fetch_add(&x, 0)` compiles into the same `lock or`. – Nate Eldredge Dec 08 '20 at 04:42
  • You can see some code that is available in my [SOQ](https://github.com/jleffler/soq) (Stack Overflow Questions) repository on GitHub as files `memprobe.c` and `memprobe.h` in the [src/so-1886-3184](https://github.com/jleffler/soq/tree/master/src/so-1886-3184) sub-directory. It is code based on the answers to the duplicate question. – Jonathan Leffler Dec 08 '20 at 04:57
  • @NateEldredge: Yeah, a dummy `lock or` is a more efficient full barrier than `mfence`, so that's the same LLVM optimization in effect, just not using `mfence` for the barrier. I should have said "barrier" in my earlier comment because clang doesn't use mfence in the first place because it's slower. – Peter Cordes Dec 08 '20 at 11:35
  • @JonathanLeffler: Not an exact duplicate; there are platform-specific ways to do this better than probing. (e.g. a range check from symbol addresses, or as the existing answer suggests, by making a system call like `read()` on /dev/zero and return `-EFAULT` if not valid and writeable.) – Peter Cordes Dec 08 '20 at 12:53
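
The range check proposed in the question and refined in the comments could be sketched roughly as follows. This is only an illustration: `in_range` is a hypothetical helper name, and `start_of_rodata` / `end_of_rodata` are assumed to be symbols that a modified linker script would have to provide (the default GNU linker script is not assumed to define them). The comparison goes through `uintptr_t` because relational comparison of pointers into unrelated objects is undefined in ISO C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true if p lies in the half-open range [start, end).
 * Comparing as integers avoids relational comparison of pointers that
 * do not point into the same object. */
static bool in_range(const void *p, const void *start, const void *end)
{
    uintptr_t a = (uintptr_t)p;
    assert((uintptr_t)start <= (uintptr_t)end);   /* caller must pass a sane range */
    return a >= (uintptr_t)start && a < (uintptr_t)end;
}

/* With a linker script that defines the two (assumed) boundary symbols,
 * the check from the question would become:
 *
 *   extern const char start_of_rodata[], end_of_rodata[];
 *
 *   bool in_rodata(const void *p) {
 *       return in_range(p, start_of_rodata, end_of_rodata);
 *   }
 */
```

Note the inclusive lower bound: an object placed at the very start of the section is still inside it.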

1 Answer


> I doubt that it could possibly be portable

If you don't want to mess with linker scripts or platform-specific memory-map query APIs, a proxy approach is fairly portable across platforms with memory protection, provided you're content to learn only whether the location is writable, read-only, or neither. The general idea is to do a test read and a test write. If the first succeeds but the second fails, the address is likely in .rodata or a code segment. This doesn't tell you "it's rodata for sure" - it may be a code segment, or some other read-only page, such as a read-only file memory mapping that has copy-on-write disabled. Whether that matters depends on what you had in mind for this test - what its ultimate purpose is.

Another caveat: for this to be even remotely safe, you must suspend all other threads in the process while you do the test, since the test write may corrupt state that code running on another thread happens to be using. Doing this from inside a running process may have hard-to-debug corner cases that will stop lurking and show themselves during a customer demo. So, on platforms that support it, it's preferable to spawn another process that suspends the first process in its entirety (all threads), probes it, writes the result into the first process's address space (to some result variable), resumes the process, and terminates itself. On some platforms it's not possible to modify a process's address space from outside. There you instead need to suspend the process mostly or completely from the safety of an external process, inject a probe thread, suspend the remaining threads, let the probe do its job, write the answer to some agreed-upon variable and terminate, and then resume everything else.

For simplicity's sake, the code below assumes that it's all done from inside the process. Even though "fully capable" self-contained examples that work cross-process would not be very long, writing this stuff is a bit tedious, especially if you want it short, elegant and at least mostly correct - I imagine a really full day's worth of work. So, instead, I'll do some rough sketches and let you fill in the blanks (ha).

Windows

Structured exceptions get thrown e.g. due to protection faults or divide by zero. To perform the test, attempt a read from the address in question. If that succeeds, you know it's at least a mapped page (otherwise it'll throw an exception you can catch). Then try writing there - if that fails, then it was read-only. The code is almost boring:

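#ifdef _WIN32
#include <windows.h>
#include <assert.h>
#include <stdbool.h>

static const int foo;
static int bar;

// Thread bookkeeping is left as an exercise; see the notes below the code.
typedef struct ThreadState ThreadState;
ThreadState *suspend_other_threads(void) { ... }
void resume_other_threads(ThreadState *state) { ... }

// Returns nonzero if *p is readable but not writable (.rdata, code, ...).
int check_if_maybe_rodata(void *p) {
  __try {
    (void) *(volatile char *)p;            // test read
  } __except (EXCEPTION_EXECUTE_HANDLER) { // __finally would run unconditionally
    return false;                          // not even readable
  }
  volatile LONG result = 0;
  ThreadState *state = suspend_other_threads();
  __try {
    InterlockedExchange(&result, 1);       // assume the write will fault
    LONG saved = *(volatile LONG*)p;
    InterlockedExchange((volatile LONG *)p, saved);  // write back the same value
    InterlockedExchange(&result, 0); // we succeeded writing there
  } __except (EXCEPTION_EXECUTE_HANDLER) {}  // swallow the access violation
  resume_other_threads(state);
  return result;
}

int main(void) {
  assert(check_if_maybe_rodata((void *)&foo));
  assert(!check_if_maybe_rodata(&bar));
  return 0;
}
#endif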

Suspending the threads requires traversing the thread list, and suspending each thread that's not the current thread. The list of all suspended threads has to be created and saved, so that later the same list can be traversed to resume all the threads.

There are surely caveats, and WoW64 threads have their own API for suspension and resumption, but it's probably something that would, in controlled circumstances, work OK.

Unix

The idea is to leverage the kernel to check the pointer for us "at arm's length" so that no signal is raised. Recovering from a POSIX signal caused by a memory protection fault is awkward: the handler must either skip the faulting instruction (by editing the saved context or by patching the code, which means changing the protection of the code's memory) or jump out with siglongjmp. Not so great. Instead, pass the pointer to a syscall that should succeed in all normal circumstances and that reads from the pointed-to address - e.g. open /dev/zero, and write to that file from a buffer pointed to by the pointer. If that fails with EFAULT, it is due to "buf [being] outside your accessible address space". If you can't even read from that address, it's not .rodata for sure. One catch: the kernel must actually read the buffer. On Linux, writes to /dev/zero are discarded without touching the buffer, so there a pipe is a more reliable target.

Then do the converse: from an open /dev/zero, attempt a read into the address you are testing. If the read succeeds, the memory wasn't read-only (and note that the read has just overwritten the tested bytes with zeros, so save them first and write them back afterwards). If the read fails with EFAULT, that most likely means the area is read-only, since reading from it succeeded but writing to it didn't.
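
A minimal sketch of this two-step probe, assuming Linux. It deliberately swaps /dev/zero for a pipe, which is my substitution and not part of the description above: writing the byte into a pipe forces the kernel to read it from the tested address, and reading it back out of the pipe stores the same value, so a successful probe leaves the memory unchanged. The names `probe_address` and `addr_kind` are hypothetical, and the thread-safety caveats discussed earlier still apply to the write-back.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

typedef enum { ADDR_ERROR, ADDR_UNREADABLE, ADDR_READ_ONLY, ADDR_WRITABLE } addr_kind;

/* Hypothetical helper: classify the byte at p by letting the kernel touch
 * it on our behalf, so a bad address yields EFAULT instead of SIGSEGV. */
static addr_kind probe_address(const void *p)
{
    int fds[2];
    addr_kind kind = ADDR_ERROR;

    if (pipe(fds) != 0)
        return ADDR_ERROR;

    /* Step 1: the kernel reads one byte from p and queues it in the pipe. */
    ssize_t n = write(fds[1], p, 1);
    if (n == 1) {
        /* Step 2: the kernel writes that same byte back to p. */
        n = read(fds[0], (void *)p, 1);
        if (n == 1)
            kind = ADDR_WRITABLE;
        else if (n < 0 && errno == EFAULT)
            kind = ADDR_READ_ONLY;      /* readable, but the store faulted */
    } else if (n < 0 && errno == EFAULT) {
        kind = ADDR_UNREADABLE;         /* not even mapped for reading */
    }

    close(fds[0]);
    close(fds[1]);
    return kind;
}

/* Example use: assert(probe_address("literal") == ADDR_READ_ONLY); */
```

As in the prose above, `ADDR_READ_ONLY` only means "readable but not writable"; it cannot distinguish .rodata from code or any other read-only mapping.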

In all cases, it would be preferable to use native platform APIs to query the mapping status of the page containing the address, or even better, to walk the section list of the mapped executable (ELF on Linux, PE on Windows) and see exactly what went where. It's not guaranteed that every system with memory protection maps the .rodata section or its equivalent read-only, so the executable's image as mapped into the running process is the ultimate authority. Even that does not guarantee that the section is currently mapped read-only: an mprotect or similar call could have made it, or parts of it, writable, modified them, and then perhaps changed them back to read-only. To detect that, you'd have to either checksum the section, if the executable's format provides such data, or mmap the same binary somewhere else in memory and compare the sections.
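
On Linux, one way to query the mapping status without probing is to parse `/proc/self/maps`, as suggested in the comments on the question. The following is a rough sketch under that assumption; `mapping_perms` and `is_read_only_mapping` are hypothetical helper names, and the fixed line buffer assumes that no single maps line exceeds 511 characters.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy the permission string ("r--p", "rw-p", ...) of
 * the mapping containing p into perms. Returns 0 on success, -1 if p is
 * not mapped or /proc/self/maps cannot be read. */
static int mapping_perms(const void *p, char perms[5])
{
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f)
        return -1;

    uintptr_t addr = (uintptr_t)p;
    char line[512];
    int found = -1;

    while (fgets(line, sizeof line, f)) {
        uintmax_t lo, hi;
        char tmp[5] = "";
        /* Each line starts with "start-end perms ...", addresses in hex. */
        if (sscanf(line, "%jx-%jx %4s", &lo, &hi, tmp) != 3)
            continue;
        if (addr >= lo && addr < hi) {
            memcpy(perms, tmp, sizeof tmp);
            found = 0;
            break;
        }
    }
    fclose(f);
    return found;
}

/* Readable but not writable. This also matches code and other read-only
 * mappings, so it is a necessary condition for .rodata, not a proof. */
static int is_read_only_mapping(const void *p)
{
    char perms[5];
    return mapping_perms(p, perms) == 0 && perms[0] == 'r' && perms[1] == '-';
}
```

This reports the current protection of the page, so it reflects any earlier `mprotect` calls rather than what the executable's headers originally requested.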

But I smell a faint smell of an XY problem: what is it that you're actually trying to do? I mean, surely you don't just want to check if an address is in .rodata out of curiosity's sake. You must have some use for that information, and it is this application that would ultimately decide whether even doing this .rodata check should be on the radar. It may be, it may be not. Based on your question alone, it's a solid "who knows?"

Kuba hasn't forgotten Monica
  • Why InterlockedExchange for the local private `result` variable? Surely `volatile` would be enough to prevent compile-time reordering. Also, why not [`InterlockedOr8(p, 0)`](https://learn.microsoft.com/en-us/windows/win32/api/winnt/nf-winnt-interlockedor8) instead of a load and then separate exchange? You wouldn't have to suspend other threads if you're doing an *atomic* RMW that doesn't actually modify the memory, but still doesn't optimize away at compile time. Also, shouldn't the initializer for `result` be different from the final assigned value that means "can write"? – Peter Cordes Dec 08 '20 at 11:44
  • Nice idea to use a system call returning `EFAULT`, though. A system call like `gettimeofday` is lighter weight than read, but unfortunately it's *too* light-weight: it's handled entirely in user-space so actually faults instead of detecting unwriteable memory. `stat(".", bad_ptr)` might be a good choice, not requiring a valid FD. Or `getrusage` avoids any pathname handling on the kernel side. – Peter Cordes Dec 08 '20 at 11:59
  • On Linux specifically, a signal handler can modify the normal thread's execution context (saved registers), so you *could* maybe use a [label address](https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) and modify RIP (x86-64 program counter) to skip past a faulting `lock or byte [mem], 0` if it raises a segfault. But attempting a syscall that writes or that reads is probably still better than setting up a signal handler. – Peter Cordes Dec 08 '20 at 12:02
  • Oh, I just realized that stat and getrusage would write a largish buffer whose size you can't choose, making them break things if the memory is writeable. Oops. – Peter Cordes Dec 08 '20 at 12:52
  • I believe there are some systems where attempting `read()` with an unwritable buffer will raise SIGSEGV instead of returning `EFAULT`. I had a vague recollection of this being the case on some version of Linux, but I'm probably wrong and am thinking of something else. – Nate Eldredge Dec 08 '20 at 17:12
  • @PeterCordes Using Or8 would of course work, but the use of atomic operations here doesn't have much to do with multi-thread operation of the function itself. It has to do with all the other threads that may get upset by all this. There is a short window of opportunity where a read-writeback is done to memory of unknown provenance, that may include other threads' state, and any code can race with that. Doing a compare-store would work better while remaining thread-safe, but the spinloop would need to have a timeout. I was really tired when I wrote the answer - not a good idea, ha :) – Kuba hasn't forgotten Monica Dec 09 '20 at 13:15
  • @PeterCordes Very good ideas with syscall suggestions. Thank you! – Kuba hasn't forgotten Monica Dec 09 '20 at 13:15
  • The entire point of using *just* `Or8(&var, 0)` rather than separate volatile load / atomic `store` (via xchg or whatever) is to avoid the non-atomic read / writeback. A `lock or [mem], 0` has literally no observable effect from the PoV of other threads, other than performance, no matter what the other thread was in the middle of doing with that byte of `mem`. That's true in general for other ISAs with their own way of implementing atomic RMW. – Peter Cordes Dec 09 '20 at 13:21
  • Some compilers will optimize `atomic_fetch_or(&var, 0)` into a [barrier (+ load of old value if used)](https://stackoverflow.com/q/64976395), but I guessed that InterlockedOr8 might not be handled that way, at least not on MSVC. To work portably, you might need to hide it behind a non-inline function, and pass the `0` to be ORed as a runtime variable arg. – Peter Cordes Dec 09 '20 at 13:26