TL;DR: I need to detect at runtime whether some function reads from the same memory address more than once.
Context
I have a library written in C that sometimes validates input before processing it. Schematically:
if (!validate(input, input_size)) return ERROR;
return process_valid_input(input, input_size);
This is only correct because the content of input doesn't change between the two function calls. Otherwise process_valid_input might receive invalid input, and that would be bad.
Now I am using this library in a program that wants to call the library with buffers that are in shared memory. The memory may be shared with an untrusted program that could change the data between validation and processing. So the program must copy the input from the shared memory to trusted memory. But copying is expensive (in CPU time and, more importantly for us, in RAM), so I want to avoid it when it isn't necessary. Sometimes the library makes its own copies: it copies a few bytes, works on them, then copies the next few bytes and works on those, and so on.
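To make the hazard concrete, here is a minimal sketch that simulates the untrusted writer deterministically instead of with a second process. Everything in it is hypothetical and for illustration only: the bodies of validate and process_valid_input (the real ones belong to the library), the helper names check_then_use and copy_then_use, the attacker_active flag, and the 64-byte private buffer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: input is valid if it contains no zero byte. */
static bool validate(const volatile unsigned char *input, size_t input_size) {
    for (size_t i = 0; i < input_size; i++) {
        if (input[i] == 0) return false;
    }
    return true;
}

/* Hypothetical processor: returns how many zero bytes it saw,
 * i.e. how many times its "input is valid" assumption was violated. */
static size_t process_valid_input(const volatile unsigned char *input,
                                  size_t input_size) {
    size_t zeros = 0;
    for (size_t i = 0; i < input_size; i++) {
        if (input[i] == 0) zeros++;
    }
    return zeros;
}

/* Validate and process directly in the shared buffer. The write to
 * shared[0] stands in for a concurrent write by the untrusted program. */
static int check_then_use(unsigned char *shared, size_t n, bool attacker_active) {
    if (!validate(shared, n)) return -1;
    if (attacker_active) shared[0] = 0; /* simulated race window */
    return (int)process_valid_input(shared, n);
}

/* Copy to private memory first: each shared byte is read exactly once
 * (by memcpy), so later writes to the shared buffer have no effect. */
static int copy_then_use(unsigned char *shared, size_t n, bool attacker_active) {
    unsigned char private_copy[64]; /* assumed upper bound for this sketch */
    if (n > sizeof private_copy) return -2;
    memcpy(private_copy, shared, n);
    if (!validate(private_copy, n)) return -1;
    if (attacker_active) shared[0] = 0; /* same simulated write */
    return (int)process_valid_input(private_copy, n);
}
```

With the simulated write enabled, check_then_use hands a zero byte to the processor even though validation passed, while copy_then_use does not. The copy is exactly the cost I would like to avoid when the library already reads each byte only once.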
Reading directly from shared memory is safe as long as the program only ever reads each individual byte once. In other words, it's an error if the program reads from the same address twice. In some cases the library itself is safe, in others it needs the program around it to make a copy. I want to write tests that enforce that the program is safe, i.e. that it never reads from the same shared memory address twice (within a given function call).
The program and the library are portable, but I do most of my testing on Linux, and the relevant behaviors are platform-independent so I only care about doing this testing on Linux. This needs to work with a stock Linux in user mode (on our CI, I can have root, but only inside a Docker container which is running on top of a kernel that's outside my control).
Question
How can I instrument my C program so that
BEGIN_READ_ONCE(input, input_size); // for me to write
library_function(input, input_size); // I can't change this
END_READ_ONCE(input, input_size); // for me to write
will result in a runtime failure if library_function reads from the same address twice? I can't change the library code itself, only add custom code before and after calling the library. I can compile it with instrumentation or run it inside an unusual environment as long as this is doable in Linux user mode.
Example
This is a good function. It prints a sequence of non-zero characters, then stops.
void good(volatile char *input, size_t n) {
    for (size_t i = 0; i < n; i++) {
        char c = input[i]; // the only read of byte i
        if (c == 0) break;
        putchar(c); // not supposed to print a null byte
    }
}
This is a bad function, because it reads each character twice. In the real world, it could print a null byte (and keep going) if the input is in shared memory and the value changes between the execution of input[i] == 0 and the execution of putchar(input[i]).
void bad(volatile char *input, size_t n) {
    for (size_t i = 0; i < n && input[i] != 0; i++) {
        putchar(input[i]); // not supposed to print a null byte
    }
}
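To pin down the pass/fail criterion I am after, here is a sketch in which every read goes through a counting accessor. This is not a solution, since it requires rewriting the function under test, which I cannot do for the library; it only shows what "one read good, two reads bad" means for the two examples. All names (read_tracker, tracked_read, good_tracked, bad_tracked, MAX_TRACKED) are hypothetical, and the output calls are dropped so that only the access pattern remains.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_TRACKED 1024 /* assumed bound, matching inputs of about 1 kB */

/* Hypothetical tracker: counts[i] is how often offset i has been read. */
struct read_tracker {
    const volatile char *base;
    size_t size;
    unsigned counts[MAX_TRACKED];
};

static void tracker_init(struct read_tracker *t,
                         const volatile char *base, size_t size) {
    assert(size <= MAX_TRACKED);
    t->base = base;
    t->size = size;
    memset(t->counts, 0, sizeof t->counts);
}

/* Every read of the buffer goes through here and is counted. */
static char tracked_read(struct read_tracker *t, size_t i) {
    assert(i < t->size);
    t->counts[i]++;
    return t->base[i];
}

/* Number of offsets that were read more than once; 0 means "read once". */
static size_t tracker_repeated_reads(const struct read_tracker *t) {
    size_t repeated = 0;
    for (size_t i = 0; i < t->size; i++) {
        if (t->counts[i] > 1) repeated++;
    }
    return repeated;
}

/* Access pattern of the good function: one read of offset i per iteration. */
static void good_tracked(struct read_tracker *t) {
    for (size_t i = 0; i < t->size; i++) {
        char c = tracked_read(t, i);
        if (c == 0) break;
        /* putchar(c) would go here */
    }
}

/* Access pattern of the bad function: one read in the loop condition,
 * a second read of the same offset in the loop body. */
static void bad_tracked(struct read_tracker *t) {
    for (size_t i = 0; i < t->size && tracked_read(t, i) != 0; i++) {
        (void)tracked_read(t, i); /* stands in for putchar(input[i]) */
    }
}
```

A test would call tracker_init, run one of the functions, and fail if tracker_repeated_reads returns anything other than 0. What I am looking for is a way to get the same per-address counts without touching the code that performs the reads.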
What I tried
Some things I thought of, but I don't know how to make them work:
- Maybe I could mmap the memory in some weird way? But how can I detect reads? (I'm sure this is doable with a kernel module, but I can't use a kernel module in my test environment.)
- Valgrind can notice reads, but how would I tell it “one read good, two reads bad”? I can't think of a way to get any of the checkers to work for me.
- I could set a read watchpoint in gdb, but (at least on x86_64) this seems to be limited to very small sizes, which makes it impractical in my case (I need to handle inputs of about 1kB).
- Maybe some profiling tools could help, but they would need to trigger on memory accesses and to do exact measurements, without sampling or aggregating, since I need to know for sure whether the same address was accessed twice.
- I could share the memory with a thread that keeps writing to the memory and hope to trigger a functional error inside the library, but that seems very unreliable.