Can we read and fault-inject another thread's program counter?

Question

Assume that we have a single-threaded program and want to capture the value of the program counter (PC) when a predefined interrupt occurs (such as a timer interrupt).

This seems easy: we write some assembly code using the special keyword __asm__ and pop the value from the top of the stack after a 4-byte shift.

What about multithreaded programs?

How can we get the PC values of all threads from another thread running in the same process? (It seems hardly possible to get values from a thread running on a separate core of a multi-core processor.) In multithreaded programs, every thread has its own stack and registers, too.

I want to implement a saboteur thread.

To perform fault injection in the target multithreaded program, the fault model is SEU (single event upset), which means that an arbitrary bit in the program counter register is modified randomly (a bit flip), violating the correct program sequence. As a result, a control flow error (CFE) occurs.

Since our target program is multithreaded, we have to perform fault injection on the PCs of all threads. This is the task of the saboteur thread: it should be able to obtain the threads' PCs to perform fault injection. Assume we have this code:

main ()
{
foo
}

void foo()
{
__asm__{
pop "%eax"
pop "%ebx" // now ebx holds program counter value (for main thread)
// here code injection like  00000111 XOR ebx for example
push ...
push ...
};
}

If our program were multithreaded, does that mean we have more than one stack?

When the OS performs a context switch, the stack and registers of the thread that was running are moved to some place in memory. Does this mean that if we want the program counter values of those threads, we can find them in memory? Where? And is that possible at run time?

Interrupts are handled by your OS. Your approach will not work in usermode. — EOF, Jun 24 '16 at 17:09
Why would this be useful, even if you could do it? What problem are you trying to solve? — Cody Gray - on strike, Jun 24 '16 at 17:10
as an aside, a thread's state or execution context is typically stored in a platform-specific data structure — obataku, Jun 24 '16 at 17:11
@oldrinb ...which *might* be accessible from the other threads, thus enabling to get the current PC of that thread. — Eugene Sh., Jun 24 '16 at 17:17
Yes, you can use OS debugging services to do this. That's what debuggers use, after all :) — Jester, Jun 24 '16 at 18:37
@EugeneSh. that is what I'm trying to perform. getting threads' PC values from another thread. say watch-dog thread, or something like this. — husin alhaj ahmade, Jun 24 '16 at 19:46
@Jester. I apologize, I did not explain my comment well enough. I meant that (as far as I know) debugging the target program requires inserting breakpoints in the source code, then inspecting memory and all registers of interest to obtain the program counter values, after which the program continues execution. Inserting breakpoints in the source code would eliminate the randomness I need, which I get from using a "timer interrupt". — husin alhaj ahmade, Jun 24 '16 at 22:56
You don't need to insert breakpoint. You can stop and examine other threads at will by sending a signal. — Jester, Jun 24 '16 at 23:38

score 5 · Accepted Answer · edited May 23 '17 at 11:58

When you install a signal handler using sigaction() with SA_SIGINFO in the flags, the second parameter the signal handler gets is a pointer to a siginfo_t, and the third parameter is a pointer to a ucontext_t. In Linux, this structure contains, among other things, the set of register values at the moment the kernel interrupted the thread, including the program counter.

#define _POSIX_C_SOURCE 200809L
#define _GNU_SOURCE
#include <signal.h>
#include <ucontext.h>

#if defined(__x86_64__)
#define  PROGCOUNTER(ctx) (((ucontext_t *)(ctx))->uc_mcontext.gregs[REG_RIP])
#elif defined(__i386__)
#define  PROGCOUNTER(ctx) (((ucontext_t *)(ctx))->uc_mcontext.gregs[REG_EIP])
#else
#error Unsupported architecture.
#endif

void signal_handler(int signum, siginfo_t *info, void *context)
{
    const size_t program_counter = PROGCOUNTER(context);

    /* Do something ... */

}

As usual, printf() et al. are not async-signal safe, which means it is not safe to use them in a signal handler. If you wish to output the program counter to e.g. standard error, you should not use any of the standard I/O to print to stderr, and instead construct the string to be printed by hand, and use a loop to write() the contents of the string; for example,

#include <stdlib.h>
#include <unistd.h>
#include <errno.h>

static void wrerr(const char *p)
{
    const int   saved_errno = errno;
    const char *q = p;
    ssize_t     n;

    /* Nothing to print? */
    if (!p || !*p)
        return;

    /* Find the end of the string. strlen() is not async-signal safe. */
    while (*q) q++;

    /* Write data from p to q. */
    while (p < q) {
        n = write(STDERR_FILENO, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != -1 || errno != EINTR)
            break;
    }

    errno = saved_errno;
}

Note that you'll want to keep the value of errno unchanged in the signal handler, so that if interrupted after a failed library function, the interrupted thread still sees the correct errno value. (It's mostly a debugging issue, and "good form"; some idiots pooh-pooh this as "it does not happen often enough for me to worry about".)

Your program can examine the /proc/self/maps pseudofile (it is not a real file, but something that the kernel generates on the fly when the file is read) to see the memory regions used by the program, to determine whether the program was running a C library function (very common) or something else when the interrupt was delivered.
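As a rough illustration of that lookup, the sketch below scans /proc/self/maps for the line whose address range contains a given address (for example, a program counter value captured earlier). The function name find_mapping() is made up for this example. It uses standard I/O, so it is meant to be called from normal code after the handler has stored the address, not from inside the signal handler itself.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: copies the /proc/self/maps line whose address range
 * contains addr into line (at most size bytes, including the terminator).
 * Returns 1 if a mapping was found, 0 if not, -1 if the file cannot be opened.
 * NOT async-signal safe: call it outside the signal handler. */
int find_mapping(uintptr_t addr, char *line, size_t size)
{
    FILE *maps = fopen("/proc/self/maps", "r");
    int   found = 0;

    if (!maps)
        return -1;

    /* Each line starts with "start-end" in hexadecimal, followed by the
     * permissions, offset, device, inode, and (optionally) a path. */
    while (!found && fgets(line, (int)size, maps)) {
        uintmax_t start, end;

        if (sscanf(line, "%jx-%jx", &start, &end) == 2 &&
            addr >= start && addr < end)
            found = 1;
    }

    fclose(maps);
    return found;
}
```

With a buffer of a few hundred bytes, the line returned for a captured program counter shows the permissions and the backing file (the executable itself, the C library, and so on). Lines longer than the buffer get split by fgets(), so a generous buffer is assumed here.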

If you wish to interrupt a specific thread in a multi-threaded program, just use pthread_kill(). Otherwise the signal is delivered to one of the threads that has not blocked the signal, more or less at random.
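Applied to the saboteur idea from the question, the pthread_kill() approach might look roughly like the sketch below: one thread signals another, and the handler (which runs in the targeted thread) records that thread's interrupted program counter. The names capture_pc(), worker(), and sample_worker_pc() are invented for this illustration, SIGUSR1 is an arbitrary choice, and the code assumes Linux with glibc on x86 or x86-64.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* must precede the first #include to expose REG_RIP/REG_EIP */
#endif
#include <pthread.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <ucontext.h>

/* Fallback for builds where _GNU_SOURCE did not take effect: these are the
 * positions glibc assigns to the program counter in gregs[]. */
#if defined(__x86_64__) && !defined(REG_RIP)
#define REG_RIP 16
#elif defined(__i386__) && !defined(REG_EIP)
#define REG_EIP 14
#endif

#if defined(__x86_64__)
#define PC_OF(ctx) (((ucontext_t *)(ctx))->uc_mcontext.gregs[REG_RIP])
#elif defined(__i386__)
#define PC_OF(ctx) (((ucontext_t *)(ctx))->uc_mcontext.gregs[REG_EIP])
#else
#error Unsupported architecture.
#endif

static atomic_uintptr_t captured_pc;   /* PC seen by the handler */
static atomic_int       captured;      /* set once the handler has run */
static atomic_int       running;       /* set once the worker is spinning */
static atomic_int       stop_worker;   /* tells the worker to exit */

/* Runs in the thread that pthread_kill() targeted. Only atomic stores are
 * used here; they are lock-free on x86 and therefore safe in a handler. */
static void capture_pc(int signum, siginfo_t *info, void *context)
{
    (void)signum;
    (void)info;
    atomic_store(&captured_pc, (uintptr_t)PC_OF(context));
    atomic_store(&captured, 1);
}

static void *worker(void *arg)
{
    (void)arg;
    atomic_store(&running, 1);
    while (!atomic_load(&stop_worker))
        ;   /* busy loop standing in for the target's real work */
    return NULL;
}

/* Starts a worker, interrupts it once, and returns the program counter the
 * handler saw (0 on failure). */
uintptr_t sample_worker_pc(void)
{
    const struct timespec tick = { .tv_sec = 0, .tv_nsec = 1000000L };
    struct sigaction act;
    pthread_t tid;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = capture_pc;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(SIGUSR1, &act, NULL) == -1)
        return 0;

    atomic_store(&captured, 0);
    atomic_store(&running, 0);
    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return 0;

    while (!atomic_load(&running))
        nanosleep(&tick, NULL);

    /* The saboteur step: deliver SIGUSR1 to that one thread only. */
    if (pthread_kill(tid, SIGUSR1) == 0)
        while (!atomic_load(&captured))
            nanosleep(&tick, NULL);

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return atomic_load(&captured_pc);
}
```

Build with gcc -pthread. A real saboteur would keep a list of the target thread IDs and call pthread_kill() on each of them at random intervals, instead of starting the worker itself as this sketch does.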

Here is an example program, tested on x86-64 (AMD64) and x86, compiled with GCC 4.8.4 using -Wall -O2:

#define  _POSIX_C_SOURCE 200809L
#define  _GNU_SOURCE
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <signal.h>
#include <ucontext.h>
#include <time.h>
#include <stdio.h>

#if defined(__x86_64__)
#define PROGRAM_COUNTER(mctx)   ((mctx).gregs[REG_RIP])
#define STACK_POINTER(mctx)     ((mctx).gregs[REG_RSP])
#elif defined(__i386__)
#define PROGRAM_COUNTER(mctx)   ((mctx).gregs[REG_EIP])
#define STACK_POINTER(mctx)     ((mctx).gregs[REG_ESP])
#else
#error Unsupported hardware architecture.
#endif

#define MAX_SIGNALS  64
#define MCTX(ctx)    (((ucontext_t *)ctx)->uc_mcontext)

static void wrerr(const char *p, const char *q)
{
    while (p < q) {
        ssize_t n = write(STDERR_FILENO, p, (size_t)(q - p));
        if (n > 0)
            p += n;
        else
        if (n != -1 || errno != EINTR)
            break;
    }
}

static const char hexc[16] = "0123456789abcdef";

static inline char *prehex(char *before, size_t value)
{
    do {
        *(--before) = hexc[value & 15];
        value /= (size_t)16;
    } while (value);
    *(--before) = 'x';
    *(--before) = '0';
    return before;
}

static volatile sig_atomic_t done = 0;

static void handle_done(int signum)
{
    done = signum;
}

static int install_done(const int signum)
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_handler = handle_done;
    act.sa_flags = 0;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

static size_t jump_target[MAX_SIGNALS] = { 0 };
static size_t jump_stack[MAX_SIGNALS] = { 0 };

static void handle_jump(int signum, siginfo_t *info, void *context)
{
    const int   saved_errno = errno;
    char        buffer[128];
    char       *p = buffer + sizeof buffer;

    *(--p) = '\n';
    p = prehex(p, STACK_POINTER(MCTX(context)));
    *(--p) = ' ';
    *(--p) = 'k';
    *(--p) = 'c';
    *(--p) = 'a';
    *(--p) = 't';
    *(--p) = 's';
    *(--p) = ' ';
    *(--p) = ',';
    p = prehex(p, PROGRAM_COUNTER(MCTX(context)));
    *(--p) = ' ';
    *(--p) = '@';
    wrerr(p, buffer + sizeof buffer);

    if (signum >= 0 && signum < MAX_SIGNALS) {
        if (jump_target[signum])
            PROGRAM_COUNTER(MCTX(context)) = jump_target[signum];
        if (jump_stack[signum])
            STACK_POINTER(MCTX(context)) = jump_stack[signum];
    }

    errno = saved_errno;
}

static int install_jump(const int signum, void *target, size_t stack)
{
    struct sigaction act;

    if (signum < 0 || signum >= MAX_SIGNALS)
        return errno = EINVAL;

    jump_target[signum] = (size_t)target;
    jump_stack[signum] = (size_t)stack;

    memset(&act, 0, sizeof act);
    sigemptyset(&act.sa_mask);
    act.sa_sigaction = handle_jump;
    act.sa_flags = SA_SIGINFO;
    if (sigaction(signum, &act, NULL) == -1)
        return errno;

    return 0;
}

int main(int argc, char *argv[])
{
    const struct timespec sec = { .tv_sec = 1, .tv_nsec = 0L };
    const int pid = (int)getpid();
    ucontext_t ctx;

    printf("Run\n");
    printf("\tkill -KILL %d\n", pid);
    printf("\tkill -TERM %d\n", pid);
    printf("\tkill -HUP  %d\n", pid);
    printf("\tkill -INT  %d\n", pid);
    printf("or press Ctrl+C to stop this process, or\n");
    printf("\tkill -USR1 %d\n", pid);
    printf("\tkill -USR2 %d\n", pid);
    printf("to send the respective signal to this process.\n");
    fflush(stdout);

    if (install_done(SIGTERM) ||
        install_done(SIGHUP)  ||
        install_done(SIGINT) ) {
        printf("Cannot install signal handlers: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    getcontext(&ctx);

    if (install_jump(SIGUSR1, &&usr1_target, STACK_POINTER(MCTX(&ctx))) ||
        install_jump(SIGUSR2, &&usr2_target, STACK_POINTER(MCTX(&ctx))) ) {
        printf("Cannot install signal handlers: %s.\n", strerror(errno));
        return EXIT_FAILURE;
    }

    /* These are expressions that should evaluate to false, but the compiler
     * should not be able to optimize them away. */
    if (argv[0][1] == 'A') {
usr1_target:
        fputs("USR1\n", stdout);
        fflush(stdout);
    }

    if (argv[0][1] == 'B') {
usr2_target:
        fputs("USR2\n", stdout);
        fflush(stdout);
    }

    while (!done) {
        putchar('.');
        fflush(stdout);
        nanosleep(&sec, NULL);
    }

    fputs("\nAll done.\n", stdout);
    fflush(stdout);

    return EXIT_SUCCESS;
}

If you save the above as example.c, you can compile it using

gcc -Wall -O2 example.c -o example

and run it

./example

Press Ctrl+C to exit the program. Copy the commands (for sending SIGUSR1 and SIGUSR2 signals), and run them from another window, and you'll see they change the point at which execution continues. (The signals cause the program counter/instruction pointer to jump back, into an if clause that should never be executed otherwise.)

There are two sets of signal handlers. handle_done() just sets the done flag. handle_jump() outputs a message to standard error (using low-level I/O), and if specified, updates the program counter (instruction pointer) and stack pointer.

The stack pointer is the tricky part when creating an example program like this. It would be easy if we were satisfied with just crashing the program. However, an example is only useful if it works.

When we arbitrarily change the program counter/instruction pointer, and the interrupt was delivered when in a function call (most C library functions...), the return address is left on the stack. The kernel can deliver the interrupt at any point, so we cannot even assume that the interrupt was delivered when in a function call, either! So, to make sure the test program does not crash, I had to update the program counter/instruction pointer and stack pointer as a pair.

When a jump signal is received, the stack pointer is reset to a value I obtained using getcontext(). This is not guaranteed to be suitable for any jump location; it's just the best I could do for a minimal example. I definitely assume the jump labels are nearby, and not in subscopes where the compiler is likely to mess with the stack, mind you.

It is also important to keep in mind that because we are dealing with details left to the C compiler, we must conform to whatever binary code the compiler produces, not the other way around. For reliable manipulation of a process and its threads, ptrace() is a much better (and honestly, easier) interface. You just set up a parent process, and in the target traced child process, explicitly allow the tracing. I've shown examples here and here (both answers to the same question) on how to start, stop, and single-step individual threads in a target process. The hardest part is understanding the overall scheme, the concepts; the code itself is easier -- and much, much more robust than this signal-handler-context-manipulation way.

For self-introducing register errors (either to program counter/instruction pointer, or to any other register), with the assumption that most of the time that leads to the process crashing, this signal handler context manipulation should be sufficient.
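For the single-bit fault model in the question, the register change itself is just an XOR with a one-bit mask. The helper below is a minimal sketch of that step; flip_bit() is a name invented here, not part of any library.

```c
#include <limits.h>
#include <stdint.h>

/* Hypothetical helper: returns value with one bit inverted. The bit index is
 * reduced modulo the width of uintptr_t, so any unsigned index is accepted. */
uintptr_t flip_bit(uintptr_t value, unsigned bit)
{
    const unsigned width = (unsigned)(sizeof value * CHAR_BIT);

    return value ^ ((uintptr_t)1 << (bit % width));
}
```

In a handler like handle_jump() above, the saved program counter could then be rewritten as PROGRAM_COUNTER(MCTX(context)) = flip_bit(PROGRAM_COUNTER(MCTX(context)), bit), with a cast to the register type as needed. Since rand() is not async-signal safe, the bit index is best chosen outside the handler (for example by the thread that sends the signal) and stored in a volatile sig_atomic_t variable that the handler reads.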

Can you modify the `uc_mcontext`, so the signal-handler returns to a different place? If not, this doesn't really solve the OP's problem of fault-injection into another thread. Neat trick, though; I didn't know the thread context was exposed like that to a signal handler. — Peter Cordes, Jun 25 '16 at 08:35
@PeterCordes: Yes, you can. The register state will reflect `uc_mcontext` after the signal handler returns. In particular, `uc_mcontext.gregs[REG_RIP]` on x86-64 will change the location the code continues execution at. Note that GCC provides the useful operator [`&&`](https://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html) as an extension; I verified this works by assigning the address of a label (`&&labelname`) to a static variable, then modified the `RIP`/`EIP` register to reflect that, and indeed, the code does continue execution at the label instead. — Nominal Animal, Jun 25 '16 at 18:08
Cool! That means the OP can just add a signal handler for SIGUSR1 or something that munges the low bits of `uc_mcontext.gregs[REG_RIP]`, and have threads fault-inject themselves in response to signals. That's probably easier than `ptrace` (esp. since you've already provided code :) — Peter Cordes, Jun 25 '16 at 19:32
@PeterCordes: Exactly. I personally define a macro, `#define PROGRAM_COUNTER(ctx) (((ucontext_t *)ctx)->uc_mcontext.gregs[REG_RIP])` on x86-64 and `#define PROGRAM_COUNTER(ctx) (((ucontext_t *)ctx)->uc_mcontext.gregs[REG_EIP])` on x86 (don't have other arches to test, and am too lazy to go look at kernel sources to find out) to access the program counter/instruction pointer, using the `void *context` parameter the signal handler has access to. I shall add some proof-of-concept code to my answer above. (The test code I used was too ugly to live in public; I'd die of shame. :) — Nominal Animal, Jun 25 '16 at 22:36
@NominalAnimal, PeterCordes This is fantastic, thanks a lot! I'm trying to perform and implement suggested solutions. thanks a lot for your cooperation with me. — husin alhaj ahmade, Jun 26 '16 at 11:37

score 2 · Answer 2 · edited May 23 '17 at 12:16

No, it's not possible while a thread is executing. While a thread is executing, the current value of its program counter (EIP) is private to the CPU core it's running on. It's not available in memory anywhere.

It would be possible for an architecture to have special instructions to send inter-processor requests with queries about execution state, but x86 doesn't have this.

However, you can use ptrace system calls to do anything a debugger could; interrupt another thread and modify any of its state (general purpose registers, flags, program counter, etc. etc.) I can't give you an example, I just know that's the system call that debuggers use to modify the saved state of another thread / process. For example, this question asks about modifying another process's RIP using ptrace (for testing code-injection).

I'm not sure it's viable to ptrace one thread from another thread in the same process; your fault injector might work better as a separate process that interferes with the threads of another process.
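To make the separate-process idea a bit more concrete, here is a minimal sketch, assuming Linux on x86 or x86-64, in which a parent process reads the program counter of a stopped child via ptrace(). The function name read_stopped_child_pc() and the REGS_PC() macro are invented for this illustration, and the child is a trivial stand-in for a real target program.

```c
#include <signal.h>
#include <stdint.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

#if defined(__x86_64__)
#define REGS_PC(regs) ((regs).rip)
#elif defined(__i386__)
#define REGS_PC(regs) ((regs).eip)
#else
#error Unsupported architecture.
#endif

/* Forks a child that lets itself be traced and then stops; the parent reads
 * the child's saved program counter. Returns 0 on failure. */
uintptr_t read_stopped_child_pc(void)
{
    struct user_regs_struct regs;
    uintptr_t pc = 0;
    int       status;
    pid_t     child = fork();

    if (child == -1)
        return 0;

    if (child == 0) {
        /* Target side: allow the parent to trace us, then stop. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(1);
        raise(SIGSTOP);
        _exit(0);
    }

    /* Tracer side: wait for the stop, then fetch the saved registers. */
    if (waitpid(child, &status, 0) == child && WIFSTOPPED(status) &&
        ptrace(PTRACE_GETREGS, child, NULL, &regs) == 0) {
        pc = (uintptr_t)REGS_PC(regs);
        /* A fault injector would modify REGS_PC(regs) here and write the
         * registers back with PTRACE_SETREGS before resuming the child. */
    }

    /* This sketch only reads the value, so the child is simply discarded. */
    kill(child, SIGKILL);
    waitpid(child, NULL, 0);
    return pc;
}
```

For an already running multithreaded target, the tracer would instead attach to each thread ID with PTRACE_ATTACH or PTRACE_SEIZE; that part is not shown here.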

Anyway, what will happen when you make a ptrace system call to modify something in another thread is that the CPU running your system call will send an inter-processor message to the kernel on the CPU running the other thread, which will interrupt the thread you want to mess with. Its state will be saved into memory by the kernel, where it can be modified by any CPU.

Once the other thread stops running, it isn't strongly associated with any CPU anymore. It will be cheaper to resume it on the CPU that already has hot caches for it, but that isn't guaranteed because that CPU could have started running any other thread once it was no longer busy running the thread you caused to be stopped.

Side note, not relevant to inter-thread fault injection:

Your C function for modifying EIP (foo()) is really ugly, BTW:

First of all, it's MSVC inline asm, so no Linux compiler will accept it (maybe icc?). Second, it only works with -fno-omit-frame-pointer, because it assumes that it's inside a function that has pushed %ebp.

It would be so much easier to just write the whole function in asm. In 64bit non-inline asm, you'd just write:

global  fault_inject_program_counter
fault_inject_program_counter:
    xor   qword [rsp],  0b00000111
    ret

and assemble that file separately with NASM or YASM, and link the .o with code that calls it. (I'm assuming you'd prefer Intel syntax, since you used MSVC-style asm {} instead of GNU C asm("pop ; ... ; "::: ); inline asm.)

An inline asm version might look like:

// this can't possibly work if inlined, or if compiled without `-fno-omit-frame-pointer`
__attribute__((noinline)) void foo()
{
    __asm__ volatile(
    // "pop %eax\n\t"
    // "pop %ebx\n\t"    // now ebx holds the return address
    // here code injection like  00000111 XOR ebx for example

    // normal people would just write
       "xorl  $0b00000111,  4(%ebp)\n\t"
    // to modify the return address in-place, in a function with a frame pointer.

    // push ...
    // push ...
    );
}
