context switch between user space thread and std::thread in a multi threaded environment

Question

I am trying to implement the following model in C++ between user-space thread and std::thread.

[Figure: diagram of the threading model; image not available]

The kernel thread in the diagram will correspond to std::thread. I have implemented a stack switch mechanism to implement the user-space thread. The std::threads wait on a thread-safe queue of some context struct -- where the context struct represents the context of a user-space thread. It contains the stack pointer and other important register values.

I implemented it but I feel some of the implementation details can be improved with respect to Context Switching from a user-space thread to std::thread.

Here is the very minimal code to represent the basic utility functions.

struct threadContext {
    // registers set
    // pointer to user space thread context if this is not a user space thread
};


// shared data structure
std::unordered_map<std::thread::id,int> Map;
threadContext kThread[4];



// this run function is part of a thread pool class

void run(unsigned i) {
    // wait on a thread safe queue
    // pop a user thread context pointer p
    yield_to(p); // this will switch the stack & instruction pointer to the
                 // user thread's saved location
    // push p to queue if p is not done
    // loop

}

void userFunc() {
    std::cout <<"Hello from user Func" << std::endl;

    yield(); // this will switch the stack pointer back to the original location,
             // which is inside the run function
   
}

Inside the yield() function I need to save the current user thread's context and load the parent std::thread's context. But there is more than one std::thread, so I have to use a mutex and an unordered_map (keyed by std::thread::id, as returned by get_id()) to find the currently running std::thread in the kThread[4] array.

So basically, when a user-space thread wants to give control back to its parent std::thread, it needs to know which thread context to load in order to jump safely back into the parent std::thread. For this, I had to use a lock-based data structure.
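For reference, the lock-based lookup described above might look roughly like the sketch below. `Map` and `kThread` mirror the names in the question; `mapMutex`, `registerWorker` and `parentContext` are hypothetical names introduced only for illustration, and the empty `threadContext` stands in for the real register set.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <unordered_map>

struct threadContext { /* saved registers would live here */ };

// Shared state, as in the question.
std::unordered_map<std::thread::id, int> Map;   // thread id -> slot in kThread
threadContext kThread[4];
std::mutex mapMutex;                            // hypothetical: guards Map

// Hypothetical: called once by each std::thread before it runs user threads.
void registerWorker(int slot) {
    assert(slot >= 0 && slot < 4);
    std::lock_guard<std::mutex> lock(mapMutex);
    Map[std::this_thread::get_id()] = slot;
}

// Hypothetical: called from yield() to find the context of the std::thread
// the caller is currently running on. Every yield pays for the lock.
threadContext* parentContext() {
    std::lock_guard<std::mutex> lock(mapMutex);
    return &kThread[Map.at(std::this_thread::get_id())];
}
```

The lock is needed only because all workers share one map, even though each worker ever reads just its own entry.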

Is there any way to avoid the mutex when handing control back to the std::thread, or is there a completely different method of giving control back to the parent std::thread?

Sorry, I intentionally left out some implementation details. Please say if they are needed. (For the context-switch implementation, you might want to check my previous question, asked a few weeks back about a different issue in the same user-space thread implementation: "Pass arguments to a user space thread".)

Thanks!

I don't get why you need to lock it. I'm not familiar with assembly, but isn't it pretty like save-state-then-return? — apple apple, Feb 10 '21 at 17:16
Yes, @appleapple.. we save the register context of the current thread and then load the next thread context & invoke a `ret` instruction to point the instruction pointer to the next thread instruction area. In my case, when a user thread wants to yield to `std::thread`, it also needs to know which `std::thread`'s context to load. But we have more than 1 `std::thread` and so I used a shared data structure. — Debashish, Feb 10 '21 at 17:25
why return to parent context need to know the thread id? it's the parent and when you return it's the only thread(/context) there, isn't it? — apple apple, Feb 10 '21 at 18:16
How to load parent thread context when I am at user-space thread context? — Debashish, Feb 10 '21 at 18:37
If we have only one main thread and multiple user-space threads, then we would have a curr_Context pointer to point to the current running context and then do some scheduling to load the next context. But here multiple `curr_context` pointers exist because we have multiple actual `std::thread` for running multiple `user_space` thread instead of a single `main thread`. — Debashish, Feb 10 '21 at 18:44
@Debashish, what if you make your `curr_Context` a `thread_local` variable? — Solomon Slow, Feb 10 '21 at 23:55
@SolomonSlow, yeah, I did not know about the usage of `thread_local`. I will try for sure. Thanks. — Debashish, Feb 11 '21 at 04:39
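A minimal sketch of the `thread_local` suggestion from the comments, assuming an empty `threadContext` as a stand-in for the real register set. The names `parentCtx`, `currentUser`, `beginRun` and `contextToRestore` are hypothetical. Each std::thread gets its own copy of a `thread_local` variable, so `yield()` can find its parent context without a map or a mutex.

```cpp
#include <cassert>
#include <thread>

struct threadContext { /* saved registers would live here */ };

// One instance per std::thread; no other thread touches it, so no lock.
thread_local threadContext parentCtx;
// Context of the user-space thread currently running on this std::thread.
thread_local threadContext* currentUser = nullptr;

// Worker side: record which user thread is about to run, just before
// switching to it.
void beginRun(threadContext* user) {
    assert(user != nullptr);
    currentUser = user;
}

// yield() side: the context to switch back to is simply this
// std::thread's own parentCtx.
threadContext* contextToRestore() {
    return &parentCtx;
}
```

One caveat if a user-space thread can yield on one std::thread and later resume on another: the compiler may keep the address of a `thread_local` variable in a register across the switch. It is probably safer to do the lookup inside a small non-inlined function each time rather than holding on to the pointer.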

score 1 · Answer 1 · answered Feb 11 '21 at 02:37

setjmp and longjmp are your friends here, but maybe also your enemies. They provide the fundamental mechanism, yet to use them effectively for your case you need to be able to dig out the stack pointer, handle compiler weirdness like stack bounds checking, and so on. The short answer is that, to do what you want, you might as well write your own setjmp and longjmp in assembly, so you know exactly which PC and SP values you need to manipulate:

#include "cpu.h"

.globl ProcStart
.globl ProcSave
.globl ProcRestore
.globl CpuCSwap

#ifdef __amd64__
ProcSave:
        mov (%rsp), %rax
        mov %rax, _IP*8(%rdi)
        lea 8(%rsp), %rax               /* SP as the caller sees it after ret */
        mov %rax, _SP*8(%rdi)
        mov %rbx, _BX*8(%rdi)
        mov %rbp, _BP*8(%rdi)
        mov %r12, _R12*8(%rdi)
        mov %r13, _R13*8(%rdi)
        mov %r14, _R14*8(%rdi)
        mov %r15, _R15*8(%rdi)
        mov $0, %rax
        ret
ProcRestore:
        mov _SP*8(%rdi), %rsp
        pushq _IP*8(%rdi)
        mov _BX*8(%rdi), %rbx
        mov _BP*8(%rdi), %rbp
        mov _R12*8(%rdi), %r12
        mov _R13*8(%rdi), %r13
        mov _R14*8(%rdi), %r14
        mov _R15*8(%rdi), %r15
        mov %rsi, %rax
        ret

ProcStart:
        mov 8(%rsp), %rdi
        call *(%rsp)
        call ProcStop
        hlt

CpuCSwap:
        mov %rsi, %rax
        lock cmpxchg %rdx,(%rdi)        /* lock prefix: atomic across cores */
        cmovz %rdx, %rax
        ret
#endif
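ProcSave and ProcRestore follow the same "return twice" contract as setjmp and longjmp: ProcSave returns 0 when called directly, and appears to return again with the second argument of ProcRestore when the saved context is resumed. The sketch below shows that control flow using only the standard library pair, not the assembly routines; `saveAndResumeOnce` is a hypothetical name.

```cpp
#include <cassert>
#include <csetjmp>

// Counts how many times control passes the save point.
int saveAndResumeOnce() {
    std::jmp_buf ctx;
    // volatile: the value is changed between setjmp and longjmp and read
    // afterwards, so it must not be cached in a register.
    volatile int passes = 0;

    if (setjmp(ctx) == 0) {
        // Direct call: setjmp returned 0, like ProcSave.
        passes = passes + 1;
        std::longjmp(ctx, 7);   // like ProcRestore(ctx, 7)
    } else {
        // Resumed: setjmp "returned" the non-zero value passed to longjmp.
        passes = passes + 1;
    }
    assert(passes == 2);
    return passes;
}
```

Here the jump stays inside one live stack frame. Switching between separate stacks is the part that setjmp and longjmp do not give you portably, which is why the answer falls back to assembly.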

Inside cpu.h, you would define the register save layout appropriate for your needs, assigning the slot indices _IP, _SP, and so on:

#if __amd64__

#define _IP 0
#define _SP 1
#define _BX 2
#define _BP 3
#define _R12 4
#define _R13 5
#define _R14 6
#define _R15 7

#define CPU_PC   _IP
#define CPU_SP   _SP
#define CPU_FP   _BP
#define CPU_NREG 8

#elif __i386__

#define _IP 0
#define _SP 1
#define _BX 2
#define _BP 3
#define _SI 4
#define _DI 5

#define CPU_PC   _IP
#define CPU_SP   _SP
#define CPU_FP   _BP
#define CPU_NREG  6

#else


#define CPU_PC 0
#define CPU_SP 1
#define CPU_NREG 2

#endif

So you can leave your higher level ignorant of everything except perhaps the stack direction. Nobody has tried an increasing-address stack in over 40 years, so you are on safe ground. By the way, these routines use the standard System V calling convention (arguments in %rdi, %rsi, %rdx); Microsoft are forging their own little path.
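As an illustration of how little the higher level needs to know, here is a hedged sketch of setting up a fresh context for a routine like ProcStart, which expects the function pointer at (%rsp) and its argument at 8(%rsp). The `Context` struct and `prepareContext` are hypothetical. The index values are copied from the amd64 branch of cpu.h, and the 16-byte alignment is an assumption based on the System V ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Indices copied from the amd64 branch of cpu.h.
constexpr int CPU_PC = 0, CPU_SP = 1, CPU_NREG = 8;

// Hypothetical register save area, one slot per index.
struct Context { std::uintptr_t reg[CPU_NREG]; };

// Hypothetical helper: fill in a context that, once restored, starts at
// `entry` (ProcStart in the listing) with fn and arg on top of the stack.
void prepareContext(Context& c, void* stack, std::size_t size,
                    std::uintptr_t entry, void (*fn)(void*), void* arg) {
    assert(size >= 64);
    // The only architecture knowledge needed here: stacks grow downward,
    // so start from the highest address, rounded down to 16 bytes.
    std::uintptr_t top =
        (reinterpret_cast<std::uintptr_t>(stack) + size) & ~std::uintptr_t{15};
    auto* sp = reinterpret_cast<std::uintptr_t*>(top) - 2;  // room for fn, arg
    sp[0] = reinterpret_cast<std::uintptr_t>(fn);   // read via call *(%rsp)
    sp[1] = reinterpret_cast<std::uintptr_t>(arg);  // read via mov 8(%rsp), %rdi
    c = Context{};
    c.reg[CPU_SP] = reinterpret_cast<std::uintptr_t>(sp);
    c.reg[CPU_PC] = entry;
}
```

Passing the entry address as a plain integer keeps the sketch free of any link-time dependency on the assembly file. In real use it would be the address of ProcStart.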

ps: don't do it in C++. You will spend so much time fighting your development environment you might decide to take a job making pizzas. Make a simple framework in a systems language like C, then work out how to invoke it from an application framework in C++. Confusing C++ for a systems language is a great source of consternation for many, and a great source of recurring standards revenue for a few. — mevets, Feb 11 '21 at 02:43
