I want to understand what multi-threaded code looks like after compilation and how the CPU executes it (assume the machine has a single, single-core CPU). Consider the following toy example:
// Computes a large Fibonacci number and prints it.
static void* Fibonacci(void* arguments) {
...
}
// Reads user input from terminal and prints it
static void* UserInput(void* arguments) {
...
}
int main() {
    pthread_t prime_thread;
    pthread_t input_thread;
    pthread_create(&prime_thread, NULL, Fibonacci, NULL);
    pthread_create(&input_thread, NULL, UserInput, NULL);
    pthread_join(prime_thread, NULL);
    pthread_join(input_thread, NULL);
    return 0;
}
I created two threads: one does the CPU-intensive computation of a Fibonacci number, the other waits for user input. When I compile the code with gcc main.c -pthread
everything gets compiled into a single executable binary file. Thus, I assume that after starting this program, the CPU will execute the instructions in that binary one by one, with possible jumps to subroutines.
I've checked the assembly code for this program and in a nutshell, it looks like this:
_ZL9FibonacciPv:
# Fibonacci function implementation
...
_ZL9UserInputPv:
# UserInput function implementation
...
main:
# If I understand correctly, here we prepare the arguments
# (mainly the pointer to the Fibonacci function) for pthread_create
sub rsp, 40 #,
lea rdx, _ZL9FibonacciPv[rip] #,
xor ecx, ecx #
lea rdi, 8[rsp] # tmp90,
xor esi, esi #
mov rax, QWORD PTR fs:40 # tmp95,
mov QWORD PTR 24[rsp], rax # D.4751, tmp95
xor eax, eax # tmp95
# Here we create a pthread_t to execute the Fibonacci function
call pthread_create@PLT #
# Here we prepare the arguments for another pthread_t
lea rdx, _ZL9UserInputPv[rip] #,
lea rdi, 16[rsp] # tmp91,
xor ecx, ecx #
xor esi, esi #
# And create a second pthread_t to execute the UserInput function
call pthread_create@PLT #
# Here we do something to join the threads
mov rdi, QWORD PTR 8[rsp] #, prime_thread
xor esi, esi #
call pthread_join@PLT #
mov rdi, QWORD PTR 16[rsp] #, input_thread
xor esi, esi #
call pthread_join@PLT #
# Main function terminates
xor eax, eax #
add rsp, 40 #,
ret
What confuses me is the following:
These assembly instructions will be executed one by one. I assume the call pthread_create@PLT
and call pthread_join@PLT
will eventually return to the call site. Thus, in the end, the program counter will reach these final three instructions:
xor eax, eax
add rsp, 40
ret
which mark the exit from the main function. I see no parallelism in this sequence of instructions, so how do the two threads get to run concurrently here? Does it mean that after the final ret
instruction, the program counter is set to some memory address that is not visible in this assembly code, and the program does not actually terminate but starts executing those threads?