I'm on a notebook with an Apple M1, which has 4 cores. I have a problem that can be solved in parallel by dynamically spawning millions of threads. Since pthread_create has a huge overhead, I avoid it by instead keeping 3 threads running in the background. These threads wait for tasks to arrive:
    void *worker(void *arg) {
      u64 tid = (u64)arg;
      // Loops until the main thread signals work is complete
      while (!atomic_load(&done)) {
        // If there is a new task for this worker...
        if (atomic_load(&workers[tid].stat) == WORKER_NEW_TASK) {
          // Execute it
          execute_task(&workers[tid]);
        }
      }
      return 0;
    }
These threads are spawned with pthread_create once:
    pthread_create(&workers[tid].thread, NULL, &worker, (void*)tid);
Any time I need a new task to be done, instead of calling pthread_create again, I just select an idle worker and send the task to it:
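    // Write the task first, then publish the status atomically, so the
    // worker never observes WORKER_NEW_TASK with a stale task
    workers[tid].task = ...;
    atomic_store(&workers[tid].stat, WORKER_NEW_TASK);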
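For reference, here is a minimal self-contained sketch of the handoff described above, reduced to a single worker. It is not my real code: the `Worker` layout, the `WORKER_IDLE` state, the `result` field, the "double the input" task, and the `run_one_task` helper are all assumptions made up for illustration. It only shows the spin-wait pattern, with the task written before the status is published.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t u64;

enum { WORKER_IDLE, WORKER_NEW_TASK };  // WORKER_IDLE is a hypothetical state

typedef struct {
  pthread_t thread;
  _Atomic int stat;  // WORKER_IDLE or WORKER_NEW_TASK
  u64 task;          // hypothetical payload: a number to double
  u64 result;        // hypothetical output slot
} Worker;

static Worker workers[1];
static atomic_bool done;

// Hypothetical task body: doubles the input, then marks the worker idle.
static void execute_task(Worker *w) {
  w->result = w->task * 2;
  // Release store: the result is visible before the status flips to idle
  atomic_store_explicit(&w->stat, WORKER_IDLE, memory_order_release);
}

static void *worker(void *arg) {
  u64 tid = (u64)(uintptr_t)arg;
  // Busy-waits until the main thread signals that work is complete
  while (!atomic_load(&done)) {
    if (atomic_load_explicit(&workers[tid].stat, memory_order_acquire) == WORKER_NEW_TASK) {
      execute_task(&workers[tid]);
    }
  }
  return NULL;
}

// Spawns one worker, sends it a single task, and returns the result.
u64 run_one_task(u64 input) {
  atomic_store(&done, false);
  atomic_store(&workers[0].stat, WORKER_IDLE);
  int rc = pthread_create(&workers[0].thread, NULL, &worker, (void *)(uintptr_t)0);
  assert(rc == 0);

  // Write the task first, then publish the status with a release store
  workers[0].task = input;
  atomic_store_explicit(&workers[0].stat, WORKER_NEW_TASK, memory_order_release);

  // The main thread also spins here until the worker reports idle again
  while (atomic_load_explicit(&workers[0].stat, memory_order_acquire) != WORKER_IDLE) {
  }
  u64 out = workers[0].result;

  atomic_store(&done, true);
  pthread_join(workers[0].thread, NULL);
  return out;
}
```

Calling `run_one_task(21)` from `main` sends one task through the worker and returns the doubled value. Note that in this sketch the idle worker never sleeps or yields; it keeps polling `stat` for as long as `done` is false, which is the same behavior as my real loop.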
The problem is: for some reason, leaving these 3 threads in the background makes my main thread 25% slower. Since my CPU has 4 cores, I expected these 3 threads not to affect the main thread at all.
Why are the background threads slowing down the main thread? Am I doing anything wrong? Is the while (!atomic_load(&done)) loop consuming a lot of CPU power?