I have a fairly straightforward C program that runs much faster on one thread than on multiple threads. (I'm running on a four-core i5 processor.)
Using the highly scientific "GDB halt debugging" technique, I've found that only one thread appears to be executing at any given time.
Basically, when I hit ^C in GDB and type info threads, I get something like this:
Id Target Id Frame
29 Thread 0x7ffff5cec700 (LWP 14787) "corr" __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
28 Thread 0x7ffff64ed700 (LWP 14786) "corr" __lll_unlock_wake_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:341
27 Thread 0x7ffff6cee700 (LWP 14785) "corr" 0x00007ffff752ca2c in __random () at random.c:296
26 Thread 0x7ffff74ef700 (LWP 14784) "corr" __lll_lock_wait_private () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:95
* 1 Thread 0x7ffff7fd5740 (LWP 14755) "corr" 0x00007ffff78bf66b in pthread_join (threadid=140737342535424, thread_return=0x7fffffffdd80) at pthread_join.c:92
(Thread 1 is the main thread; threads 26–29 are worker threads.)
A quick Google search seems to imply that these functions have something to do with deadlock detection, but I can't get much beyond that. What are these functions, and why are they slowing my program down?
Possibly relevant: if I join each thread immediately after creating it, before creating the next one (i.e., not really multithreading at all, but still incurring the thread-creation overhead), the effect does not occur and my program runs faster.
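To make the two setups concrete, here is a minimal sketch of what I mean. It is not my actual code: `worker`, `run_workers`, `NUM_WORKERS`, and `CALLS_PER_WORKER` are names and values made up for illustration, and it assumes each worker spends most of its time calling `random()`, which is what the frame for thread 27 in the trace above suggests.

```c
#include <pthread.h>
#include <stdlib.h>

#define NUM_WORKERS 4             /* hypothetical: one worker per core */
#define CALLS_PER_WORKER 100000L  /* hypothetical workload size */

/* Hypothetical worker: calls random() in a loop and counts the calls
 * in the long that arg points to (one private slot per thread). */
static void *worker(void *arg)
{
    long *calls_done = arg;
    unsigned long sink = 0;

    for (long i = 0; i < CALLS_PER_WORKER; i++) {
        sink += (unsigned long)random();
        (*calls_done)++;
    }
    (void)sink; /* the values themselves are not used in this sketch */
    return NULL;
}

/* Runs NUM_WORKERS workers.
 * join_immediately == 0: create all threads, then join them all
 *                        (workers can run concurrently).
 * join_immediately != 0: join each thread right after creating it
 *                        (at most one worker exists at a time).
 * Returns the total number of random() calls made, or -1 if a
 * thread could not be created. */
long run_workers(int join_immediately)
{
    pthread_t tid[NUM_WORKERS];
    long done[NUM_WORKERS] = {0};
    int created = 0;
    int failed = 0;

    for (int i = 0; i < NUM_WORKERS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &done[i]) != 0) {
            failed = 1;
            break;
        }
        created++;
        if (join_immediately)
            pthread_join(tid[i], NULL);
    }

    /* In the concurrent setup, wait for every thread that was started. */
    if (!join_immediately) {
        for (int i = 0; i < created; i++)
            pthread_join(tid[i], NULL);
    }

    if (failed)
        return -1;

    long total = 0;
    for (int i = 0; i < NUM_WORKERS; i++)
        total += done[i];
    return total;
}
```

Both `run_workers(0)` and `run_workers(1)` do the same amount of work; in my real program it is the second, serialized variant that finishes sooner. Timing the two calls (for example with `clock_gettime`) would be one way to check whether this sketch reproduces the effect.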
In case it's useful, here's a code dump (159 lines).