I have a program with two threads: one redraws the display (with ncurses), and the other does input/output processing on a serial port, printing some information as it goes.
I have found that at some points the second thread hangs for reasons unknown to me. How can I get to the bottom of the issue, given that:
- I cannot debug what happens in the second thread, because libthread_db and libpthread do not match on my system and gdb refuses to provide thread debugging.
- The thread that hangs does its processing with sequential calls to select and read on a non-blocking file descriptor.
- After dropping into gdb with Ctrl-C and resuming the program, the thread becomes unstuck; moreover, it then processes all the data that was sitting in the serial port's receive buffer.
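For illustration, a select/read step of the kind described in the list above can be sketched as follows. This is a simplified sketch rather than the program's actual code: the helper name read_with_timeout, its return codes, and the timeout value are assumptions. Giving select a timeout and logging each outcome makes it possible to tell a thread blocked in select from one that is waiting somewhere else.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical helper (not from the original program): waits up to
 * timeout_sec seconds for fd to become readable, then reads once.
 * Returns the number of bytes read (> 0), 0 on EOF, -1 on error,
 * and -2 if select timed out. */
static ssize_t read_with_timeout(int fd, char *buf, size_t len, int timeout_sec)
{
    for (;;) {
        fd_set rfds;
        struct timeval tv;

        /* Both the fd_set and the timeout must be rebuilt before every
         * call, because select may modify them. */
        FD_ZERO(&rfds);
        FD_SET(fd, &rfds);
        tv.tv_sec = timeout_sec;
        tv.tv_usec = 0;

        int ready = select(fd + 1, &rfds, NULL, NULL, &tv);
        if (ready < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            perror("select");
            return -1;
        }
        if (ready == 0) {
            fprintf(stderr, "select: no data within %d s\n", timeout_sec);
            return -2;
        }

        ssize_t n = read(fd, buf, len);
        if (n < 0) {
            /* On a non-blocking descriptor EAGAIN means "nothing to read
             * right now", so go back to waiting in select. */
            if (errno == EINTR || errno == EAGAIN || errno == EWOULDBLOCK)
                continue;
            perror("read");
            return -1;
        }
        return n;
    }
}
```

If the timeout message keeps appearing while data is known to be arriving, the problem is upstream of the read; if nothing is printed at all, the thread is not reaching select.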
Are there any tips or tricks that would help me determine the reason for the hang?
Update. Running the program under strace netted me these lines in the trace:
waitpid(-1, 0xbfdcdfd0, 0) = ? ERESTARTSYS (To be restarted)
--- SIGCHLD (Child exited) @ 0 (0) ---
--- SIGCONT (Continued) @ 0 (0) ---
As far as I can tell, these correspond to the times when I saw the program hang, suspended it with C-z, and looked at the trace file (where nothing new was written until the whole program had finished). After every resume the thread was unstuck.
So, that means there is a 'rogue' waitpid call. I know for sure that it does not appear directly anywhere in my code. A pity gdb fails to put a breakpoint on it; that must be an issue of stripped symbols somewhere.