On Solaris 10, I have a multi-threaded application with several concurrently running POSIX threads. All child threads are created with pthread_create as joinable, using pthread_attr_setdetachstate(&attr,PTHREAD_CREATE_JOINABLE);. When the main thread terminates, the system appears to hang, as if control wants to resume in the child threads but those threads may no longer be there. When running under gdb, after the main thread exits it shows:

procfs: fetch_registers, get_gregs line 3497, /proc/16304/lwp/2: No such file or directory.

If I issue gdb's list command, it shows some lines from the child threads, and the info threads command shows that some threads are still there even though the main thread seems to have exited. As reported in https://groups.google.com/forum/#!topic/comp.programming.threads/RKHLvBINYkE, could this be a bug within gdb, or is all well? Or could it have to do with the threads being joinable; should they perhaps be created as detached threads using pthread_attr_setdetachstate(&attr,PTHREAD_CREATE_DETACHED);? When I run pstack -F PID outside gdb to see who is doing what, I see the following:

-----------------  lwp# 1 / thread# 1  --------------------
001dfa70 main     (3c4170d0, 1, ffbff5f0, ffbff55c, 3c4170d0, 600c74) + ac8
0017f698 _start   (0, 0, 0, 0, 0, 0) + 5c
-----------------  lwp# 3 / thread# 3  --------------------
fcb4af80 lwp_park (0, 0, 0)
ff325208 sem_wait (3bcb8590, 3bcd4abc, 1, 0, fc2c0a40, 1) + 20
001de984 _Z19processForSRFReaderPv (0, fc0e0000, 0, 0, 1de968, 3bcd4a9c) + 1c
fcb4aee0 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 4 / thread# 4  --------------------
fcb4af80 lwp_park (0, 0, 0)
ff325208 sem_wait (600a48, 0, fcbc7940, 0, fc2c1240, 19607c) + 20
00181d0c _Z14processForWorkPv (6009cc, fc0b0000, 0, 0, 181cc0, 3bcd4a9c) + 4c
fcb4aee0 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 5 / thread# 5  --------------------
fcb4af80 lwp_park (0, 0, 0)
ff325208 sem_wait (600a78, 0, fcbc7940, 0, fc2c1a40, 0) + 20
00181d0c _Z14processForWorkPv (6009d8, fc080000, 0, 0, 181cc0, 3bcd4a9c) + 4c
fcb4aee0 _lwp_start (0, 0, 0, 0, 0, 0)
-----------------  lwp# 6 / thread# 6  --------------------
fcb4af80 lwp_park (0, 0, 0)
ff325208 sem_wait (600aa8, 0, fcbc7940, 0, fc2c2240, 19607c) + 20
00181d0c _Z14processForWorkPv (6009e4, fc050000, 0, 0, 181cc0, 3bcd4a9c) + 4c
fcb4aee0 _lwp_start (0, 0, 0, 0, 0, 0)
Dr. Debasish Jana
  • "system appears to hang": does Solaris itself freeze? If the system freezes, it's a kernel bug. With an MCVE it would be easier to test... – Stargateur Jan 05 '17 at 09:37
  • The application appears to hang in the sense that ps shows the process is still running – Dr. Debasish Jana Jan 05 '17 at 09:41
  • Do you quit the program with `pthread_exit()`? http://stackoverflow.com/a/11875977/7076153. – Stargateur Jan 05 '17 at 09:59
  • @Stargateur, nope, the main thread is exiting normally, while some of the child threads could still be running – Dr. Debasish Jana Jan 05 '17 at 10:03
  • Does this problem occur when you're *not* running under gdb? If so, post the output of `pstack PID`, where `PID` is the process ID of the still-running process. – Andrew Henle Jan 05 '17 at 10:37
  • @AndrewHenle yes, running pstack PID outside gdb reports that the process is traced – Dr. Debasish Jana Jan 05 '17 at 10:46
  • You can also try `pstack -F PID`. That will force the `pstack` process to work despite the fact that it detects the debugger process controlling the PID. And does your process fail to properly exit when it's not being run under gdb at all? – Andrew Henle Jan 05 '17 at 10:52
  • @AndrewHenle, I edited my question with the pstack output. I am simulating a situation within gdb that I cannot get to occur outside gdb in order to observe it – Dr. Debasish Jana Jan 05 '17 at 11:27
  • @Dr.DebasishJana The `pstack` output indicates your `main()` hasn't returned yet: `001dfa70 main (3c4170d0, ...` – Andrew Henle Jan 05 '17 at 11:31
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/132385/discussion-between-dr-debasish-jana-and-andrew-henle). – Dr. Debasish Jana Jan 05 '17 at 11:37
  • The main thread must wait for the other threads to exit before exiting itself. The other threads should end with a final statement of `pthread_exit( NULL );`, and the main thread should make a call similar to `pthread_join( threadID, NULL );` for each of the other threads. – user3629249 Jan 06 '17 at 03:01
  • When asking a question about a run-time problem, post a short piece of code that compiles cleanly and still shows the problem. – user3629249 Jan 06 '17 at 03:03
  • Where's the code? – babon Jan 09 '17 at 04:49
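
Since no code was posted, the advice in the comments (wake every worker that is blocked in sem_wait, then pthread_join each one before main returns) can only be sketched under assumptions. In the C sketch below, `struct worker`, `worker_main`, `run_and_join_workers`, and the `pending`/`stop` fields are hypothetical names chosen for illustration; they are not taken from the asker's program, and the real workers (`processForWork`, `processForSRFReader`) may be structured quite differently.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>

#define MAX_WORKERS 16

/* Hypothetical per-worker state; not taken from the original program. */
struct worker {
    pthread_t tid;
    sem_t wake;             /* the worker blocks here, as in the pstack output */
    pthread_mutex_t lock;   /* protects pending and stop */
    int pending;            /* jobs queued for this worker */
    int stop;               /* set by the main thread to request shutdown */
};

static void *worker_main(void *arg)
{
    struct worker *w = arg;

    for (;;) {
        int have_job, must_stop;

        /* Sleep until posted; retry if a signal interrupts the wait. */
        while (sem_wait(&w->wake) == -1 && errno == EINTR)
            continue;

        pthread_mutex_lock(&w->lock);
        have_job = w->pending > 0;
        if (have_job)
            w->pending--;
        must_stop = !have_job && w->stop;
        pthread_mutex_unlock(&w->lock);

        if (must_stop)
            break;
        /* Real work for one job would go here. */
    }
    return NULL;            /* same effect as pthread_exit(NULL) */
}

/* Starts n joinable workers, gives each one job, asks each to stop,
 * and joins them all. Returns the number of threads joined. */
int run_and_join_workers(int n)
{
    struct worker w[MAX_WORKERS];
    pthread_attr_t attr;
    int started = 0, joined = 0, i;

    assert(n >= 0 && n <= MAX_WORKERS);

    pthread_attr_init(&attr);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE);

    for (i = 0; i < n; i++) {
        w[i].pending = 0;
        w[i].stop = 0;
        pthread_mutex_init(&w[i].lock, NULL);
        sem_init(&w[i].wake, 0, 0);
        if (pthread_create(&w[i].tid, &attr, worker_main, &w[i]) != 0) {
            sem_destroy(&w[i].wake);
            pthread_mutex_destroy(&w[i].lock);
            break;
        }
        started++;
    }
    pthread_attr_destroy(&attr);

    /* Queue one job per worker and wake it. */
    for (i = 0; i < started; i++) {
        pthread_mutex_lock(&w[i].lock);
        w[i].pending = 1;
        pthread_mutex_unlock(&w[i].lock);
        sem_post(&w[i].wake);
    }

    /* Request shutdown: set the flag, then post so sem_wait returns. */
    for (i = 0; i < started; i++) {
        pthread_mutex_lock(&w[i].lock);
        w[i].stop = 1;
        pthread_mutex_unlock(&w[i].lock);
        sem_post(&w[i].wake);
    }

    /* Join every worker before the caller is allowed to return. */
    for (i = 0; i < started; i++) {
        if (pthread_join(w[i].tid, NULL) == 0)
            joined++;
        sem_destroy(&w[i].wake);
        pthread_mutex_destroy(&w[i].lock);
    }
    return joined;
}
```

Calling `run_and_join_workers(4)` from `main()` and returning afterwards leaves no thread parked in sem_wait at exit. The key design point is that the stop flag is set before the final sem_post, so a worker that wakes up with no pending job can tell a shutdown request apart from ordinary work.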

0 Answers