4

Please note: I'm tagging this with JClouds because if you read the entire question and comments that ensue, I believe this to be either a bug with JClouds or a misuse of that library.

I have an executable JAR that runs, works for a while, finishes the work without throwing any errors/exceptions, and then hangs forever when it should be exiting. I profiled it with VisualVM (paying attention to the running threads), and I also tossed in a log statement to print at the point (at the end of the main() method) where the app hangs. Here is the last part of my main method:

Set<Thread> threadSet = Thread.getAllStackTraces().keySet();
for(Thread t : threadSet) {
    String daemon = (t.isDaemon()? "Yes" : "No");
    System.out.println("The ${t.getName()} thread is currently running; is it a daemon? ${daemon}.");
}

When my JAR executes this code, I see the following output:

The com.google.inject.internal.util.Finalizer thread is currently running; is it a daemon? Yes.
The Signal Dispatcher thread is currently running; is it a daemon? Yes.
The RMI Scheduler(0) thread is currently running; is it a daemon? Yes.
The Attach Listener thread is currently running; is it a daemon? Yes.
The user thread 3 thread is currently running; is it a daemon? No.
The Finalizer thread is currently running; is it a daemon? Yes.
The RMI TCP Accept-0 thread is currently running; is it a daemon? Yes.
The main thread is currently running; is it a daemon? No.
The RMI TCP Connection(1)-10.10.99.8 thread is currently running; is it a daemon? Yes.
The Reference Handler thread is currently running; is it a daemon? Yes.
The JMX server connection timeout 24 thread is currently running; is it a daemon? Yes.

I don't think I have to worry about daemons (correct me if I'm wrong), so filtering that to non-daemons:

The user thread 3 thread is currently running; is it a daemon? No.
The main thread is currently running; is it a daemon? No.

Obviously, the main thread is still running because something is preventing it from exiting. Hmmm, user thread 3 looks interesting. What does VisualVM tell us?

enter image description here

This is the thread view at the point that the app was hanging (what was happening while the console output above was printing). Hmmm user thread 3 is looking even more suspicious!

So before killing the app I took a thread dump. Here is the stacktrace for user thread 3:

"user thread 3" prio=6 tid=0x000000000dfd4000 nid=0x2360 waiting on condition [0x00000000114ff000]
    java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for  <0x0000000782cba410> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
        at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
        at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

    Locked ownable synchronizers:
        - None

I've never had to analyze one of these before so it means gibberish to me (but perhaps not to a trained eye!).

After killing the app, VisualVM's timeline stops ticking/incrementing every second, and I can scroll horizontally backwards in the timeline to where user thread 3 was created and began it's life as a nagging thread:

enter image description here

However I cannot figure out how to tell where in the code user thread 3 is being created. So I ask:

  • How can I tell what is creating user thread 3, and where (especially since I suspect its a 3rd party OSS lib that is creating the thread) it is being created?
  • How can I triage, diagnose and fix this thread hanging?

Update:

Here is my code that is firing around the same time that user thread 3 seems to be getting created:

ExecutorService myExecutor = Executors.newCachedThreadPool();
for(Node node : nodes) {
    BootstrapAndKickTask bootAndKickTask = new BootstrapAndKickTask(node, ctx);
    myExecutor.execute(bootAndKickTask);
}

myExecutor.shutdown();
if(!myExecutor.awaitTermination(15, TimeUnit.MINUTES)) {
    TimeoutException toExc = new TimeoutException("Hung after the 15 minute timeout was reached.");
    log.error(toExc);

    throw toExc;
}

Also here is my GitHub Gist which contains the full thread dump.

DirtyMikeAndTheBoys
  • 1,077
  • 3
  • 15
  • 29
  • 1
    This looks like an Executor's ThreadPool that is not being shut down properly. You may have to call the `shutdown`, `terminate`, `close` method on some component. – Thilo Aug 29 '14 at 02:43
  • 1
    Even daemon threads participate in deadlocks. Just supply all stack traces here. – Basilevs Aug 29 '14 at 02:44
  • @Thilo, will it hang if not shut down? – Basilevs Aug 29 '14 at 02:47
  • 1
    @Basilevs It could. If the ExecutorService was still reachable, say from a global variable, finalize() will never be called and it will never shut down. By default Threads created for an ExecutorService, are created as non-daemon. – Smith_61 Aug 29 '14 at 02:50
  • Java will normally not exit until all non-daemon threads have exited. (Calling `System.exit()` can force everything to exit, but thats usually a bad design.) – Michael Anderson Aug 29 '14 at 03:00
  • It's maybe another pool. By default, threads are not named "user thread". Someone explicitly set that name. – Thilo Aug 29 '14 at 03:34
  • Yeah, I'm *pretty* sure I'm using the Java 5 Concurrency API correctly here with my thread pool, and the `BootstrapAndKickTask` uses a lot of 3rd party libs that do a lot of heavy lifting. – DirtyMikeAndTheBoys Aug 29 '14 at 03:36
  • I doun't want to do a `System.exit()`, but **is there a way to explicitly kill a particular thread from inside the main thread?** I don't want to go that route but might have to if fixing OSS software isn't a viable option for me (I'm under a time crunch). – DirtyMikeAndTheBoys Aug 29 '14 at 03:37
  • 1
    @Zac http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html#stop%28%29. Not advised to use that though. To what you have posted, you are shutting it down. Which is odd. My guess is it is another pool being created. – Smith_61 Aug 29 '14 at 03:40
  • Thanks @Smith_61 (+1) - are you saying that its odd that I'm shutting down the my pool? Or are you saying it's odd that I'm shutting it down and its not shutting down properly? If the former, what should I be doing instead of shutting down? – DirtyMikeAndTheBoys Aug 29 '14 at 03:44
  • @Zac no it's odd because that should fix it. But looking back at the Thread list, "User thread 3" is an odd name for a pool thread. If you look below it you see the actual pool threads "pool-6-thread-4". What libraries are you using, and what version of Java? – Smith_61 Aug 29 '14 at 03:45
  • The `BootstrapAndKickTask` is using `jclouds-chef` and `jclouds-ssh` to SSH into virtual machines, install the Chef client on them and then run that Chef client. The thing is, I have 4 tasks that get executed each time, and I see log output that indicates all 4 tasks are in fact exiting. Do you think its possible that although *my* threads are exiting that these 3rd party libs are creating their own thread pools that are somehow hanging? – DirtyMikeAndTheBoys Aug 29 '14 at 03:50
  • @Zac and which version of Java? – Smith_61 Aug 29 '14 at 03:53
  • Using Java 1.7.0_51 against JClouds (Chef and SSH) 1.7.3. I haven't been able to find where the source code for JClouds SSH is located but I have decompiled the JAR and am looking for instances of thread pools. – DirtyMikeAndTheBoys Aug 29 '14 at 03:56
  • @Zac is this the repo you are looking for? https://github.com/jclouds/jclouds – Smith_61 Aug 29 '14 at 04:04
  • 1
    @Zac if it is. Then I think this is where the pool gets created. Search the file for "user thread %d". https://github.com/jclouds/jclouds/blob/e711275fb132c8c2f0155400de01275653ad70e7/core/src/main/java/org/jclouds/concurrent/config/ExecutorServiceModule.java – Smith_61 Aug 29 '14 at 04:05
  • Thanks @Smith_61 (+1) - but not 100% sure - the jclouds-chef repo is [here](https://github.com/jclouds/jclouds-chef) and I'm checking now... what I can't find is the repo for jclouds-ssh. – DirtyMikeAndTheBoys Aug 29 '14 at 04:07
  • Thanks @Smith_61 it certainly seems like this is where the thread is getting created. Still no closer to understanding why the thread won't shut down though. Would it hurt for me to try and `stop()` any threads that have "*user thread*" as their name? – DirtyMikeAndTheBoys Aug 29 '14 at 04:12
  • 1
    @zac I'm wondering if ssh lib requires you to close connection explicitly when all tasks are done? – coffeeaddict Aug 29 '14 at 04:17
  • @Zac you can dump all Thread creation after main by using something like this. http://pastebin.com/LEB3RvHT. It hooks the SecurityManager, sadly I couldn't find anyway to print out names of the threads but stack traces are always nice. – Smith_61 Aug 29 '14 at 04:39
  • Your code has no finally section. Fix that. – Basilevs Aug 29 '14 at 05:03
  • Thanks @coffeeaddict (+1) - yes I'm explicitly disconnecting in a `finally` block. Basilevs - yes I'm using a `finally` block. So there. – DirtyMikeAndTheBoys Aug 29 '14 at 10:26
  • 1
    @Zac, but you don't. Your pool shutdown is prevented by exception in previous loop. – Basilevs Aug 29 '14 at 11:54
  • Thanks @Basilevs (+1) - my code snippet is inside a try-catch where the catch is a `catch(Throwable t)` and no exception/error is being caught. Ideas? – DirtyMikeAndTheBoys Aug 29 '14 at 13:30

3 Answers3

5

What appears to be happening, but I can't confirm without code, is that you are forgetting to call shutdown()/shutdownNow() on an ExecutorService. You are leaving, what appears to be, a ThreadPoolExecutor object globally reachable and still running when your main thread exits. Since it's still globally reachable the ExecutorService will never have it's finalize method called and will never shut its self down. By default, Threads created for an ExecutorService, are created as non-daemon, and will happily continue running long after it's needed.

You should either provide code for us to look at, or look through your code to where you use a ThreadPoolExecutor, and properly shut it down after you are done using it.

According to the docs:

A pool that is no longer referenced in a program AND has no remaining threads will be shutdown automatically. If you would like to ensure that unreferenced pools are reclaimed even if users forget to call shutdown(), then you must arrange that unused threads eventually die, by setting appropriate keep-alive times, using a lower bound of zero core threads and/or setting allowCoreThreadTimeOut(boolean).

this means that even if your program no longer has a reference to a ThreadPoolExecutor, it will never be reclaimed as long as at least one Thread remains alive in the pool. You can check the docs for ways around this.

Smith_61
  • 2,078
  • 12
  • 12
  • The docs say "A pool that is no longer referenced in a program AND has no remaining threads will be shutdown automatically. If you would like to ensure that unreferenced pools are reclaimed even if users forget to call shutdown(), then you must arrange that unused threads eventually die, by setting appropriate keep-alive times, using a lower bound of zero core threads and/or setting allowCoreThreadTimeOut(boolean)." Which means even if its unreachable and gets finalized the threads still wont get killed by default. – Michael Anderson Aug 29 '14 at 02:59
  • 1
    Smith is correct. Unused threads in the core pool are never destroyed unless shutdown is called. The thread is locked on the scheduler queue and isn't consuming any cpu. Its not "hung" at all. It is waiting for the next task. – bond Aug 29 '14 at 03:00
  • @MichaelAnderson That is correct. I should be more clear in my answer, and it will never be finalized if there exists one running Thread in the pool. Because that Thread contains a reference to the pool. – Smith_61 Aug 29 '14 at 03:03
3

It would be good if you could paste the entire code you use. Apache jclouds uses a couple executors to perform certain tasks, and you have to close them.

Make sure you call the close() method on the context or api you get from the jclouds ContextBuilder.

Ignasi Barrera
  • 1,476
  • 9
  • 8
  • Thanks @Ignasi Barrera (+1) - I'm testing now and will get back to you. I **was not** closing *either* the API or ChefContext, hopefully this does it! – DirtyMikeAndTheBoys Aug 29 '14 at 14:21
  • You just need to close what you directly get from the ContextBuilder. In this case closing the ChefContext should be enough. – Ignasi Barrera Aug 29 '14 at 14:32
1

There are two errors:

  1. You fail to exception-safe free allocated resource (thread pool)
  2. You catch errors you don't know how to handle.

Here is potential fix. (I'm not sure if we should include wait for tread completion in finally block)

ExecutorService myExecutor = Executors.newCachedThreadPool();
try {
    for(Node node : nodes) {
        BootstrapAndKickTask bootAndKickTask = new BootstrapAndKickTask(node, ctx);
        myExecutor.execute(bootAndKickTask);
    }
} finally {    
    myExecutor.shutdown();
    if(!myExecutor.awaitTermination(15, TimeUnit.MINUTES)) {
        TimeoutException toExc = new TimeoutException("Hung after the 15 minute timeout was reached.");
        log.error(toExc);
        throw toExc;
    }
}
Basilevs
  • 22,440
  • 15
  • 57
  • 102