4

Summary

I have a server that should be long-running, and that spawns a few background threads for IO. I'm trying to make sure that the background/IO threads don't go down, or that they'll be brought back up if they do go down.

Current Solution

Currently my main loop just checks the status of all background threads (pseudo-code below). I think there should be a better way.

while (!Thread.currentThread().isInterrupted()) {
    maintainThreads();
    doWork();
    condition.await(30, TimeUnit.SECONDS);
}

My Attempt

I'm considering switching to a SingleThreadExecutor, with a custom queue that won't remove the Runnable when it pulls the next task. The executor would then manage the threads for me so I could take it out of my main loop.

I'm worried that having one executor for each thread will be a performance hit, and that there are simpler/better solutions that exist for this problem. I've also considered setting up shutdown hooks for each thread to have them just restart themselves.

Any help would be appreciated.

billoot
  • What do you mean by "go down"? Do you mean that some contained code throws an exception? You could just wrap the body of your `run()` method in try/catch which would prevent the thread from terminating on an exception - but you have to think pretty long and hard about what it means for your `doWork()` and other methods to throw exceptions. Is it even possible to continue? – BeeOnRope Dec 26 '16 at 22:31
  • [UncaughtExceptionHandler](http://docs.oracle.com/javase/8/docs/api/java/lang/Thread.UncaughtExceptionHandler.html?is-external=true) ? – GPI Dec 26 '16 at 22:38
  • @BeeOnRope The threads I'm maintaining are mostly just IO connections (producers for `doWork()`) that already catch all checked exceptions. If the threads were to terminate then the main loop would just be spinning, all it has to do is log the issue and try recreating them until they connect. I guess my main question is "assuming I want a thread to run as long as my program is running, no matter what caused it to go down, what is the cleanest way to do that". – billoot Dec 26 '16 at 22:40
  • The only way a thread can properly "go down" is by throwing an exception, so if you catch all `Throwable` in your main loop (certainly not just all _checked exceptions_), the thread will stay up. The only other way to stop a thread is `Thread.stop()` - but that is badly broken, and if anything in your application calls it you cannot safely restart the thread in any case. – BeeOnRope Dec 26 '16 at 22:42
  • @BeeOnRope So you're saying that as long as I catch all `Throwable` within the loop of the `run()` method of a `Runnable`, then provided I code it right I'm guaranteed the thread will never terminate? I guess I'm just worrying about nothing. Put it in an answer so I can accept it. – billoot Dec 26 '16 at 22:55
  • I'll put it in an answer with more details. – BeeOnRope Dec 26 '16 at 22:57

3 Answers

2

The real gotcha here is what you mean by go down in "or that they'll be brought back up if they do go down."

There are only two ways that I know of that a thread can go down without the entire process itself exiting in Java:

  1. The run() method terminates, either via exception or finishing the run method normally (i.e., non-exceptionally).
  2. Thread.stop() is called on your thread.

Let's tackle (2) first - Thread.stop() is deprecated and is a big no-no in any well-behaved application. You can pretty much assume it is not going to be called, because if it is called, your application is already badly broken. Restarting any thread at this point may have undefined effects since your application is in an inconsistent state.

So then for (1), you just have to ensure that run() doesn't terminate. It won't terminate normally because you've already set up an infinite loop. To stop it from terminating exceptionally, you can catch (Throwable t) and just keep looping (after logging the error appropriately).
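A minimal sketch of such a loop, assuming the thread's work can be expressed as a single repeatable step. The names `ResilientWorker`, `Task`, `runOnce` and `failureCount` are hypothetical and not part of the question's code:

```java
import java.util.concurrent.atomic.AtomicInteger;

class ResilientWorker implements Runnable {
    /** Hypothetical unit of work; stands in for whatever IO the thread performs. */
    interface Task {
        void runOnce() throws Exception;
    }

    private final Task task;
    private final AtomicInteger failures = new AtomicInteger();

    ResilientWorker(Task task) {
        this.task = task;
    }

    /** Number of iterations that ended in a caught Throwable. */
    int failureCount() {
        return failures.get();
    }

    @Override
    public void run() {
        // Interruption is the only intended way out of this loop.
        while (!Thread.currentThread().isInterrupted()) {
            try {
                task.runOnce();
            } catch (InterruptedException e) {
                // Restore the flag so the loop condition sees it and exits.
                Thread.currentThread().interrupt();
            } catch (Throwable t) {
                // Log and keep looping; the thread itself stays alive.
                failures.incrementAndGet();
                System.err.println("worker iteration failed, continuing: " + t);
            }
        }
    }
}
```

Interruption is handled separately here so that the thread can still be shut down on purpose; everything else is logged and swallowed.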

Of course, a catch (Throwable t) without a subsequent rethrow is usually a code smell. It means you caught some type of unspecified error, and then decided to keep going anyway. The errors might range from the benign (e.g., a SocketException because a remote client disconnected) to the unrecoverable (e.g., an OutOfMemoryError or something even worse). You should really ask yourself if you want this thread to continue in the face of any type of exception.

Your application could be in an invalid state and may not be able to continue. One compromise would be to only catch subclasses of Exception and terminate the application on Error. A more conservative approach would be to terminate the application on any type of exception that you don't know how to handle (and treat it as a bug to be fixed).
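The compromise could look roughly like the sketch below: the loop catches only Exception, so any Error escapes run() and reaches the thread's UncaughtExceptionHandler, which decides how to terminate. `ExceptionOnlyLoop`, `looping`, `start` and the `"io-worker"` thread name are hypothetical names chosen for illustration:

```java
class ExceptionOnlyLoop {
    /** Wraps a step in a loop: Exceptions are logged and retried, Errors propagate. */
    static Runnable looping(Runnable step) {
        return () -> {
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    step.run();
                } catch (Exception e) {
                    // Treated as recoverable by assumption: log and try again.
                    System.err.println("recoverable failure, retrying: " + e);
                }
                // No catch for Error: it ends run() and goes to the handler.
            }
        };
    }

    /** Starts the looping thread and routes anything uncaught to onFatal. */
    static Thread start(Runnable step, Thread.UncaughtExceptionHandler onFatal) {
        Thread t = new Thread(looping(step), "io-worker");
        t.setUncaughtExceptionHandler(onFatal);
        t.start();
        return t;
    }
}
```

In a real server the handler might log the Error and then call `System.exit(1)`; that choice is an assumption here, not something the answer prescribes.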

BeeOnRope
2

An important part of maintaining persistent background threads is handling your exceptions correctly at the Thread level. When handling error conditions and especially exceptions in your top-level server/daemon code you need to keep in mind that some exceptions can't be handled! When such an exception is encountered you should quit immediately or try to clean as much as you can and then quit.

For example, most exceptions of type Error shouldn't be handled. This includes the subclasses of java.lang.VirtualMachineError: InternalError, OutOfMemoryError, StackOverflowError, UnknownError, etc. As the previous answer mentions, catching Throwable is a big no-no, as many of these exceptions can't be recovered from. Think about your failure strategies: when would failing make sense, and what can you do in that case (maybe log an error, or display a message to the user)?

Try to always properly handle InterruptedException as it gives you time to clean up and gracefully shut down your threads. Otherwise you are risking data corruption.
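As one possible illustration, the sketch below blocks on a queue and treats InterruptedException as a shutdown request, running its cleanup in a finally block. The class name `InterruptibleConsumer`, the queue of strings and the `cleanedUp` flag are assumptions made for the example:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

class InterruptibleConsumer implements Runnable {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private volatile boolean cleanedUp = false;

    void offer(String msg) {
        queue.offer(msg);
    }

    boolean cleanedUp() {
        return cleanedUp;
    }

    @Override
    public void run() {
        try {
            while (true) {
                // take() blocks and throws InterruptedException when interrupted.
                String msg = queue.take();
                System.out.println("handled " + msg);
            }
        } catch (InterruptedException e) {
            // Preserve the interrupt status for any code further up the stack.
            Thread.currentThread().interrupt();
        } finally {
            // Stand-in for closing sockets, flushing buffers, and so on.
            cleanedUp = true;
        }
    }
}
```

Calling `interrupt()` on the thread running this consumer ends the loop and still runs the cleanup.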

For more exception handling tips check my Exceptions Guidelines post.

Stan Ivanov
1

For an application program, recreating/restarting the process (not the thread) is the most reliable failure recovery method.

How do truly mission-critical systems handle failure? By providing redundancy, heartbeat monitoring, fast handover, and so on.

Don't blindly try to keep an already failed thread alive. There are many causes that can wreak havoc on our process, and we (humans) know only a few of them.

If we FAIL FAST and restart the process, the OS kernel guarantees us a clean initial state. So even if our program is not very reliable, it will still run and get the job done for some amount of time.
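A rough sketch of the restart-the-process idea, written as a small Java supervisor that relaunches a child command while it keeps exiting with a non-zero code. In practice this role is often played by an init system or similar external supervisor; `ProcessSupervisor`, `supervise` and `maxRestarts` are hypothetical names for this example:

```java
import java.io.IOException;
import java.util.List;

class ProcessSupervisor {
    /**
     * Launches the command, waits for it to exit, and relaunches it while the
     * exit code is non-zero, up to maxRestarts restarts.
     * Returns the total number of launches.
     */
    static int supervise(List<String> command, int maxRestarts)
            throws IOException, InterruptedException {
        int launches = 0;
        while (true) {
            // Each launch is a fresh process with a clean initial state.
            Process p = new ProcessBuilder(command).inheritIO().start();
            launches++;
            int exit = p.waitFor();
            if (exit == 0 || launches > maxRestarts) {
                return launches;
            }
            System.err.println("child exited with " + exit + ", restarting");
        }
    }
}
```

The restart cap keeps a child that always crashes from being relaunched forever; a real supervisor would probably also add a delay between restarts.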

9dan