
I know the accepted, correct solutions for gracefully closing a thread.

But assume my game is supposed to be fault-tolerant, and when a new gameplay is started, it tries to gracefully close the (now-paused) thread of the old gameplay. If it fails to join or close it (e.g. because the old thread is buggy and is stuck in an infinite loop), it instantiates the new Thread and starts it. But I don't like the fact that the old thread is still there, eating resources.

Is there an accepted way to kill an unresponsive thread without killing the process? It seems to me there isn't. In fact, I read somewhere that a Thread might not react to Thread.stop() either.

So there is no way of dealing with a thread in an infinite loop (e.g. due to a bug), is there? Even if it reacts to Thread.stop(), the docs say that Thread.stop() may leave the Dalvik VM in an inconsistent state...

Thomas Calc

1 Answer


If you need this capability, you must design it and implement it. Obviously, if you don't design and implement a graceful way to shut down a thread, then there will be no way to gracefully shut down a thread. There is no generic solution because the solution is application-specific. For example, it depends on what resources the thread might hold and what shared state the thread may hold locks on or have corrupted.
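As a rough illustration of what "design it and implement it" can look like in Java, here is a minimal sketch of a cooperative shutdown. The class name `GameLoop`, the `stopGracefully` helper, and the timeout values are hypothetical and not taken from the question. The loop only stops at the points where it checks the flag, so a thread stuck in a loop that never reaches that check will not respond.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical worker; names and timings are illustrative assumptions.
final class GameLoop implements Runnable {
    private volatile boolean running = true;
    final AtomicLong ticks = new AtomicLong();

    /** Ask the loop to stop; takes effect only where the loop checks the flag. */
    void requestStop() {
        running = false;
    }

    @Override
    public void run() {
        while (running && !Thread.currentThread().isInterrupted()) {
            ticks.incrementAndGet(); // stand-in for one step of game logic
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                // Restore the interrupt flag so the loop condition ends the loop.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Returns true if the thread terminated within timeoutMillis (must be > 0). */
    static boolean stopGracefully(GameLoop loop, Thread t, long timeoutMillis)
            throws InterruptedException {
        loop.requestStop();
        t.interrupt();         // wakes the thread if it is sleeping or waiting
        t.join(timeoutMillis); // bounded wait instead of blocking forever
        return !t.isAlive();
    }

    public static void main(String[] args) throws InterruptedException {
        GameLoop loop = new GameLoop();
        Thread t = new Thread(loop, "gameplay");
        t.start();
        Thread.sleep(50);
        System.out.println("stopped: " + stopGracefully(loop, t, 1000));
    }
}
```

If `stopGracefully` returns false, the thread ignored the request, and nothing in this sketch can force it to stop. That is the case the rest of this answer is about.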

The canonical answer is this: If you need this capability, don't use threads. Use processes.

The core reason is the way threads work. You acquire a lock and then you manipulate shared data. While you're manipulating that shared data, it can enter an inconsistent state. It is the absolute responsibility of a thread to restore the data to a consistent state before releasing the lock. (Consider, for example, deleting an object from a doubly-linked list. You must adjust the forward link or the reverse link first. In between those two operations, the linked-list is in an inconsistent state.)
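The doubly-linked list case can be sketched as follows. This is a hand-rolled, hypothetical `ListNode` class written only for illustration (it is not `java.util.LinkedList`), and the unlink helpers assume the node being removed has both a predecessor and a successor.

```java
// Minimal doubly-linked node; hypothetical, for illustration only.
final class ListNode {
    final String value;
    ListNode prev, next;

    ListNode(String value) {
        this.value = value;
    }

    static void link(ListNode a, ListNode b) {
        a.next = b;
        b.prev = a;
    }

    /** True if every forward link from head has a matching backward link. */
    static boolean isConsistent(ListNode head) {
        for (ListNode n = head; n != null && n.next != null; n = n.next) {
            if (n.next.prev != n) {
                return false;
            }
        }
        return true;
    }

    /** First half of removal: the predecessor skips over node (interior nodes only). */
    static void unlinkForward(ListNode node) {
        node.prev.next = node.next;
    }

    /** Second half of removal: the successor points back past node. */
    static void unlinkBackward(ListNode node) {
        node.next.prev = node.prev;
    }

    public static void main(String[] args) {
        ListNode a = new ListNode("a"), b = new ListNode("b"), c = new ListNode("c");
        link(a, b);
        link(b, c);
        unlinkForward(b);
        System.out.println("after first step:  " + isConsistent(a)); // false
        unlinkBackward(b);
        System.out.println("after second step: " + isConsistent(a)); // true
    }
}
```

A thread that dies or is killed between `unlinkForward` and `unlinkBackward` leaves the list in the state where `isConsistent` returns false.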

Say you have this code:

  1. Acquire a lock or enter a synchronized block.

  2. Begin modifying the shared state the lock protects.

  3. Bug

  4. Return the data the lock protects to a consistent state.

  5. Release the lock.

So, now, what do we do? At step 3, the thread holds a lock and it has encountered a bug and triggered an exception. If we don't release the lock it acquired in step 1, every thread that tries to acquire that same lock will wait forever, and we're doomed. If we do release the lock it acquired in step 1, every thread that acquires the lock will then see the inconsistent shared state the thread failed to clean up because it never got to step 4. Either way, we're doomed.
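The five steps above can be sketched in Java with a `synchronized` method. The `Ledger` class and its invariant (`a + b` is always 100) are hypothetical, chosen only to make the broken state easy to observe. In Java, leaving a `synchronized` method by exception releases the monitor, so this sketch shows the "release the lock" branch of the dilemma.

```java
// Hypothetical shared state: the invariant is that a + b always equals 100.
final class Ledger {
    private int a = 100;
    private int b = 0;

    /** Moves amount from a to b; failMidway simulates the bug at step 3. */
    synchronized void transfer(int amount, boolean failMidway) { // step 1: lock acquired
        a -= amount;                                 // step 2: invariant is now broken
        if (failMidway) {
            throw new IllegalStateException("bug");  // step 3
        }
        b += amount;                                 // step 4: skipped when the bug hits
    }                                                // step 5: monitor released either way

    synchronized int total() {
        return a + b;
    }

    public static void main(String[] args) throws InterruptedException {
        Ledger ledger = new Ledger();
        Thread buggy = new Thread(() -> {
            try {
                ledger.transfer(30, true);
            } catch (IllegalStateException e) {
                // The exception is caught, but the invariant is already broken.
            }
        });
        buggy.start();
        buggy.join();
        // Any other thread can now take the lock and sees the damaged state.
        System.out.println("total = " + ledger.total()); // prints "total = 70"
    }
}
```

The lock is free again, so nothing deadlocks, but every later caller of `total()` reads 70 instead of 100, and the runtime has no way of knowing that this is wrong.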

If a thread encounters an exceptional condition the application programmer did not create a sane way to handle, the process is doomed.

David Schwartz
  • But the problem is exactly fault tolerance: i.e. that the Thread went into an infinite loop for whatever reason (bug), and we need a new Thread (the old one can be killed). Developers don't design bugs, bugs just come. My thread has a graceful mechanism, but obviously, it can't do anything if a programmer error leads to an infinite loop. My question is not about graceful termination, but about a reliable termination that works and won't leave the VM in an inconsistent state. My own variables that the Thread used are unnecessary by that time, so those are not reused, but the Android docs mention the VM too... – Thomas Calc May 18 '12 at 00:38
  • When I write "we need a new Thread", I mean that a new gameplay is started, so the old Thread is not needed, but it should be wiped out to clean resources (and it can't be done gracefully because it's in an infinite loop due to a programming error). – Thomas Calc May 18 '12 at 00:40
  • @ThomasCalc: "How can I write code to correctly handle a case I didn't write code to correctly handle?" Obviously, you can't. The thread has access to all process resources, so no process resources can be assumed to be uncontaminated. There are no walls between threads (that's why we use them) so if a thread is corrupt, all the threads in the process are. You need a new process. – David Schwartz May 18 '12 at 00:47
  • Thanks, this is the confirmation I needed. About how you rephrased my question in quotes, it is not that trivial: fault tolerance is exactly for these cases (think of an aeroplane's subsystems), exactly for cases when a failure occurs for an unknown (or known but uninfluenceable -- think of transient hardware errors) reason. Of course, I understand that threads are not the way to do that. Unexpected thread exceptions, on the other hand, are fine: the thread terminates, and we just start a new thread. – Thomas Calc May 18 '12 at 00:56
  • You can't terminate a thread on an unexpected exception. Threads acquire locks and then manipulate process state information. They must not release any locks they hold until they've put the process state information into a consistent form. If a thread has an exception it wasn't coded to handle and holds such a lock, what do you do? If you don't release the lock, every thread that tries to acquire that lock will deadlock. If you do release the lock, any thread that acquires that lock will see shared resources in the inconsistent state the faulting thread left them in. – David Schwartz May 18 '12 at 01:18
  • If a thread has a fault it is not specifically coded to handle, the process is contaminated. It must be terminated and a new process used. There is no way to restore the process to a sane state. – David Schwartz May 18 '12 at 01:18
  • Sorry, I think you misunderstood my comment. I don't want to terminate the thread on an unexpected exception; it was just an example of correct fault tolerance: when a standard Java Exception is thrown by a thread (that is not the main thread of the app), we catch it and gracefully close the thread. Do you mean that if my thread has an uncaught exception (standard JVM Exception), then the JVM doesn't correctly release the locks of the thread and then terminate the thread? – Thomas Calc May 18 '12 at 01:27
  • In native code, you have full access to the POSIX calls for killing threads and processes (subject to process privileges, of course). So you can violate the restrictions imposed at the Java/Dalvik layer on killing threads. But of course on your own head be it! – Lawrence D'Oliveiro May 18 '12 at 02:04
  • @ThomasCalc: Yes, that's *exactly* what I mean. The JVM *can't* correctly release the locks because it has no way to know how to clean up the data structures the lock protects. Once a thread has encountered an error condition that the application code does not know how to gracefully clean up, the process context (and all the other threads in it) should be considered corrupt. (I'll update my answer to explain this in more detail.) – David Schwartz May 18 '12 at 02:11