2

I use a lot of third-party components in my Delphi 7 service application: Indy, Synapse, ZeosLib, etc.

My application is generally stable, and I use EurekaLog 6 to capture exceptions. In rare situations, however, a thread hangs because a third-party function it calls has hung, e.g. Indy gets stuck while trying to send email.

In many cases the application that hung is at a customer's site. I have no access to their computers, so live debugging is not possible. My application requires high availability, so even one hang a year is not acceptable to my users.

I am now looking for the best way to deal with a situation where debugging is not feasible but I still need the application to recover by itself. Is it possible to terminate a thread if a function it calls hangs? Alternatively, I could restart the entire service when that happens. How about a watchdog, and what is the best way to implement it? Thanks.

Joshua
  • Take a look at madshi's [madExcept](http://madshi.net/madExceptDescription.htm). It will allow your app or your users to send an entire stack trace and detects a hung main thread. Any other hung thread, you should be able to code for. – Marcus Adams Aug 15 '12 at 16:49
  • For handling exceptions in a thread, see [Delphi thread exception mechanism](http://stackoverflow.com/q/3627743/576719) and [How to handle exceptions in TThread objects](http://edn.embarcadero.com/article/10452). – LU RD Aug 15 '12 at 16:49
  • @LURD what's special about handling exceptions in a thread? Surely exception handling is the same in any thread, main or worker? – David Heffernan Aug 15 '12 at 18:22
  • @DavidHeffernan, an unhandled exception in a thread execute can be hard to spot. You must check thread property `FatalException` to get the exception before the thread is freed. – LU RD Aug 15 '12 at 18:45
  • @LURD Well, I always wrap my thread procs in a try/except which I guess comes to the same thing. – David Heffernan Aug 15 '12 at 18:47
  • @DavidHeffernan, yes it is. But we can't tell if the OP has properly guarded the thread. – LU RD Aug 15 '12 at 18:57
  • Great question - the OP asks for a specific recipe, but instead got great advice on how to deal with his situation. Gotta love SO. – Leonardo Herrera Aug 16 '12 at 15:53
  • If your app really needs high availability then you need to adopt Test Driven Development (TDD). – Warren P Aug 17 '12 at 20:07

2 Answers

10

I think you are being rather defeatist. Find and fix the bugs. It might be tricky, but it's the right solution.

Killing threads whose behaviour you don't understand is never the solution. If you start killing threads you'll likely make things worse. That can lead to other runtime errors, deadlock and so on. Once you start killing threads you've lost control.

Now, it would be safe to kill the process (rather than a specific thread) and rely on a watchdog service to restart the process. But that's a really dire solution.
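If a process restart does end up being the last resort, one way to detect a single stalled worker (rather than a stalled process) is a per-thread heartbeat. The following is only a minimal, language-neutral sketch written in Python, not code from the question or from any of the libraries mentioned; the names `Heartbeats`, `beat`, `stalled` and `watchdog_loop` are hypothetical. Each worker records a timestamp at the top of its loop, and a separate watchdog thread reports any worker that has been silent for too long. The watchdog never tries to kill the stuck thread.

```python
import threading
import time


class Heartbeats:
    """Records the last time each worker reported progress (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._last_beat = {}

    def beat(self, worker_name, now=None):
        # Workers call this at the top of every loop iteration.
        with self._lock:
            self._last_beat[worker_name] = time.monotonic() if now is None else now

    def stalled(self, max_silence, now=None):
        # Names of workers that have been silent for more than max_silence seconds.
        now = time.monotonic() if now is None else now
        with self._lock:
            return sorted(name for name, last in self._last_beat.items()
                          if now - last > max_silence)


def watchdog_loop(heartbeats, max_silence, poll_interval, on_stall, stop_event):
    # Runs in its own thread. It only reports; it never kills a worker thread.
    while not stop_event.wait(poll_interval):
        stuck = heartbeats.stalled(max_silence)
        if stuck:
            on_stall(stuck)  # e.g. log the names, then end the whole process
            return
```

In a service, `on_stall` would typically log which worker stalled and then end the process so that an external watchdog, or the service's configured recovery action, can start a fresh one. The log entry matters as much as the restart, because it narrows down where the hang occurs.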

You should certainly use a tool like madExcept, EurekaLog etc. to debug unexpected exceptions. I see you are already using EurekaLog - that's good.

Deadlocks (it sounds like you have deadlock) can be more tricky to chase down. One good way to debug a deadlock is to get your client to produce a crash dump (e.g. from Process Explorer). Then debug it in WinDbg using map2dbg to produce symbolic stack traces. That will tell you which threads are blocking and that reveals the deadlock. And then fix the bugs.

For more details on this deadlock debugging technique see here: http://capnbry.net/blog/?p=18

I'm not familiar with EurekaLog since I use madExcept, but I would expect EurekaLog has a facility to allow generation of thread stack traces for a hung process. If so then that would most likely be the best approach for you.

David Heffernan
  • Killing threads is surely a workaround rather than a solution. However, that workaround is not that senseless. If third-party libs are causing freezes, then finding bugs in them with no chance to debug is rather tricky. If his data structures are perfectly thread-local and isolated, then killing the thread is not very dangerous. On the contrary, according to "Making reliable distributed systems in the presence of software errors", kill-and-try-again is sometimes a solution. Sometimes it is not, and then he would have cascading failures up to an application restart. – Arioch 'The Aug 16 '12 at 09:01
  • @arioch If the thread to be killed is completely isolated then it's safe apart from the heap memory leak. But most likely the bugs are in the OP's code anyway. And if not, debugging the libs is not hard. Those libs come with source. – David Heffernan Aug 16 '12 at 09:42
  • @David Heffernan - You're right, I'm talking about deadlocks. They don't happen often but my application is sometimes used in a mission critical scenario. EurekaLog can already capture all unhandled exceptions, but there are cases where there is no exception and the app still locks up. I'm now exploring how best to implement a watchdog so that if any thread hangs, the watchdog will restart the process. For your info, I'm already using a watchdog timer inside the service to reset an external watchdog service I created, but this doesn't work for hung threads; it only works if the timer (the whole service) hangs. – Joshua Aug 16 '12 at 10:48
2

Your question is rather too vague. If you don't know which of the various components you're using you wish to blame, then you have zero hope of fixing it. The most likely thing is you're doing something wrong, or that you don't understand how these components work. I very much doubt that it's purely a bug in the components themselves, but hey, either way it's all on you to find what's having a problem, and your job to fix it.

A deadlock that you've created, or a deep process corruption issue, may prevent madExcept from giving you any information, but it's worth trying.

To find out which component is freezing, if any, the madExcept comment is the best suggestion yet. It will time out (after a configurable number of seconds) and raise an artificial exception for you, interrupting your hung process. This works for user code and for places where the thread is blocked in a Win32 or kernel function. For example, it's possible that you've set up Indy with infinite timeouts, as that's the default these days in Indy 10, and that what you're experiencing is a timeout-related freeze: network activity that you expected to complete, but which never will, is causing your program to "hang". The cure here is to change your timeouts.
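The effect of a finite timeout is easy to see outside Indy. The following is a language-neutral sketch in Python rather than Indy code, and `read_reply` is a hypothetical helper: with explicit connect and read timeouts, a peer that never answers produces an exception that can be logged and handled, instead of a thread that blocks forever.

```python
import socket


def read_reply(host, port, connect_timeout=10.0, read_timeout=30.0):
    """Connect and read one reply, giving up instead of blocking forever."""
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    try:
        # Without a timeout, recv() would block until the peer sends or closes.
        sock.settimeout(read_timeout)
        return sock.recv(4096)
    finally:
        sock.close()
```

If the server accepts the connection but never sends anything, `recv` raises `socket.timeout` once `read_timeout` elapses. In the Delphi service, the equivalent step would be to set the connect and read timeouts on the Indy component to finite values and handle the resulting exception.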

However, until you figure out WHERE the problem is, I doubt you'll be able to fix it. And so, for that, again, Marcus is right, you should be looking into madExcept. I can't live without it.

Secondly, you should really be adding trace logic to your program, so you know where it's going and what it was doing just before it had a problem. If you really need help doing that, you could try CodeSite, from Raize. Personally, I find that OutputDebugString combined with the free Microsoft DebugView utility (formerly from SysInternals) is more than enough to debug such problems on a client computer.
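As a rough illustration of what such trace logic buys you, here is a minimal sketch in Python (not Delphi, and not tied to CodeSite or OutputDebugString); `make_trace_logger` and `send_mail_job` are hypothetical names. Every step logs a timestamped line tagged with the thread name before and after the call that might hang.

```python
import logging
import threading


def make_trace_logger(handler):
    # Every line carries a timestamp and the name of the thread that wrote it.
    logger = logging.getLogger("trace")
    logger.setLevel(logging.DEBUG)
    handler.setFormatter(
        logging.Formatter("%(asctime)s [%(threadName)s] %(message)s"))
    logger.addHandler(handler)
    return logger


def send_mail_job(logger, send):
    # 'send' stands in for the third-party call that might hang.
    logger.debug("send_mail: start")
    try:
        send()
        logger.debug("send_mail: done")
    except Exception:
        logger.exception("send_mail: failed")
        raise
```

If the call hangs, the last line in the log for that thread is `send_mail: start` with no matching `done` or `failed`, which points at the call that never returned.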

Any program with background threads that does not have trace logging is a badly designed program. Heck, any non-trivial single-threaded application that might ever fail or have problems needs trace logging.

Logging is always going to help, even when madExcept or other exception tools don't. Trace logging is usually a roll-your-own solution, although CodeSite is also quite popular.

Warren P
  • I heartily agree that CodeSite and SmartInspect are superb debugging tools. If the OP is having a lot of trouble debugging, I recommend using one of these tools. I use SmartInspect, but both are equally good IMHO. – Sean B. Durkin Aug 16 '12 at 08:29