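I want to write a daemon manager that makes sure all daemons keep running, like so (simplified):

#include <process.h>   // _beginthread
#include <stdlib.h>    // system
#include <windows.h>   // Sleep

void watchMe(void *arg)
{
    const char *filename = static_cast<const char *>(arg);
    while (true)
    {
        system(filename); // blocks for as long as filename runs
        // filename must have crashed; never mind, it will be restarted
    }
}

int main()
{
    _beginthread(watchMe, 0, const_cast<char *>("foo.exe"));
    _beginthread(watchMe, 0, const_cast<char *>("bar.exe"));
    Sleep(INFINITE); // keep the process alive so the watcher threads keep running
}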

This part is already working, but now I am facing a problem: when an observed application (say foo.exe) crashes, the corresponding system() call blocks until I confirm this beautiful message box:

[image: error message box]

This makes the daemon useless.

What I think might be a solution is to make the main() of the observed programs (which I control) "uncrashable", so that they shut down gracefully without showing this ugly message box.

Like so:

try
{
    char *p = NULL;
    *p = 123; //nice null pointer exception
}
catch (...)
{
    cout << "Caught Exception. Terminating gracefully" << endl;
    return 0;
}

But this doesn't work as it still produces this error message:

[image: error message box]

("Untreated exception ... Write access violation ...")

I've tried SetUnhandledExceptionFilter and various other approaches, but without effect.

Any help would be highly appreciated.

Greets

David Müller
  • A null-pointer dereference doesn't explicitly result in an exception. – Xeo Jun 05 '11 at 11:01
  • This is exactly what happens with my VS2005 setup. But anyway, this was just an example of code that makes the application crash. – David Müller Jun 05 '11 at 11:04
  • @David: If you want an example of an exception, just throw one. :) `throw 666;` – Xeo Jun 05 '11 at 11:05
  • If I do it this way, I am able to catch the exception. My problem is that even though I wrapped the code in try/catch, I can't do anything about the application crash. – David Müller Jun 05 '11 at 11:07
  • Why don't you rather fix the bug in this other program (since you control it)? – RedX Jun 05 '11 at 11:40
  • The program depends on some libraries which I can't vouch for. Furthermore, the program has to run "forever" and you never know, so I'd better be careful. – David Müller Jun 05 '11 at 11:46

4 Answers

This looks like an SEH (structured exception handling) exception rather than a C++ exception, and it needs to be handled differently. Try the following code:

__try
{
    char *p = NULL;
    *p = 123; //nice null pointer exception
}
__except(GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION ? 
             EXCEPTION_EXECUTE_HANDLER : EXCEPTION_CONTINUE_SEARCH)
{
    cout << "Caught Exception. Terminating gracefully" << endl;
    return 0;
}

But that's a remedy, not a cure; you might have better luck running the processes within a sandbox.
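A minimal sketch of how the handler above could be wrapped so that the watchdog sees a nonzero exit code after a crash. The names doWork and guardedMain are hypothetical. The crash-prone work sits in its own function because MSVC rejects __try in a function that needs C++ object unwinding (error C2712). The non-MSVC branch is only there to keep the file compilable elsewhere and provides no protection.

```cpp
#include <cstdio>
#ifdef _MSC_VER
#include <windows.h>
#endif

// Hypothetical stand-in for the program's real work.
static void doWork(bool crash)
{
    if (crash)
    {
        volatile char *p = 0;
        *p = 123; // deliberate access violation
    }
}

// Returns 0 on success, 1 if an access violation was caught (MSVC only).
int guardedMain(bool crash)
{
#ifdef _MSC_VER
    __try
    {
        doWork(crash);
    }
    __except (GetExceptionCode() == EXCEPTION_ACCESS_VIOLATION
                  ? EXCEPTION_EXECUTE_HANDLER
                  : EXCEPTION_CONTINUE_SEARCH)
    {
        std::puts("Caught access violation. Terminating.");
        return 1; // nonzero, so a supervisor can tell this was a crash
    }
    return 0;
#else
    doWork(crash); // no SEH available here: a crash is not intercepted
    return 0;
#endif
}
```

The real main() would then simply return guardedMain(...), assuming the rest of the program can be reached from doWork.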

arul

You can change the /EHsc flag to /EHa in your compiler command line (Properties / C/C++ / Code Generation / Enable C++ Exceptions). With /EHa, catch (...) also catches structured exceptions such as access violations.

See this for a similar question on SO.

Andrei

You can run the watched process asynchronously and use kernel objects to communicate with it. For instance, you can:

  1. Create a named event.
  2. Start the target process.
  3. Wait on the created event.
  4. In the target process, when the crash is encountered, open the named event and set it.

This way, your monitor will continue to run as soon as the crash is encountered in the watched process, even if the watched process has not ended yet.
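The four steps above can be sketched roughly as follows. This is an illustration under assumptions, not a tested implementation: the event name and the functions installCrashSignal and runUntilCrash are hypothetical, and the watched process is assumed to call installCrashSignal() early in its main(). The monitor waits on both the event and the process handle, so it also wakes up if the process exits without signalling.

```cpp
#ifdef _WIN32
#include <windows.h>

// Hypothetical event name; each watched process would need its own.
static const char kCrashEvent[] = "Local\\FooCrashed";

// Watched process: top-level filter that signals the monitor (step 4).
static LONG WINAPI signalCrash(EXCEPTION_POINTERS *)
{
    HANDLE ev = OpenEventA(EVENT_MODIFY_STATE, FALSE, kCrashEvent);
    if (ev)
    {
        SetEvent(ev);
        CloseHandle(ev);
    }
    return EXCEPTION_CONTINUE_SEARCH; // leave the default crash handling in place
}

void installCrashSignal()
{
    SetUnhandledExceptionFilter(signalCrash);
}

// Monitor: returns true if the child signalled a crash, false if it
// exited on its own or could not be started.
bool runUntilCrash(char *commandLine)
{
    // Step 1: manual-reset event, initially unsignalled.
    HANDLE ev = CreateEventA(NULL, TRUE, FALSE, kCrashEvent);
    if (!ev)
        return false;

    // Step 2: start the target process.
    STARTUPINFOA si = {};
    si.cb = sizeof(si);
    PROCESS_INFORMATION pi = {};
    if (!CreateProcessA(NULL, commandLine, NULL, NULL, FALSE, 0,
                        NULL, NULL, &si, &pi))
    {
        CloseHandle(ev);
        return false;
    }

    // Step 3: wait for either the crash event or normal process exit.
    HANDLE handles[2] = { ev, pi.hProcess };
    DWORD r = WaitForMultipleObjects(2, handles, FALSE, INFINITE);
    bool crashed = (r == WAIT_OBJECT_0);
    if (crashed)
        TerminateProcess(pi.hProcess, 1); // child may be stuck behind a dialog

    CloseHandle(pi.hThread);
    CloseHandle(pi.hProcess);
    CloseHandle(ev);
    return crashed;
}
#else
// The sketch is Windows-only; these stubs merely keep the file compilable elsewhere.
void installCrashSignal() {}
bool runUntilCrash(char *) { return false; }
#endif
```

Note that an unhandled-exception filter is typically not invoked while a debugger is attached, so the signalling side would have to be tried outside the debugger.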

BTW, you might be able to control the appearance of the first error message using drwtsn32 (or whatever is used in Win7), and I'm not sure, but the second error message might only appear in debug builds. Building in release mode might make it easier for you, though the most important thing, IMHO, is solving the cause of the crashes in the first place - which will be easier in debug builds.

Eran
  • Thanks for your answer. But the problem remains. Even if I use CreateProcess, my silly Null Pointer dereferencing produces the same stupid message box ("Application has terminated ..."). – David Müller Jun 05 '11 at 11:47
  • @David Müller - My suggestion doesn't prevent the message from appearing, but at least you know the daemon has crashed, and can run another instance. To disable the message itself, look into the suggestions here: http://stackoverflow.com/questions/735170/can-the-application-error-dialog-box-be-disabled – Eran Jun 05 '11 at 12:04
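One approach commonly suggested for disabling the dialog is the Win32 SetErrorMode function. The sketch below is an assumption about how it might be applied here, not something taken from the answers: suppressCrashDialogs is a hypothetical helper that the watched program would call at the start of main(). Child processes inherit the error mode by default, so calling it in the daemon manager before spawning may also be enough. It does not cover assertion dialogs raised by the debug C runtime.

```cpp
#ifdef _WIN32
#include <windows.h>
#endif

// Asks Windows not to show the critical-error and crash dialogs for this
// process. Returns true if the request was made, false on other platforms.
bool suppressCrashDialogs()
{
#ifdef _WIN32
    // SetErrorMode returns the previous mode; OR it back in to keep existing flags.
    UINT previous = SetErrorMode(SEM_NOGPFAULTERRORBOX);
    SetErrorMode(previous | SEM_FAILCRITICALERRORS | SEM_NOGPFAULTERRORBOX);
    return true;
#else
    return false; // nothing to do; stub keeps the sketch compilable elsewhere
#endif
}
```

With the dialog suppressed, a crashing process should terminate immediately, so the blocking system() call in the watcher thread returns and the restart loop continues.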

I did this a long time ago (in the 90s, on NT4). I don't expect the principles to have changed.

The basic approach is, once you have started the process, to inject a DLL that duplicates the functionality of UnhandledExceptionFilter() from KERNEL32.DLL. Rummaging around my old code, I see that I patched GetProcAddress, LoadLibraryA, LoadLibraryW, LoadLibraryExA, LoadLibraryExW and UnhandledExceptionFilter.

The hooking of the LoadLibrary* functions made sure the patching was present for all modules. The revised GetProcAddress had to provide the addresses of the patched versions of the functions rather than the KERNEL32.DLL versions.

And, of course, the UnhandledExceptionFilter() replacement does what you want. For example, it can start a just-in-time debugger to take a process dump (core dumps are implemented in user mode on NT and its successors) and then kill the process.

My implementation had the patched functions implemented with __declspec(naked) and dealt with all the registers by hand, because the compiler can destroy the contents of some registers that callers from assembly might not expect to be destroyed.

Of course there was a bunch more detail, but that is the essential outline.

janm