
Is there a way to at least postpone termination of a managed app (by a few dozen milliseconds) and set some shared flag to give other threads a chance to terminate gracefully (the thread that hit the stack overflow obviously wouldn't execute anything further)? I'm contemplating using a JIT debugger or CLR hosting for this, and I'm curious whether anybody has tried this before.
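For reference, the cooperative shutdown described here can be sketched in Python, which is used only because it is easy to run. Other threads poll a shared flag, then clean up and exit within a bounded grace period. The names `worker` and `run_demo`, and the "bets", are hypothetical.

The sketch shows the pattern under normal conditions only. It assumes the process keeps running long enough for the flag to be set and observed.

```python
import threading
import time

def worker(stop: threading.Event, open_bets: list, collected: list, lock: threading.Lock) -> None:
    """Hypothetical worker thread: keeps working until the shared flag is set."""
    while not stop.is_set():
        time.sleep(0.001)  # stand-in for real work between flag checks
    # Graceful exit path: "collect the chips from the table" before leaving.
    with lock:
        collected.extend(open_bets)
    open_bets.clear()

def run_demo(num_workers: int = 3) -> list:
    stop = threading.Event()  # the shared flag
    lock = threading.Lock()
    collected: list = []
    threads = []
    for i in range(num_workers):
        bets = [f"bet-{i}-{j}" for j in range(2)]
        t = threading.Thread(target=worker, args=(stop, bets, collected, lock))
        t.start()
        threads.append(t)
    time.sleep(0.01)         # let the workers run briefly
    stop.set()               # signal every worker to wind down
    for t in threads:
        t.join(timeout=1.0)  # bounded grace period for each worker to finish
    return sorted(collected)

if __name__ == "__main__":
    print(run_demo())
```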

Why would I want to do something so wrong?

Without going into too much detail, imagine this analogy: you are in a casino betting on roulette and suddenly find out that the roulette is an unreliable fake. So you want to leave the casino immediately, BUT you likely want to collect your bets from the table first. Unfortunately, I cannot use a separate process for this, as there are very tight performance requirements.

Tried and didn't work:

.NET behavior for StackOverflowException (and contradictory information on MSDN) has been discussed several times on SO. To sum up quickly:

  • HandleProcessCorruptedStateExceptionsAttribute (e.g. on the AppDomain unhandled exception handler) doesn't work
  • ExecuteCodeWithGuaranteedCleanup doesn't work
  • legacyUnhandledExceptionPolicy doesn't work

There may be a few other ways to try handling StackOverflowException, but it seems apparent that the CLR terminates the whole process, as mentioned in this great answer by Hans Passant.

Considering trying:

  • JIT debugger: leave the thread with the exception frozen, set some shared flag (likely in a pinned location) and thaw the other threads for a short time.
  • CLR hosting and setting the unhandled exception policy

Do you have any other ideas? Or any experience (successful or unsuccessful) with either of these approaches?

Jan

2 Answers


The word "fake" isn't quite the right one for your casino analogy. There was a magnitude 9 earthquake, and the casino building, along with the roulette table, the remaining chips and the player, disappeared in a giant cloud of smoke and dust.

The only shot you have at running code after an SOE is to stay far away from that casino: it has to run in another process. A "guard" process that starts your misbehaving program can use Process.ExitCode to detect the crash; it will be -1073741571 (0xc00000fd, STATUS_STACK_OVERFLOW). The process state is gone, so you'll have to use one of the .NET out-of-process interop mechanisms (like WCF, named pipes, sockets or a memory-mapped file) to make the guard process aware of what needs to be done to clean up. This needs to be transactional: you cannot reason about the exact point in time at which the crash occurred, since the process might have died while updating the guard.
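Such a guard process can be sketched as follows. Python is used purely for brevity; in .NET you would use the `Process` class.

The sketch assumes a hypothetical JSON-lines journal. The worker appends a `begin` record before acting and a `commit` record afterwards. An incomplete trailing line, left behind by a crash mid-write, is ignored rather than trusted, which is one simple way to get the transactional behavior described above.

The sketch also handles two representations of the same exit code. .NET's `Process.ExitCode` reports the signed value -1073741571, while Python's `subprocess` on Windows typically reports the unsigned 3221225725. The sketch therefore normalizes both to signed 32-bit.

```python
import json
import subprocess
import sys

# NTSTATUS STATUS_STACK_OVERFLOW. .NET's Process.ExitCode (a signed int) shows it as
# -1073741571; Python's subprocess on Windows typically shows the unsigned 3221225725.
STATUS_STACK_OVERFLOW = 0xC00000FD

def to_signed32(code: int) -> int:
    """Reinterpret an exit code as a signed 32-bit integer."""
    code &= 0xFFFFFFFF
    return code - (1 << 32) if code & 0x80000000 else code

def crashed_with_stack_overflow(returncode: int) -> bool:
    return to_signed32(returncode) == to_signed32(STATUS_STACK_OVERFLOW)

def pending_entries(journal_lines) -> list:
    """Return ids of (hypothetical) journal entries that were begun but never committed.

    Each line is one JSON record written by the worker. A line that does not parse
    (e.g. the worker died mid-write) is treated as never written.
    """
    open_ids = {}
    for line in journal_lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(rec, dict):
            continue
        if rec.get("op") == "begin":
            open_ids[rec.get("id")] = rec
        elif rec.get("op") == "commit":
            open_ids.pop(rec.get("id"), None)
    return list(open_ids)

def guard(cmd: list, journal_path: str) -> int:
    """Run the worker as a child process; after a stack overflow, replay its journal."""
    result = subprocess.run(cmd)
    if crashed_with_stack_overflow(result.returncode):
        with open(journal_path, encoding="utf-8") as journal:
            for entry_id in pending_entries(journal):
                print(f"cleaning up {entry_id}")  # placeholder for the real cleanup
    return result.returncode

if __name__ == "__main__":
    # Usage: a child that exits normally triggers no cleanup.
    print(guard([sys.executable, "-c", "pass"], "journal.jsonl"))
```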

Do beware that this is rarely worth the effort, because an SOE is pretty much indistinguishable from an everyday process abort, like getting killed by Task Manager, the machine losing power, or being subjected to the effects of an earthquake :)

Hans Passant
  • The analogy is indeed very simplified. What I'm trying to express is that only that one single thread is busted. I would like the other threads to have a chance to read a flag and exit (without leaving 'chips on the table'). I would even be very happy to trade off some stack space on all new threads for a chance to do that. However, all of this is probably just a philosophical discussion about something that has already been decided (the behavior after an SO). I'm wondering whether, despite this, there is some way to run code on other threads (e.g. by leveraging a JIT debugger) – Jan Jul 24 '14 at 11:26

A StackOverflowException is an immediate and critical exception from which the runtime cannot recover - that's why you can't catch it, or recover from it, or anything else. In order to run another method (whether that's a cleanup method or anything else), you have to be able to create a stack frame for that method, and the stack is already full (that's what a StackOverflowException means!). You can't run another method because running a method is what causes the exception in the first place!

Fortunately, though, this kind of exception is always caused by program structure. You should be able to diagnose and fix the error in your code: when you get the exception, you will see in your call stack that there's a loop of one or more methods recursing indefinitely. You need to identify what the faulty logic is and fix it, and that'll be a lot easier than trying to fix the unfixable exception.
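A common way to apply such a fix is to replace unbounded recursion with a loop over an explicit, heap-allocated stack. The sketch below shows this in Python, a swapped-in language; the nested-list "tree" and the function names are hypothetical. Depth is then limited by available memory rather than by the thread's stack size.

Python raises a catchable `RecursionError` where the CLR would terminate the process. The point here is only the rewrite technique, not the failure mode.

```python
def depth_recursive(node) -> int:
    """Nesting depth of a nested-list tree; uses call frames for every level."""
    if not isinstance(node, list):
        return 0
    return 1 + max((depth_recursive(child) for child in node), default=0)

def depth_iterative(root) -> int:
    """Same result, but pending nodes live on an explicit (heap-allocated) list."""
    best = 0
    stack = [(root, 0)]  # (node, depth of its parent)
    while stack:
        node, parent_depth = stack.pop()
        if isinstance(node, list):
            depth = parent_depth + 1
            best = max(best, depth)
            stack.extend((child, depth) for child in node)
    return best

def make_chain(n: int) -> list:
    """Build [[[...]]] nested n levels deep (a hypothetical 'too much input' case)."""
    node: list = []
    for _ in range(n - 1):
        node = [node]
    return node

if __name__ == "__main__":
    deep = make_chain(100_000)
    print(depth_iterative(deep))  # works: bounded by memory, not stack size
    try:
        depth_recursive(deep)
    except RecursionError:
        print("recursive version hit Python's recursion limit")
```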

Dan Puzey
  • A stack overflow is not _always_ caused by program structure; it may also be caused by too much input data. – Roy Dictus Jul 24 '14 at 08:48
  • @RoyDictus: possible but rare, and it's still generally possible to restructure the code to work around that (e.g., make sure the input-capturing method is as light as possible, push the data to a single queue, and have a single processing method for the queue). I'd say that if an application is genuinely running into SO exceptions purely because of the volume of input events, it's in need of more fundamental changes to its architecture. – Dan Puzey Jul 24 '14 at 09:34
  • @DanPuzey No doubt about fixing it. However, by the time it happens it has already caused a significant loss that needs to be mitigated. I don't need to handle it on the same thread, though. An SO doesn't mean that other threads are out of stack space. That's why I'm contemplating using a JIT debugger for this. – Jan Jul 24 '14 at 11:21
  • There is no single "the stack". There are many stacks, at least one per thread. Even if one of them is full, the others are likely still OK. You could possibly run another method on a different thread. – Thomas Weller Nov 20 '20 at 11:39
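Dan Puzey's suggestion above can be sketched as follows, in Python for brevity. A light input-capturing method pushes data to a single queue, and a single processing method drains it. `EventPump` and the countdown handler are hypothetical names.

Follow-up events are queued rather than handled re-entrantly. As a result, stack depth stays constant regardless of how much input arrives.

```python
from collections import deque
from typing import Callable, Deque, Iterable, List

class EventPump:
    """Hypothetical single-queue dispatcher: producers only enqueue, one loop processes.

    Follow-up events produced while handling an event are queued rather than handled
    re-entrantly, so the call stack stays flat however many events arrive.
    """

    def __init__(self, handler: Callable[[int], Iterable[int]]) -> None:
        self._queue: Deque[int] = deque()
        self._handler = handler
        self.processed: List[int] = []

    def post(self, event: int) -> None:
        # Input-capturing side: kept as light as possible, it only enqueues.
        self._queue.append(event)

    def run(self) -> None:
        # The single processing method: drain the queue in FIFO order.
        while self._queue:
            event = self._queue.popleft()
            self.processed.append(event)
            for follow_up in self._handler(event):
                self.post(follow_up)

if __name__ == "__main__":
    # Each event n produces a follow-up n - 1: a long chain handled without recursion.
    pump = EventPump(lambda n: [n - 1] if n > 0 else [])
    pump.post(100_000)
    pump.run()
    print(len(pump.processed))
```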