How To Handle a Function That Doesn't Return

Question

I have a function called ApiCalls() that is wrapped in a lock because the API I'm using is not thread-safe. Occasionally an API call fails to return, and I can't think of a way to handle this kind of situation. I was thinking about putting a timer on the lock object, but it seems the lock doesn't have anything like that.

Or maybe move it out-of-process, if it can't be made stable. — Steven Sudit, Sep 14 '10 at 16:03
This is a duplicate of http://stackoverflow.com/questions/2187086. Your problem is essentially "I have an unreliable subsystem. What do I do?", so see also my comments on dealing with unreliable subsystems at the end of this crazy long answer: http://stackoverflow.com/questions/2113261/using-lock-statement-within-a-loop-in-c/2118236#2118236. That answer inspired this blog post on the same subject: http://blogs.msdn.com/b/ericlippert/archive/2010/02/22/should-i-specify-a-timeout.aspx — Eric Lippert, Sep 14 '10 at 18:42

score 7 · Accepted Answer · answered Sep 14 '10 at 16:02

There's really no good answer for this. A bad, but probably workable, answer is to have a watchdog thread that Aborts the calling thread after a timeout. In other words, after acquiring the lock but before calling the API, you'd order the watchdog to kill you. When you get back from the call (if you get back), you'd call off the watchdog.

Again, this is not a great solution, as Abort is very messy.
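A minimal sketch of that watchdog, assuming .NET Framework (Thread.Abort throws PlatformNotSupportedException on .NET Core and later). The names GuardedApi, CallWithWatchdog and ApiLock are made up for illustration. Note that an abort cannot interrupt a thread that is blocked inside unmanaged code, so this only helps if the hung call is in managed code.

```csharp
using System;
using System.Threading;

static class GuardedApi
{
    // Hypothetical lock guarding the non-thread-safe API.
    private static readonly object ApiLock = new object();

    // Runs apiCall under the lock. Returns true if it completed,
    // false if the watchdog aborted it after `timeout`.
    public static bool CallWithWatchdog(Action apiCall, TimeSpan timeout)
    {
        lock (ApiLock)
        {
            Thread caller = Thread.CurrentThread;
            // Left to the finalizer on purpose: disposing it here could race
            // with the watchdog thread that is still waiting on it.
            var finished = new ManualResetEvent(false);

            // "Order the watchdog to kill you": abort the caller unless
            // `finished` is signalled before the timeout elapses.
            var watchdog = new Thread(() =>
            {
                if (!finished.WaitOne(timeout))
                    caller.Abort();
            });
            watchdog.IsBackground = true;
            watchdog.Start();

            try
            {
                apiCall();
                return true;
            }
            catch (ThreadAbortException)
            {
                // Without ResetAbort the exception is re-raised
                // automatically at the end of this catch block.
                Thread.ResetAbort();
                return false;
            }
            finally
            {
                // "Call off the watchdog", whether the call succeeded,
                // threw, or was aborted.
                finished.Set();
            }
        }
    }
}
```

A small race remains: if the timeout fires just as the call returns, the abort can arrive after the try block has been left, and the ThreadAbortException then propagates to whoever called CallWithWatchdog. That is part of why Abort is messy.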

answered Sep 14 '10 at 16:02 by Steven Sudit

+1 - Probably the only workable solution here. Remember to catch the Abort exception that is generated. I'm not sure how you'd deal with any memory leaks, although in this instance they may not be much of a problem unless the abort happens often. – ChrisBD Sep 14 '10 at 16:08
The problem isn't so much leakage as missing cleanup. You can catch the exception, and it's fine to do so, but catching it won't stop it from propagating. – Steven Sudit Sep 14 '10 at 16:13

score 4 · Answer 2 · answered Sep 14 '10 at 16:04

I don't think you can reasonably recover from this problem. Suppose that you could time out: you would then attempt to call the API again, but the previous call would still be active, and you have said that the API is not thread-safe.

You simply cannot defend yourself from fundamentally flawed dependencies of this kind.

The only really safe thing to do is to restart the process. Steven Sudit's suggestion is one way to achieve that.

answered Sep 14 '10 at 16:04 by djna

Right, the watchdog concept could be expanded to restarting the entire process, and that's not a bad idea at all. – Steven Sudit Sep 14 '10 at 16:14

score 3 · Answer 3 · edited May 23 '17 at 12:19

This can be solved by wrapping the API calls in a separate assembly and loading that assembly into a separate application domain by using the AppDomain class.

Use application domains to isolate tasks that might bring down a process. If the state of the AppDomain that's executing a task becomes unstable, the AppDomain can be unloaded without affecting the process. This is important when a process must run for long periods without restarting.

You can then call thread abort on the call in the separate AppDomain and signal the host domain that an abort has happened. The host domain would unload the offending domain, thus unloading the API, and start a new domain with the API reset. You would also want a watchdog on the API domain so the host could take action if the API domain freezes.
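A rough sketch of the load, call and unload cycle, assuming .NET Framework (AppDomain.CreateDomain is not supported on .NET Core and later). ApiWrapper and ApiHost are hypothetical names, and for brevity the wrapper lives in the same assembly as the host rather than in a separate one. The watchdog and the abort signalling are left out.

```csharp
using System;

// Hypothetical wrapper around the API. Deriving from MarshalByRefObject
// makes calls execute inside the domain that created the object.
public class ApiWrapper : MarshalByRefObject
{
    public string Call(string request)
    {
        // The real, non-thread-safe API call would go here.
        return "handled: " + request;
    }
}

public static class ApiHost
{
    public static void Main()
    {
        // Create an isolated domain to host the API.
        AppDomain apiDomain = AppDomain.CreateDomain("ApiDomain");
        try
        {
            // Instantiate the wrapper inside the new domain and get a proxy to it.
            var api = (ApiWrapper)apiDomain.CreateInstanceAndUnwrap(
                typeof(ApiWrapper).Assembly.FullName,
                typeof(ApiWrapper).FullName);

            Console.WriteLine(api.Call("ping"));
        }
        finally
        {
            // Unloading tears down the domain and everything loaded into it.
            // The host can then create a fresh domain with the API reset.
            AppDomain.Unload(apiDomain);
        }
    }
}
```

AppDomain.Unload works by aborting the threads running in the domain, so if the stuck call is sitting in unmanaged code the unload can fail with a CannotUnloadAppDomainException. In that case only killing the process will clear it.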

Miscellaneous links: C# Nutshell AppDomain Listings, cbrumme's WebLog, Good example of use of AppDomain, Using AppDomain to Load and Unload Dynamic Assemblies

answered Sep 14 '10 at 17:27 by Rusty (last edited May 23 '17 at 12:19 by Community)

I agree that a thread doesn't provide enough insulation, which is why I upvoted djna. However, I'm not sure that an AppDomain does, either. It depends very much on *why* that API call doesn't return. If it's corrupting memory, then an AppDomain will not stop it, only a Process will. – Steven Sudit Sep 14 '10 at 18:22
@Steven Yes...if the API is unmanaged it could technically do anything, but an AppDomain does provide a great deal of protection...MSDN: "You can run several application domains in a single process with the same level of isolation that would exist in separate processes, but without incurring the additional overhead of making cross-process calls or switching between processes." – Rusty Sep 14 '10 at 22:05
Yes, I read that, too, but it's based on the assumption that assemblies will behave by not overwriting memory. This is not a safe assumption in this case. – Steven Sudit Sep 14 '10 at 23:24
@Steven Agreed. There is not much you can do with an assembly that might randomly corrupt memory outside of its domain or process. – Rusty Sep 15 '10 at 02:23
Ah, but that's the point: it cannot corrupt memory outside its domain! – Steven Sudit Sep 15 '10 at 02:25
It cannot corrupt memory outside of its domain if it's "safe" code, but it can if it's "unsafe" or uses P/Invoke. – Steven Sudit Sep 18 '10 at 01:36

score 2 · Answer 4 · answered Sep 14 '10 at 17:12

The only safe-ish solution is probably to start another process to handle the API calls, and then kill the process if they get stuck. Even that doesn't guarantee that the API's handlers won't get into a bogus state that can only be cured via system restart, but using Thread.Abort can mortally wound a process.

If you don't want to use "untrusted" means of killing the process, you could have one thread in the process perform the API calls while another watches for a "please die" message. Watchdogs can be tricky; if a watchdog is set for 15 seconds and an action would take 17 seconds to complete, one might request an action, time out after 15 seconds, retry the action, time out after 15 seconds, etc. indefinitely. It may be good to have the watchdog time adjust after each failure (e.g. try an action, letting it have up to 15 seconds; if that doesn't work, and nobody's complaining, try again and let it go 30 seconds; if that's still no good, give it 60 seconds.)
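Here is one way the out-of-process approach with a growing timeout and a capped number of retries might look. This is only a sketch: WorkerRunner, RunWithBackoff and the worker executable are hypothetical, and it assumes the worker does the API call and exits with code 0 on success.

```csharp
using System;
using System.Diagnostics;

static class WorkerRunner
{
    // Starts the worker process, waits up to the current timeout, and kills
    // it if it is still running. The timeout doubles after each failed
    // attempt; the number of attempts is capped by maxAttempts.
    public static bool RunWithBackoff(string workerExe, string arguments,
                                      int firstTimeoutMs, int maxAttempts)
    {
        int timeoutMs = firstTimeoutMs;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            var startInfo = new ProcessStartInfo(workerExe, arguments)
            {
                UseShellExecute = false
            };
            using (Process worker = Process.Start(startInfo))
            {
                // The worker finished in time: report its result without retrying.
                if (worker.WaitForExit(timeoutMs))
                    return worker.ExitCode == 0;

                try
                {
                    worker.Kill();          // the call is stuck: kill the process
                    worker.WaitForExit();   // wait until it is really gone
                }
                catch (InvalidOperationException)
                {
                    // The worker exited on its own between the timeout and Kill.
                }
            }
            timeoutMs *= 2;   // e.g. 15 s, then 30 s, then 60 s
        }
        return false;   // gave up after maxAttempts
    }
}
```

Calling something like WorkerRunner.RunWithBackoff("ApiWorker.exe", "some-request", 15000, 3) would give the 15, 30 and 60 second progression described above and then stop, so a call that always takes too long cannot retry forever.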

Right, if you have a way to cancel, you need to make sure that you limit the number of retries. — Steven Sudit, Sep 14 '10 at 18:24
However, "untrusted" process shutdown should be just fine. No point complicating it. — Steven Sudit, Sep 14 '10 at 18:24
