How to end a function that hangs sometimes?

Question

I have written a function f like this:

void f()
{
   f1();   // Pre-Work
   f2();   // the actual work 
   f3();   // Post-Work
}

f1 and f3 are my own functions, but f2 comes from a library I have no control over. f is called periodically from some controller.

All works fine, but about once a week f2 hangs and does not return, for unknown reasons. What is the design pattern to handle this? Should f be run in a separate Task, or can I use the async/await pattern? And what happens to memory that might have been allocated in f2?

You mention tasks. Are you using async/await? Could it be that the way you're calling it is different on the occasions when it deadlocks? – ProgrammingLlama, Feb 11 '22 at 08:38
It is a simple console application, currently working completely synchronously. Its main purpose is to check for email every 30 seconds. f2 is actually functionality from the OpenPop library (http://hpop.sourceforge.net/documentation/). I create a new Pop3Client for every call. IMHO the easiest way really is to put f2 in a separate process, implement a named pipe for communication, call it asynchronously and measure the time it takes to respond. If it takes too long, hard kill the process... – ProgrammingRookie, Feb 11 '22 at 15:19

JonasH · Accepted Answer · 2022-02-11T10:17:11.223

7

If f2 actually hangs and never returns, there are a few things that can be done.

Start by ensuring that the method is used correctly: check that the parameters are always valid and that there is no synchronization issue, i.e. that the library is used in a thread-safe way.

If that does not help, contact the authors of the library. They are more likely to be able to help if you can reproduce the issue, so try to write a minimal reproducible example. Call stacks, memory dumps and other supporting information are also useful. If the library is open source, you may have the option of fixing the issue yourself.

If nothing else works, move the call to a separate process. This will let you terminate the process without cooperation from the function. There are some more details in "What's wrong with using Thread.Abort()". This will add some complexity, but there are libraries that can make this easier. A message queue like NetMQ would probably work, and you might also consider something like gRPC; there are a ton of options to choose from.
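As a rough sketch of the separate-process approach, assuming the f2 call has been moved into its own small executable: the parent starts that worker, waits with a timeout, and kills it if it does not exit in time. The names `HangGuard`, `RunWithTimeout` and `MailWorker.exe` are made up for illustration, and getting results back from the worker (stdout, a file, a pipe, a message queue) is left out.

```csharp
using System;
using System.Diagnostics;

static class HangGuard
{
    // Starts an external worker and waits up to 'timeout' for it to exit.
    // Returns true only if the worker exited on its own with exit code 0.
    public static bool RunWithTimeout(string fileName, string arguments, TimeSpan timeout)
    {
        var startInfo = new ProcessStartInfo(fileName, arguments)
        {
            UseShellExecute = false,
            CreateNoWindow = true,
        };

        using var worker = Process.Start(startInfo);
        if (worker == null)
        {
            return false; // no process was started
        }

        if (worker.WaitForExit((int)timeout.TotalMilliseconds))
        {
            return worker.ExitCode == 0;
        }

        // Timed out: the worker is presumed hung, so terminate it without its cooperation.
        try
        {
            worker.Kill();
            worker.WaitForExit();
        }
        catch (InvalidOperationException)
        {
            // The worker had already exited on its own; nothing to kill.
        }
        return false;
    }
}
```

In f, the direct call to f2 would then become something like `HangGuard.RunWithTimeout("MailWorker.exe", "", TimeSpan.FromSeconds(20))`, where the 20 second limit is an arbitrary choice. Because the operating system reclaims a process's memory and handles when it is terminated, this also covers whatever f2 had allocated before it hung.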

edited Feb 11 '22 at 10:17

answered Feb 11 '22 at 08:49


Thanks. f2 is called every 30 seconds, and fails once a week on average. It is impossible to nail that down to some sync issue, deadlock or whatever. The program itself is single threaded console app, but the machine runs other stuff, too. Of course I can spawn another process. But then I need to care for communication and synchronisation between the two. Serialize data, Semaphores, etc. The article you pointed to about the Thread.Abort has useful information. I think this will be the way to go, or do you have an idea how to do it with standard-Task-based programming? – ProgrammingRookie Feb 11 '22 at 09:45
@ProgrammingRookie Task-based programming uses threads, so it only supports cooperative cancellation. I'm not sure why you think failing once in a million could not be caused by synchronization issues; that is exactly the kind of problem I would expect from multithreading. But if your program is single-threaded, that reduces the chance of such problems. And yes, using a separate process is more complicated, but there are libraries that take care of at least some of the issues. – JonasH Feb 11 '22 at 10:06
It sure is caused by sync issues. f2 is NOT consuming CPU, so it is not caused by an endless loop. f2 is waiting on something that never occurs. What I meant was that it is impossible to find what is causing it. I read about Abort() and see that it is deprecated. So I probably must put f2 in its own process. – ProgrammingRookie Feb 11 '22 at 10:24
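To illustrate the point about cooperative cancellation made in the comments, here is a minimal sketch (the names `TimeoutDemo` and `TryRun` are invented for this example). Waiting on a Task with a timeout lets the caller move on, but it does nothing to stop the work itself.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class TimeoutDemo
{
    // Runs 'work' on a thread-pool thread and waits up to 'timeout' for it.
    // Returns false on timeout, but the work is NOT stopped: it keeps
    // occupying its thread until it returns by itself (possibly never).
    public static bool TryRun(Action work, TimeSpan timeout)
    {
        Task task = Task.Run(work);
        return task.Wait(timeout);
    }
}
```

With a hanging f2, `TimeoutDemo.TryRun(f2, TimeSpan.FromSeconds(20))` would return false after 20 seconds, yet the blocked thread and anything it holds would stay around for the lifetime of the process. That is acceptable only if an occasional leaked thread is tolerable; otherwise the separate process remains the way to get a hard stop.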
