I am working on a C# console application that uses a third-party library, and I am awaiting a Task that the library runs. I've used async/await "all the way down". The full details of the issue I'm having are here, but that's quite a long post, so I'd like to simplify the problem and abstract it from the details.
Essentially (very simplified) the code is the following:
public static async Task<byte[]> CaptureRawData()
{
    await Camera.Capture();
    return Camera.CurrentStream.ToArray();
}
This captures images on my Raspberry Pi and returns each image as a byte array in memory. It captures an image roughly every 10 seconds. After running for several hours, the await Camera.Capture(); line randomly hangs indefinitely.
I've read that this might be caused by intermittent power brownouts, but regardless, I just want to be able to detect the hang and try again. It only happens once every few hours, so I don't really mind if I miss one image. I just want to be able to carry on and retry without freezing the main thread indefinitely.
I was inspired by this other SO question to try to run the task with a timeout so that I can just try again if it times out.
I adapted one of the answers provided to give the following:
public static async Task<bool> CancelAfterAsync(Task startTask, TimeSpan timeout)
{
    using (var timeoutCancellation = new CancellationTokenSource())
    {
        var delayTask = Task.Delay(timeout, timeoutCancellation.Token);
        Serilog.Log.Logger.Debug("await Task.WhenAny");
        var completedTask = await Task.WhenAny(startTask, delayTask);
        Serilog.Log.Logger.Debug("Finished await Task.WhenAny");
        // Cancel timeout to stop either task:
        // - Either the original task completed, so we need to cancel the delay task.
        // - Or the timeout expired, so we need to cancel the original task.
        // Canceling will not affect a task, that is already completed.
        timeoutCancellation.Cancel();
        if (completedTask == startTask)
        {
            // original task completed
            Serilog.Log.Logger.Debug(" await startTask;");
            await startTask;
            return true;
        }
        else
        {
            Serilog.Log.Logger.Debug("Timed out");
            // timeout
            return false;
        }
    }
}
public static async Task<byte[]> CaptureRawData()
{
    Serilog.Log.Logger.Debug("Running task");
    if (await TaskUtils.CancelAfterAsync(Camera.Capture(), TimeSpan.FromSeconds(100)))
    {
        Serilog.Log.Logger.Debug("Got response, returning data");
        return Camera.CurrentStream.ToArray();
    }
    else
    {
        Serilog.Log.Logger.Warning("Camera timed out, return null");
        return null;
    }
}
public static async Task<byte[]> CaptureImage()
{
    byte[] data = null;
    for (var i = 0; i < 10; i++)
    {
        data = await CaptureRawData();
        if (data != null)
        {
            break;
        }
        else
        {
            Serilog.Log.Logger.Warning("Camera timed out, retrying");
        }
    }
    if (data == null || data.Length == 0)
    {
        //Todo: Better exception message
        throw new Exception("Image capture failed");
    }
    return data;
}
Now, upon hanging, it should detect the hang and retry up to 10 times. But instead I get the following logging output:
[13:54:54 DBG] Running Task
[13:54:54 DBG] await Task.WhenAny
[13:56:34 DBG] Finished await Task.WhenAny
[13:56:34 DBG] Timed out
[13:56:34 WRN] Camera timed out, return null
It then hangs indefinitely on the "return null" line. It should log "Camera timed out, retrying" straight after this line, but it never does; it just hangs forever on "return null".
This makes no sense, because the CancelAfterAsync method has clearly detected the hang and returned false, but it's the parent method which then hangs.
How can I just detect the hang and retry safely?
As explained before, it only happens rarely, once every few hours after calling this method hundreds of times, so I just want to be able to detect that it's happened and retry without locking everything up.
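For reference, here is a minimal sketch of what a wrapper of this shape does when the timeout wins. It uses a hypothetical never-completing task (NeverCompletes) as a stand-in for the camera call, and CompletesWithin is an illustrative name, not part of any library. The point it shows is that cancelling the token only cancels the Task.Delay; the original task is simply no longer waited on and is left running. It assumes C# 7.1 or later for the async Main.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutDemo
{
    // Hypothetical stand-in for a call that never completes (not the camera library).
    public static Task NeverCompletes() => new TaskCompletionSource<bool>().Task;

    // Same shape as CancelAfterAsync above, minus the logging.
    public static async Task<bool> CompletesWithin(Task task, TimeSpan timeout)
    {
        using (var cts = new CancellationTokenSource())
        {
            var delay = Task.Delay(timeout, cts.Token);
            var winner = await Task.WhenAny(task, delay);

            // Only Task.Delay observes this token; 'task' never sees it.
            cts.Cancel();

            if (winner != task)
            {
                return false; // timed out; 'task' is still pending
            }

            await task; // propagate any exception from the original task
            return true;
        }
    }

    public static async Task Main()
    {
        var stuck = NeverCompletes();
        var finished = await CompletesWithin(stuck, TimeSpan.FromMilliseconds(200));

        Console.WriteLine($"Finished in time: {finished}");
        Console.WriteLine($"Original task completed: {stuck.IsCompleted}");
    }
}
```

So after a timeout, whatever the underlying call holds onto (threads, native handles, locks) is still held, which may be relevant to the hang that follows.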
EDIT: As suggested in the comments, I tried running the rogue task inside a Task.Run and removed all async code from my program, as follows:
public static class MemoryCapture
{
    private static volatile bool _camProcessing = false;

    public static byte[] CaptureRawData()
    {
        MMALCamera cam = MMALCamera.Instance;
        MMALCameraConfig.Debug = true;
        MMALCameraConfig.StillEncoding = MMALEncoding.BGR24;
        MMALCameraConfig.StillSubFormat = MMALEncoding.BGR24;
        using (var imgCaptureHandler = new MemoryStreamCaptureHandler())
        using (var renderer = new MMALNullSinkComponent())
        {
            cam.ConfigureCameraSettings(imgCaptureHandler);
            cam.Camera.PreviewPort.ConnectTo(renderer);
            // Camera warm up time
            Thread.Sleep(2000);
            if (WaitForCam(cam))
            {
                var result = imgCaptureHandler.CurrentStream.ToArray();
                return result;
            }
            else
            {
                Serilog.Log.Logger.Warning($"Reached timeout, returning null...");
                return null;
            }
        }
    }

    private static bool WaitForCam(MMALCamera cam)
    {
        _camProcessing = true;
        Serilog.Log.Logger.Debug("Running cam process task");
        Task.Run(() =>
        {
            Serilog.Log.Logger.Debug($"cam.ProcessAsync");
            cam.ProcessAsync(cam.Camera.StillPort).ConfigureAwait(false).GetAwaiter().GetResult();
            Serilog.Log.Logger.Debug($"cam.ProcessAsync finished");
            _camProcessing = false;
        });
        for (var i = 0; i < 1000; i++)
        {
            Thread.Sleep(100);
            if (!_camProcessing)
            {
                Serilog.Log.Logger.Debug($"cam processing finished");
                return true;
            }
        }
        Serilog.Log.Logger.Warning($"Reached timeout, camera might have locked up");
        return false;
    }

    public static byte[] CaptureImageHelper()
    {
        byte[] data = null;
        for (var i = 0; i < 10; i++)
        {
            data = CaptureRawData();
            if (data != null)
            {
                break;
            }
            Serilog.Log.Logger.Warning($"Retrying...");
        }
        if (data == null)
        {
            throw new Exception("Image capture failed");
        }
        return data;
    }
}
}
The log output from that code is the following:
[23:02:29 DBG] Running cam process task
[23:02:29 WRN] cam.ProcessAsync
[23:04:09 WRN] Reached timeout, camera might have locked up
[23:04:09 WRN] Reached timeout, returning null...
It then hangs forever.
It's hanging when returning from CaptureRawData, so it's possible that it's hanging while disposing one of the usings.
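One way that suspicion could be checked is to replace the using blocks with an explicit try/finally and log before and after each Dispose call, so the last log line shows which disposal never returns. The following is only a sketch: DisposeProbe, TimedDispose and FakeResource are hypothetical names, and FakeResource stands in for the real capture handler and renderer so that the example runs on its own.

```csharp
using System;
using System.Diagnostics;

public static class DisposeProbe
{
    // Disposes a resource, logging before and after so a hang inside Dispose is visible.
    public static TimeSpan TimedDispose(string name, IDisposable resource, Action<string> log)
    {
        log($"Disposing {name}");
        var stopwatch = Stopwatch.StartNew();
        resource.Dispose();
        stopwatch.Stop();
        log($"Disposed {name} after {stopwatch.ElapsedMilliseconds} ms");
        return stopwatch.Elapsed;
    }

    // Hypothetical stand-in for the real handler/renderer objects.
    public sealed class FakeResource : IDisposable
    {
        public bool Disposed { get; private set; }
        public void Dispose() => Disposed = true;
    }

    public static void Main()
    {
        var imgCaptureHandler = new FakeResource();
        var renderer = new FakeResource();
        try
        {
            // The capture logic would go here.
        }
        finally
        {
            // Nested using blocks dispose in reverse order of declaration,
            // so the renderer goes first to mirror the original code.
            TimedDispose("renderer", renderer, Console.WriteLine);
            TimedDispose("imgCaptureHandler", imgCaptureHandler, Console.WriteLine);
        }
    }
}
```

If the log stops after a "Disposing ..." line with no matching "Disposed ..." line, that Dispose call is where the thread is stuck.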