31

I've implemented healthchecks in my asp.net core application. One healthcheck does 2 checks - DbContext connection and custom one that checks NpgsqlConnection.

Everything works fine in over 99% of cases. Occasionally healthcheck fails throwing TaskCanceledException or OperationCanceledException. From my logs I can see that this exceptions are thrown after around 2ms-25ms (so there is no chance any timeout happened).

Important hint:

When I hit healtchecks many times (simple F5 in browser) it throws the exception. Looks like you can't hit /health endpoint before previous healthcheck is completed. If this is the case - why? Even if I put Thread.Sleep(5000); in custom healthcheck (an no DB connection check at all) it will fail if I hit /health endpoint before 5 seconds passes.

QUESTION: Is healtheck somehow 'magically' single-threaded (when you hit that endpoint again, it cancels previous healthcheck invocation)?

Startup.cs ConfigureServices

services
    .AddHealthChecks()
    .AddCheck<StorageHealthCheck>("ReadOnly Persistance")
    .AddDbContextCheck<MyDbContext>("EFCore persistance");

Startup.cs Configure

if (env.IsDevelopment())
{
    app.UseDeveloperExceptionPage();
}
else
{
    app.UseHsts();
}

app.UseHttpsRedirection();
app.UseCors(options => options.AllowAnyOrigin().AllowAnyMethod().AllowAnyHeader());

app.UseMiddleware<RequestLogMiddleware>();
app.UseMiddleware<ErrorLoggingMiddleware>();

if (!env.IsProduction())
{
    app.UseSwagger();

    app.UseSwaggerUI(c =>
    {
        c.SwaggerEndpoint("/swagger/v1/swagger.json", "V1");
        c.SwaggerEndpoint($"/swagger/v2/swagger.json", $"V2");
    });
}

app.UseHealthChecks("/health", new HealthCheckOptions()
{
    ResponseWriter = WriteResponse
});

app.UseMvc();

StorageHealthCheck.cs

public class StorageHealthCheck : IHealthCheck
    {
        private readonly IMediator _mediator;

        public StorageHealthCheck(IMediator mediator)
        {
            _mediator = mediator;
        }

        public async Task<HealthCheckResult> CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken = default(CancellationToken))
        {
            var isReadOnlyHealthy = await _mediator.Send(new CheckReadOnlyPersistanceHealthQuery());

            return new HealthCheckResult(isReadOnlyHealthy ? HealthStatus.Healthy : HealthStatus.Unhealthy, null);
        }
    }

CheckReadOnlyPersistanceHealthQueryHandler:

NpgsqlConnectionStringBuilder csb = new NpgsqlConnectionStringBuilder(_connectionString.Value);

string sql = $@"
    SELECT * FROM pg_database WHERE datname = '{csb.Database}'";

try
{
    using (IDbConnection connection = new NpgsqlConnection(_connectionString.Value))
    {
        connection.Open();

        var stateAfterOpening = connection.State;
        if (stateAfterOpening != ConnectionState.Open)
        {
            return false;
        }

        connection.Close();
        return true;
    }
}
catch
{
    return false;
}

TaskCanceledException:

System.Threading.Tasks.TaskCanceledException: A task was canceled.
   at Npgsql.TaskExtensions.WithCancellation[T](Task`1 task, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnector.ConnectAsync(NpgsqlTimeout timeout, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnector.RawOpen(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnector.Open(NpgsqlTimeout timeout, Boolean async, CancellationToken cancellationToken)
   at Npgsql.NpgsqlConnection.<>c__DisplayClass32_0.<<Open>g__OpenLong|0>d.MoveNext()
--- End of stack trace from previous location where exception was thrown ---
   at Npgsql.EntityFrameworkCore.PostgreSQL.Storage.Internal.NpgsqlDatabaseCreator.ExistsAsync(CancellationToken cancellationToken)
   at Microsoft.Extensions.Diagnostics.HealthChecks.DbContextHealthCheck`1.CheckHealthAsync(HealthCheckContext context, CancellationToken cancellationToken)
   at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)
   at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)

OperationCanceledException:

System.OperationCanceledException: The operation was canceled.
   at System.Threading.CancellationToken.ThrowOperationCanceledException()
   at Microsoft.Extensions.Diagnostics.HealthChecks.DefaultHealthCheckService.CheckHealthAsync(Func`2 predicate, CancellationToken cancellationToken)
   at Microsoft.AspNetCore.Diagnostics.HealthChecks.HealthCheckMiddleware.InvokeAsync(HttpContext httpContext)
   at Microsoft.AspNetCore.Builder.Extensions.MapWhenMiddleware.Invoke(HttpContext context)
Maciej Pszczolinski
  • 1,623
  • 1
  • 19
  • 38
  • It's difficult to match the exception stack traces with the code you posted. Are you sure your code is ignoring the `CancellationToken`? – Stephen Cleary Mar 03 '20 at 14:46
  • 8
    Experiencing the same issue. – Ben Mar 03 '20 at 20:17
  • Stephen Cleary - what do you mean by ignoring CancellationToken? I'm never using it, nor doing any Task.Cancel by myself. But maybe there is some setting to ignore it that I'm not aware of? – Maciej Pszczolinski Mar 04 '20 at 08:55
  • I am having same problem even with a very simple HealthChecker that has no IO operation like this https://dotnetfiddle.net/gmSSIn – Ming Mar 20 '20 at 11:19
  • Hey, @MaciejPszczolinski did you figure out what was the cause? – Ming Mar 23 '20 at 09:16
  • 2
    Yes, I think the cause is HttpContext.RequestAborted. In source code of healtcheck I see that "HttpContext.RequestAborted" is used as cancellation token (line 59, https://github.com/aspnet/Diagnostics/blob/master/src/Microsoft.AspNetCore.Diagnostics.HealthChecks/HealthCheckMiddleware.cs) , but I have no idea how to handle it properly. – Maciej Pszczolinski Mar 24 '20 at 08:47
  • 1
    Did you ever figure this out? – jjxtra May 20 '20 at 21:37
  • jjxtra - no, I didn't. I'm still getting these errors. – Maciej Pszczolinski May 21 '20 at 19:14

3 Answers3

25

I've finally found the answer.

The initial reason is that when the HTTP request is aborted, then httpContext.RequestAborted CancellationToken is triggered, and it throws an exception (OperationCanceledException).

I have a global exception handler in my application, and I have been converting every unhandled exception to a 500 error. Even though the client aborted the request, and never got the 500 response, my logs kept logging this.

The solution I implemented is like that:

public async Task Invoke(HttpContext context)
{
    try
    {
        await _next(context);
    }
    catch (Exception ex)
    {
        if (context.RequestAborted.IsCancellationRequested)
        {
            _logger.LogWarning(ex, "RequestAborted. " + ex.Message);
            return;
        }

        _logger.LogCritical(ex, ex.Message);
        await HandleExceptionAsync(context, ex);
        throw;
    }
}

private static Task HandleExceptionAsync(HttpContext context, Exception ex)
{
    var code = HttpStatusCode.InternalServerError; // 500 if unexpected

    //if (ex is MyNotFoundException) code = HttpStatusCode.NotFound;
    //else if (ex is MyUnauthorizedException) code = HttpStatusCode.Unauthorized;
    //else if (ex is MyException) code = HttpStatusCode.BadRequest;

    var result = JsonConvert.SerializeObject(new { error = ex.Message });
    context.Response.ContentType = "application/json";
    context.Response.StatusCode = (int)code;
    return context.Response.WriteAsync(result);
}

hope it helps to somebody.

n-develop
  • 589
  • 7
  • 13
Maciej Pszczolinski
  • 1,623
  • 1
  • 19
  • 38
0

My best theory, after testing in a large production environment, is that you need to await any writers to the http context output stream in the health check. I was getting this error in a method where I returned a task that was not awaited. Awaiting the task appears to have solved the problem. The nice thing about await is that you could also catch a TaskCancelledException and just eat it.

Example:


// map health checks
endpoints.MapHealthChecks("/health-check", new HealthCheckOptions
{
    ResponseWriter = HealthCheckExtensions.WriteJsonResponseAsync,
    Predicate = check => check.Name == "default"
});

/// <summary>
/// Write a json health check response
/// </summary>
/// <param name="context">Http context</param>
/// <param name="report">Report</param>
/// <returns>Task</returns>
public static async Task WriteJsonResponseAsync(HttpContext context, HealthReport report)
{
    try
    {
        HealthReportEntry entry = report.Entries.Values.FirstOrDefault();
        context.Response.ContentType = "application/json; charset=utf-8";
        await JsonSerializer.SerializeAsync(context.Response.Body, entry.Data,entry.Data.GetType());
    }
    catch (TaskCancelledException)
    {
    }
}

jjxtra
  • 20,415
  • 16
  • 100
  • 140
0

If you happen to be using the Serilog RequestLoggingMiddleware, the following will allow you to stop aborted healthcheck requests being logged as errors:

app.UseSerilogRequestLogging(options =>
{
    options.GetLevel = (ctx, _, ex) =>
    {
        if (ex == null && ctx.Response.StatusCode <= 499)
        {
            return LogEventLevel.Information;
        }

        if (ctx.Request.Path.StartsWithSegments("/healthcheck"))
        {
            // If the incoming HTTP request for a healthcheck is aborted, don't log the resultant OperationCanceledException
            // as an error. Note that the ASP.NET DefaultHealthCheckService ensures that if the exception occurs
            // within the healthcheck implementation (and the request wasn't aborted) a failed healthcheck is logged
            // see https://github.com/dotnet/aspnetcore/blob/ce9e1ae5500c3f0c4b9bd682fd464b3493e48e61/src/HealthChecks/HealthChecks/src/DefaultHealthCheckService.cs#L121
            if (ex is OperationCanceledException)
            {
                return LogEventLevel.Information;
            }
        }

        return LogEventLevel.Error;
    };
});
eddiewould
  • 1,555
  • 16
  • 36