I'm getting this System.InvalidCastException
a few times a day in a .net 5 ASP.NET web app, but it's not reproduceable on demand. When this error occurs it often seems to happen several times in quick succession and then crashes my ASP.NET core app. IIS restarts the app and everything works fine for hours or days until it happens again.
The app serves > 150k requests/day, so this problem is relatively infrequent, happening very infrequently, e.g. once every 50-200k requests. There's no obvious pattern to the requests that cause this, although I'm continuing to check that.
Example stack trace:
An unhandled exception has occurred while executing the request.
System.InvalidCastException: Unable to cast object of type 'System.Data.SqlClient.SqlTransaction' to type 'System.Transactions.BucketSet'.
at System.Transactions.TransactionTable.AddIter(InternalTransaction txNew)
at System.Transactions.TransactionTable.Add(InternalTransaction txNew)
at System.Transactions.InternalTransaction..ctor(TimeSpan timeout, CommittableTransaction committableTransaction)
at System.Transactions.CommittableTransaction..ctor(IsolationLevel isoLevel, TimeSpan timeout)
at System.Transactions.TransactionScope..ctor(TransactionScopeOption scopeOption, TransactionOptions transactionOptions, TransactionScopeAsyncFlowOption asyncFlowOption)
at MyApp.Shared.TransactionHandling.GetTransactionScope(Int32 timeOutMinutes) in C:\TeamCity\buildAgent\work\fd104265242f4376\server\MyApp.Shared\TransactionHandling.cs:line 34
at MyApp.Shared.TransactionHandling.Run[T](Func`1 action) in C:\TeamCity\buildAgent\work\fd104265242f4376\server\MyApp.Shared\TransactionHandling.cs:line 85
at MyApp.BL.Access.AccessGroupService.Create(Int32 planId, List`1 accessGroups) in C:\TeamCity\buildAgent\work\fd104265242f4376\server\MyApp.BL\Access\AccessGroupService.cs:line 241
at MyApp.SL.Administration.Controllers.AccessGroupsController.PostAccessGroupsAsync(Int32 planId) in C:\TeamCity\buildAgent\work\fd104265242f4376\server\MyApp.SL.Api\Controllers\Administration\AccessGroupsController.cs:line 43
at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.TaskOfIActionResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Logged|12_1(ControllerActionInvoker invoker)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeInnerFilterAsync>g__Awaited|13_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|19_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Logged|17_1(ResourceInvoker invoker)
at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
at Microsoft.AspNetCore.Authorization.Policy.AuthorizationMiddlewareResultHandler.HandleAsync(RequestDelegate next, HttpContext context, AuthorizationPolicy policy, PolicyAuthorizationResult authorizeResult)
at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
at MyApp.SL.Api.StoreTenantForLoggingMiddleware.InvokeAsync(HttpContext httpContext, TenantContext tenantContext) in C:\TeamCity\buildAgent\work\fd104265242f4376\server\MyApp.SL.Api\StoreTenantForLoggingMiddleware.cs:line 43
at Microsoft.AspNetCore.Builder.Extensions.MapMiddleware.Invoke(HttpContext context)
at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
Our code looks like this:
public class MyApp.BL.Access.AccessGroupService
{
// ... lots more code ...
public IEnumerable<ExpandoObject> Create(int planId, List<ExpandoObject> accessGroups)
=> TransactionHandling.Run(() =>
{
// ... do lots of stuff that calls EF Core ...
}
}
public static class TransactionHandling
{
private static int _maxTimeoutMinutes = 10; // System default.
public static T Run<T>(Func<T> action)
{
using (var scope = GetTransactionScope())
{
var result = action();
scope.Complete();
return result;
}
}
public static TransactionScope GetTransactionScope(int timeOutMinutes = 0)
{
var transactionOptions = new TransactionOptions
{
IsolationLevel = IsolationLevel.ReadCommitted
};
// Odd hack here but I don't think this is related as timeOutMinutes
// is 0 in most or all cases where this exception happens.
if (timeOutMinutes > 0)
{
// See https://github.com/dotnet/runtime/issues/1418 for info on this hack
// We're setting global max so only do it if it's bigger than we've had before.
transactionOptions.Timeout = TimeSpan.FromMinutes(timeOutMinutes);
if (timeOutMinutes > _maxTimeoutMinutes)
{
_maxTimeoutMinutes = timeOutMinutes;
SetTransactionManagerField("s_cachedMaxTimeout", true);
SetTransactionManagerField("s_maximumTimeout", transactionOptions.Timeout);
}
}
return new TransactionScope(TransactionScopeOption.Required, transactionOptions, TransactionScopeAsyncFlowOption.Enabled);
}
private static void SetTransactionManagerField(string fieldName, object value)
{
typeof(TransactionManager).GetField(fieldName, BindingFlags.NonPublic | BindingFlags.Static).SetValue(null, value);
}
// ...
}
I've looked at the TransactionTable code and can't see any obvious reason this could happen. My initial guess is that we're doing something naughty with SqlClient objects, using them from the wrong thread somehow. The codebase is quite large and we use a mix of EF Core plus ADO.NET directly, but nevertheless I can't find anything we're doing wrong.
What could cause this exception?
Another problem we have much less frequently, but perhaps related, is an exception like System.InvalidOperationException: Internal .NET Framework Data Provider error 3.
. The time where I found this had a stacktrace from Hangfire, but I suspect it's more likely something we're doing wrong that's causing this than a bug in Hangfire. e.g. perhaps multithreaded access to an ADO object could cause this. I notice that Hangfire is using System.Data.SqlClient and we're otherwise using Microsoft.Data.SqlClient so I'll reconfigure Hangfire to use the latter and check if that helps.
UPDATE 1:
I came across this answer where cloning ExpandoObject caused problems with WeakReference. We are indeed using a deep clone based on this code within at least one of the methods that's getting this System.InvalidCastException
, making me think this is a strong possibility for the problem. I'll report back when I find out more.