I've noticed a strange behavior in one of our applications recently.

Exception=System.InvalidCastException: Unable to cast object of type 'System.Data.SqlClient.SqlTransaction' to type 'System.Byte[]'.
   at ServiceStack.Text.Pools.BufferPool.GetCachedBuffer(Int32 minSize) in C:\BuildAgent\work\912418dcce86a188\src\ServiceStack.Text\Pools\BufferPool.cs:line 55
   at ServiceStack.Redis.RedisNativeClient..ctor(RedisEndpoint config) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisNativeClient_Utils.cs:line 447
   at ServiceStack.Redis.RedisClient..ctor(RedisEndpoint config) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisClient.cs:line 66
   at ServiceStack.Redis.RedisConfig.<>c.<.cctor>b__35_0(RedisEndpoint c) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisConfig.cs:line 22
   at ServiceStack.Redis.RedisResolver.CreateRedisClient(RedisEndpoint config, Boolean master) in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisResolver.cs:line 76
   at ServiceStack.Redis.RedisManagerPool.GetClient() in C:\BuildAgent\work\b2a0bfe2b1c9a118\src\ServiceStack.Redis\RedisManagerPool.cs:line 214

...

Or

Exception=System.InvalidCastException: Unable to cast object of type 'System.Byte[]' to type 'System.Transactions.SafeIUnknown'.
   at System.Transactions.Transaction.JitSafeGetContextTransaction(ContextData contextData)
   at System.Transactions.Transaction.FastGetTransaction(TransactionScope currentScope, ContextData contextData, Transaction& contextTransaction)
   at System.Transactions.Transaction.get_Current()
   at System.Data.ProviderBase.DbConnectionPool.GetFromTransactedPool(Transaction& transaction)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry)
   at System.Data.SqlClient.SqlConnection.Open()
   at ServiceStack.OrmLite.OrmLiteConnection.Open() in C:\BuildAgent\work\27e4cc16641be8c0\src\ServiceStack.OrmLite\OrmLiteConnection.cs:line 86
   ...

Exceptions similar to these two are being thrown from various parts of our application. The only thing all of them have in common is a WeakReference object, but I can see no obvious cause. For example, the ServiceStack.Text code that throws the first exception is:

    private class CachedBuffer
    {
        private readonly WeakReference _reference;

        public int Size { get; }

        public bool IsAlive => _reference.IsAlive;
        public byte[] Buffer => (byte[])_reference.Target;

        public CachedBuffer(byte[] buffer)
        {
            Size = buffer.Length;
            _reference = new WeakReference(buffer);
        }
    }

The exception is thrown from the Buffer property getter. Apparently _reference.Target points to a SqlTransaction object instead of a byte[], but _reference is only initialized once, in the constructor, and can't be changed afterwards. So how could this exception be thrown?

Additionally, none of this code has changed recently, so it makes no sense that it would suddenly start throwing errors. I also can't see any way we could have caused this bug by some change in our own code, or am I wrong about that?

Could this be a bug in the .NET CLR and, if so, how could I diagnose it? Our application uses .NET Framework 4.8, and we've been seeing these errors on Windows Server 2012 and Windows Server 2019 machines in our testing environments, but I have not been able to reproduce them locally on my development machine.

Marko Žerajić
  • I ... can't fault your logic; it really does look like something *seriously* screwy is happening in (presumably) the GC internals here; has this changed recently? (for reference, here's the relevant System.Data bits - similar `WeakReference` code, but nothing odd: https://referencesource.microsoft.com/#system.transactions/System/Transactions/Transaction.cs,136). Is this x86? x64? Sad note, though: .NET Framework is *largely* obsolete now - if there is a bug, I doubt it is going to get much love (and it may already be fixed in .NET 5+) – Marc Gravell Apr 22 '21 at 14:03
  • The application targets x64 architecture. We started noticing this error last Wednesday, soon after we deployed a new version of application to our testing environment. There were no new updates installed to our servers, pretty much the only thing that changed was our application and there were no major changes, just some minor bug fixes. – Marko Žerajić Apr 22 '21 at 14:31
  • any Windows Update history, perhaps? I'm ... honestly quite impressed here. – Marc Gravell Apr 22 '21 at 14:46
  • Do you have any unsafe code or reflection? – Charlieface Apr 22 '21 at 14:52
  • @Charlieface it is notable that `CachedBuffer` in this case - and the same for the ADO.NET `FastGetTransaction` example - is an internal implementation detail of a library; you'd have to work hard *even to get hold of* the instances to manipulate it with reflection/`unsafe`; don't get me wrong, it is absolutely *possible*, but: that wouldn't happen accidentally – Marc Gravell Apr 22 '21 at 14:58
  • @MarcGravell Yeah unlikely reflection is an issue (although it might be a screwy debugger/profiler), but badly written `unsafe` or Pinvoke native code could do anything – Charlieface Apr 22 '21 at 15:01
  • Your implementation does technically exhibit a race condition. You are supposed to check `.IsAlive` both before (to avoid an exception), and after `.Target` (to create a GC Root), but before casting. That being said, you shouldn't be seeing this behavior, ever - at worst you should be seeing a `NullReferenceException`. `Target` shouldn't be pointing to random objects on the heap. PInvoke gone awry, or an out of date .Net installation, is a top suspect here. – Jonathan Dickinson Apr 22 '21 at 16:39
  • @JonathanDickinson I don't believe you do need to check `IsAlive` before, `Target` will just return null. Once it's in a local variable you can null-check and cast it, nothing will happen as it is already a strong-reference. So `IsAlive` is not necessary at all, just use `Target`. See https://stackoverflow.com/a/40773827/14868997 – Charlieface Apr 22 '21 at 18:53
  • @Charlieface we do use reflection in some parts of the code, but nothing that could access any of the objects that are throwing these exceptions. That was something I suspected as well, but I couldn't find any cases of reflection being used on classes that contain any of these objects. We only use reflection on simple DTO classes. I don't think we have any PInvoke calls in our own code, but it's used in several libraries, not to mention the .NET Framework classes themselves. None of that changed in this latest patch though. – Marko Žerajić Apr 22 '21 at 19:33
  • @Marc Gravell last windows updates were applied two weeks before we started noticing these errors. – Marko Žerajić Apr 22 '21 at 19:34
  • Are you running an x86, x64 or AnyCPU build? Debug or Release? Perhaps try other combinations. What about setting `gcServer=false` or true in the app.config – Charlieface Apr 22 '21 at 20:28
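
The pattern discussed in the comments above (read `Target` once into a local, then null-check and cast) can be sketched roughly as follows. This is an illustration only: `CachedBufferSketch` and `TryGetBuffer` are hypothetical names, not part of ServiceStack.Text.

```csharp
using System;

// Hypothetical variant of the CachedBuffer class from the question.
public sealed class CachedBufferSketch
{
    private readonly WeakReference _reference;

    public int Size { get; }

    public CachedBufferSketch(byte[] buffer)
    {
        Size = buffer.Length;
        _reference = new WeakReference(buffer);
    }

    // Reads Target exactly once. Once the value is held in a local it is a
    // strong reference, so the buffer cannot be collected between the
    // null check and the cast.
    public bool TryGetBuffer(out byte[] buffer)
    {
        object target = _reference.Target;
        if (target == null)
        {
            buffer = null;
            return false; // the buffer has already been collected
        }

        buffer = (byte[])target;
        return true;
    }
}
```

Note that this only closes the race between `IsAlive` and `Target`. It would not prevent the `InvalidCastException` from the question, because there `Target` returned a live object of the wrong type.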

1 Answer

I've managed to locate the problem. It was indeed our own code that caused this error, and it was related to reflection. Long story short, one of our developers introduced code that invoked a deep clone on the ExpandoObject.Keys property. This property is not a simple collection of strings; it also holds a reference to the entire ExpandoObject, and deep inside ExpandoObject there are some WeakReference fields. I still don't understand exactly what happens, but my guess is that cloning those WeakReferences inside ExpandoObject somehow caused the bug we experienced.

Thanks for your help. I inspected that code several times, but completely missed the deep clone invocation, because that extension method had a very generic name.
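
One plausible mechanism, not confirmed anywhere in this thread: as far as I understand, a .NET Framework `WeakReference` keeps a GC handle in a private field and releases that handle in its finalizer. A reflection-based deep clone that copies fields verbatim would then produce two `WeakReference` objects sharing one handle. When the clone is collected and finalized, the handle is freed and can be handed out again for an unrelated object, so the original starts returning that object from `Target`. The toy model below imitates this with a plain slot table. It does not touch the real CLR, and every type in it (`ToyHandleTable`, `ToyWeakRef`, `HandleReuseDemo`) is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Toy stand-in for a GC handle table: a handle is just a slot index.
public sealed class ToyHandleTable
{
    private readonly List<object> _slots = new List<object>();
    private readonly Stack<int> _free = new Stack<int>();

    public int Alloc(object target)
    {
        if (_free.Count > 0)
        {
            int reused = _free.Pop(); // freed slots are handed out again
            _slots[reused] = target;
            return reused;
        }
        _slots.Add(target);
        return _slots.Count - 1;
    }

    public void Free(int handle)
    {
        _slots[handle] = null;
        _free.Push(handle);
    }

    public object Get(int handle)
    {
        return _slots[handle];
    }
}

// Toy stand-in for WeakReference: it owns one handle and frees it on release.
public sealed class ToyWeakRef
{
    private readonly ToyHandleTable _table;
    private readonly int _handle;

    public ToyWeakRef(ToyHandleTable table, object target)
    {
        _table = table;
        _handle = table.Alloc(target);
    }

    public object Target { get { return _table.Get(_handle); } }

    // What a field-by-field deep clone effectively does: the handle value
    // is copied, so both objects now believe they own the same slot.
    public ToyWeakRef FieldCopy()
    {
        return (ToyWeakRef)MemberwiseClone();
    }

    // Stands in for the finalizer releasing the handle.
    public void Release()
    {
        _table.Free(_handle);
    }
}

public static class HandleReuseDemo
{
    public static string Run()
    {
        var table = new ToyHandleTable();
        var original = new ToyWeakRef(table, new byte[16]);

        ToyWeakRef clone = original.FieldCopy();
        clone.Release(); // the clone is "finalized" and frees the shared slot

        // An unrelated weak reference is created and reuses the freed slot.
        var unrelated = new ToyWeakRef(table, "not a buffer");

        try
        {
            var buffer = (byte[])original.Target; // now sees the string
            return "ok: " + buffer.Length;
        }
        catch (InvalidCastException ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

In this model `HandleReuseDemo.Run()` returns `"InvalidCastException"`. That has the same shape as the stack traces in the question, where code that never changed suddenly sees an object of an unrelated type behind its own weak reference.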

Marko Žerajić
  • This is very intriguing. I would be *very* interested in seeing the code that introduced this, because quite honestly: the result is terrifying. Although to be fair, the runtime folks would say "you used reflection to hack inside an object: any consequences are on you" – Marc Gravell Apr 22 '21 at 21:26
  • What deep cloning method was being used? Force.DeepCloner by any chance? – StayOnTarget Aug 08 '22 at 14:10
  • I'm having a very similar problem https://stackoverflow.com/q/74262314/8479 when calling TransactionScope constructor: `System.InvalidCastException: Unable to cast object of type 'System.Data.SqlClient.SqlTransaction' to type 'System.Transactions.BucketSet'`. We _do_ deep cloning on ExpandoObjects, so I think it could be the same. I'll report back if I solve it. – Rory Oct 31 '22 at 11:08