I have a Windows desktop application, written in C# .NET, that needs to compile and run partially-trusted code. I handle this like a plugin: compile the code to a .DLL, then load the assembly and execute it in an AppDomain. The AppDomain is fairly restricted, having only Execution, Infrastructure, and RemotingConfiguration permissions, plus the ability to read files from the directory where the DLLs live.
While running in the debugger (Visual Studio Community 2017), the remote side occasionally fails to renew its lease. The remote "plugin manager" object then gets GCed, and the main app's next attempt to interact with a plugin fails with a RemotingException.
The behavior is very inconsistent. Most of the time everything works fine, but some of the time it fails. It often fails before the lease is renewed for the first time, but sometimes it fails after a couple of renewals. The only clue I have right now is a message printed to the debugger console:
Exception thrown: 'System.Runtime.Remoting.RemotingException' in mscorlib.dll
Exception thrown: 'System.Runtime.Remoting.RemotingException' in mscorlib.dll
Some time after these appear, the "plugin manager" object gets GCed (I added a destructor that writes a message to the console). So it appears that the remote's attempt to renew the lease is failing, and the object is being collected.
Why is this happening? What can I do to prevent it?
I have log messages in the sponsor object that show the main app renewing the lease. They look like this:
15:17:20|Lease renewal for PluginCommon.PluginManager, last renewed 00:05:00.1248814 sec ago; renewing for 00:02:00 (host id=1)
15:19:20|Lease renewal for PluginCommon.PluginManager, last renewed 00:02:00.0281627 sec ago; renewing for 00:02:00 (host id=1)
That's the expected behavior when my app is idle -- initially 5 mins, renew every 2. The initial five-minute delay makes this somewhat frustrating to debug.
Note I can't override InitializeLifetimeService() without making the security permissive, so I can't just make the object live forever (which would be an acceptable solution here). I've tried some experiments with permissive security and 10-second timeouts, but have yet to see it fail (which, given the sporadic nature of the failures, doesn't necessarily mean anything).
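For reference, the "live forever" override mentioned above usually looks like the sketch below. `ImmortalObject` is a hypothetical name used only for illustration; as noted, this is the approach that could not be used for the object in the restricted plugin domain.

```csharp
using System;

// Hypothetical example class; not part of the application described here.
public class ImmortalObject : MarshalByRefObject {
    // Returning null tells the remoting infrastructure not to create a
    // lease for this object, so it is never disconnected for inactivity.
    public override object InitializeLifetimeService() {
        return null;
    }
}
```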
My Sponsor class is about what you'd expect:
class Sponsor<T> : MarshalByRefObject, ISponsor, IDisposable where T : MarshalByRefObject
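Only the class declaration is shown above. For readers unfamiliar with the pattern, a sponsor of this shape is often written roughly as follows. This is a minimal sketch, not my actual class: the name `SponsorSketch`, the `Instance` property, and the fixed two-minute renewal are assumptions chosen to match the log output shown earlier.

```csharp
using System;
using System.Runtime.Remoting;
using System.Runtime.Remoting.Lifetime;

// Hypothetical sketch of a generic sponsor; names are illustrative only.
public class SponsorSketch<T> : MarshalByRefObject, ISponsor, IDisposable
        where T : MarshalByRefObject {
    private T mObj;    // proxy for the object living in the other AppDomain

    public T Instance { get { return mObj; } }

    public SponsorSketch(T obj) {
        mObj = obj;
        // Fetch the lease controlling the remote object's lifetime and
        // register this sponsor so it is asked before the lease expires.
        ILease lease = (ILease)RemotingServices.GetLifetimeService(obj);
        lease.Register(this);
    }

    // Assumption: the sponsor lives in the fully trusted main domain, so
    // it can opt out of leasing for itself. The sponsor is also a
    // MarshalByRefObject, and the other domain calls it through a proxy.
    public override object InitializeLifetimeService() {
        return null;
    }

    // Called from the other domain when the lease is about to run out.
    // Returning TimeSpan.Zero lets the lease lapse.
    TimeSpan ISponsor.Renewal(ILease lease) {
        return (mObj == null) ? TimeSpan.Zero : TimeSpan.FromMinutes(2);
    }

    public void Dispose() {
        if (mObj != null) {
            ILease lease = (ILease)RemotingServices.GetLifetimeService(mObj);
            lease.Unregister(this);
            mObj = null;
        }
    }
}
```

Whether the sponsor's own lifetime has anything to do with the failures described here is not established; the override is included only because the sponsor is itself reached across the AppDomain boundary.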
The "plugin" AppDomain is created like this:
PluginManager pm = (PluginManager)mAppDomain.CreateInstanceAndUnwrap(
    typeof(PluginManager).Assembly.FullName,
    typeof(PluginManager).FullName);
// Wrap it so it doesn't disappear on us.
mPluginManager = new Sponsor<PluginManager>(pm);
The sponsored PluginManager object is the only link between the main AppDomain and the plugin AppDomain.
While typing this up, the app crashed, without ever creating a lease. I had added RemotingException to the exception break list, which yielded this stack trace:
mscorlib.dll!System.Runtime.Remoting.Channels.ChannelServices.CheckDisconnectedOrCreateWellKnownObject(System.Runtime.Remoting.Messaging.IMessage msg) Unknown
mscorlib.dll!System.Runtime.Remoting.Channels.ChannelServices.SyncDispatchMessage(System.Runtime.Remoting.Messaging.IMessage msg) Unknown
mscorlib.dll!System.Runtime.Remoting.Channels.CrossAppDomainSink.DoDispatch(byte[] reqStmBuff, System.Runtime.Remoting.Messaging.SmuggledMethodCallMessage smuggledMcm, out System.Runtime.Remoting.Messaging.SmuggledMethodReturnMessage smuggledMrm) Unknown
mscorlib.dll!System.Runtime.Remoting.Channels.CrossAppDomainSink.DoTransitionDispatchCallback(object[] args) Unknown
mscorlib.dll!System.Threading.Thread.CompleteCrossContextCallback(System.Threading.InternalCrossContextDelegate ftnToCall, object[] args) Unknown
[AppDomain (Plugin Domain, #2) -> AppDomain (SourceGen.exe, #1)]
mscorlib.dll!System.Runtime.Remoting.Channels.CrossAppDomainSink.DoTransitionDispatch(byte[] reqStmBuff, System.Runtime.Remoting.Messaging.SmuggledMethodCallMessage smuggledMcm, out System.Runtime.Remoting.Messaging.SmuggledMethodReturnMessage smuggledMrm) Unknown
mscorlib.dll!System.Runtime.Remoting.Channels.CrossAppDomainSink.SyncProcessMessage(System.Runtime.Remoting.Messaging.IMessage reqMsg) Unknown
mscorlib.dll!System.Runtime.Remoting.Channels.ADAsyncWorkItem.FinishAsyncWork(object stateIgnored) Unknown
mscorlib.dll!System.Threading.QueueUserWorkItemCallback.WaitCallback_Context(object state) Unknown
mscorlib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Unknown
mscorlib.dll!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state, bool preserveSyncCtx) Unknown
mscorlib.dll!System.Threading.QueueUserWorkItemCallback.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem() Unknown
mscorlib.dll!System.Threading.ThreadPoolWorkQueue.Dispatch() Unknown
mscorlib.dll!System.Threading._ThreadPoolWaitCallback.PerformWaitCallback() Unknown
[Native to Managed Transition]
I'm not sure what workarounds are available -- maybe having a System.Timers.Timer ping it every 60 seconds would keep things alive even if the plugin domain is unable to contact the main app?
There are similar questions on SO (e.g. this), but they describe consistently reproducible behavior.
Update: FWIW, the 60-second ping kluge seems to be working. The renew-on-call mechanism keeps the object warmed up, so it never has to request a lease renewal.
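The ping workaround can be sketched roughly like this. The `IPingable` interface, the `Ping(int)` signature, and the `KeepAlive` class are assumptions made for illustration; the real call goes through the PluginManager proxy.

```csharp
using System;
using System.Runtime.Remoting;
using System.Timers;

// Hypothetical stand-in for the remote PluginManager proxy.
public interface IPingable {
    int Ping(int arg);
}

// Periodically calls the remote object so that renew-on-call keeps
// extending its lease. Sketch only; names are illustrative.
public sealed class KeepAlive : IDisposable {
    private readonly Timer mTimer;
    private readonly IPingable mTarget;

    public KeepAlive(IPingable target, double intervalMs) {
        mTarget = target;
        mTimer = new Timer(intervalMs);
        mTimer.AutoReset = true;        // keep firing until disposed
        mTimer.Elapsed += OnElapsed;
        mTimer.Start();
    }

    // Runs on a thread pool thread, since no SynchronizingObject is set.
    private void OnElapsed(object sender, ElapsedEventArgs e) {
        try {
            int result = mTarget.Ping(1000);
            Console.WriteLine("KeepAlive result=" + result);
        } catch (RemotingException ex) {
            // The remote object has already been disconnected.
            Console.WriteLine("KeepAlive failed: " + ex.Message);
        }
    }

    public void Dispose() {
        mTimer.Dispose();
    }
}
```

Usage would be something like `new KeepAlive(pluginManagerProxy, 60 * 1000)`, created after the plugin domain is set up and disposed before the domain is unloaded.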
Update (1.5 years later): it gets into the bad state reproducibly if Windows suspends for 5+ minutes while the program is running. Apparently when Windows wakes up, the plugin AppDomain checks the time and determines that everything has expired, and doesn't bother querying the lease objects. I even see the two RemotingException log messages.
One curious aspect was revealed by a modification to the keep-alive ping code, which has the plugin side return how long it has been since it last saw a ping. In a separate experiment, I used the "break all" button in Visual Studio to pause the program for a few minutes. After resuming, the VS output window showed this:
PluginManager Ping tid=12 (id=2): 1000
KeepAlive tid=12 result=361
Exception thrown: 'System.Runtime.Remoting.RemotingException' in mscorlib.dll
Exception thrown: 'System.Runtime.Remoting.RemotingException' in mscorlib.dll
Reanalyzing...
Refreshing project (CodeAndData)
Exception thrown: 'System.Runtime.Remoting.RemotingException' in SourceGen.exe
So the app side woke up and sent a ping to the plugin side, which reported that it hadn't been pinged in 361 seconds. After that successful transaction, the exception messages appeared. A couple seconds later when I hit the "reanalyze" button, it crashed because the object on the plugin side had gone away.
A successful call to the plugin side failed to reset the timer. The existence of a lease object failed to keep the object alive. I think I need to shift from preventing the failure to smoothly recovering from it.
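One way the recovery idea could look, as a rough sketch under stated assumptions: `RecoveringProxy`, `Invoke`, and the factory delegate are hypothetical names, and the factory is assumed to tear down the old plugin AppDomain and return a freshly created proxy.

```csharp
using System;
using System.Runtime.Remoting;

// Hypothetical wrapper that recreates the remote object when a call
// fails because the object was disconnected. Sketch only.
public sealed class RecoveringProxy<T> where T : class {
    private readonly Func<T> mCreate;   // assumed to rebuild the plugin domain
    private T mInstance;

    public RecoveringProxy(Func<T> create) {
        mCreate = create;
        mInstance = create();
    }

    public TResult Invoke<TResult>(Func<T, TResult> call) {
        try {
            return call(mInstance);
        } catch (RemotingException) {
            // The remote side is gone: build a new one and retry once.
            // A second failure propagates to the caller.
            mInstance = mCreate();
            return call(mInstance);
        }
    }
}
```

Any state held on the plugin side is lost when the object is recreated, so this only fits calls that can be repeated safely from scratch.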
FWIW, the program has shipped and is open source.