I have a FileSystemWatcher (C#, .NET Core 2.2) that runs reliably on Windows, but the same code stops raising events after several hours on Linux. I'm monitoring a local directory and its subdirectories. The monitored directory is on the same Linux server as my application, not on a network share.
I created a test application that creates test files every X seconds while I monitor for events. It runs for 3-5 hours and then the events simply stop firing, without any errors.
I configured the test application to create a file every 5 seconds and let it run for 30 minutes without issue. I then increased the interval to 30 seconds, 10 minutes, and 30 minutes, and everything worked fine. But when I increase the interval to one file per hour, I start seeing problems after 3-5 hours.
I tried increasing max_user_watches to 100000 and max_user_instances to 256, although I doubt the inotify limits are the problem. It didn't help.
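To confirm that the new limits are actually in effect for the running system, the values can be read back from procfs. The following is a minimal sketch, assuming a Linux host where the kernel exposes the inotify limits as files named max_user_watches, max_user_instances and max_queued_events under /proc/sys/fs/inotify. The class name InotifyLimits and its methods are hypothetical helpers, not part of the application.

```csharp
using System;
using System.IO;

// Hypothetical helper for reading the kernel's inotify limits on Linux.
public static class InotifyLimits
{
    public const string DefaultProcDir = "/proc/sys/fs/inotify";

    // Returns the current value of the named limit, or null if the file is
    // missing (for example when not running on Linux) or cannot be parsed.
    public static long? Read(string name, string procDir = DefaultProcDir)
    {
        string path = Path.Combine(procDir, name);
        if (!File.Exists(path))
        {
            return null;
        }

        long value;
        return long.TryParse(File.ReadAllText(path).Trim(), out value) ? value : (long?)null;
    }

    // Writes the three inotify limits to the console.
    public static void Print()
    {
        foreach (string name in new[] { "max_user_watches", "max_user_instances", "max_queued_events" })
        {
            long? value = Read(name);
            Console.WriteLine("{0} = {1}", name, value.HasValue ? value.Value.ToString() : "(unavailable)");
        }
    }
}
```

Calling InotifyLimits.Print() at application startup and logging the output would show whether the process sees the raised values.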

    public bool CreateWatcher()
    {
        bool isStarted = false;
        try
        {
            // Set criteria for files to watch
            watcher.Filter = watcherFilter;
            // folder(s) to watch
            watcher.Path = watcherPath;
            // Watch in subfolders
            watcher.IncludeSubdirectories = includeSubdirectories;
            // Add event handlers
            switch (watcherEventType)
            {
                case EventType.Created:
                    watcher.Created += new FileSystemEventHandler(OnCreated);
                    break;
                case EventType.Deleted:
                    watcher.Deleted += new FileSystemEventHandler(OnDeleted);
                    break;
                case EventType.Changed:
                    watcher.Changed += new FileSystemEventHandler(OnChanged);
                    break;
                default:
                    break;
            }
            // Set error event handler
            watcher.Error += new ErrorEventHandler(OnError);
            // Start watching
            watcher.EnableRaisingEvents = true;
            isStarted = true;
            _Log.Information("Watching path {watcherPath} for new files of type {watcherFilter}", watcherPath, watcherFilter);
        }
        catch (Exception ex)
        {
            _Log.Error("CreateWatcher: {@ex}", ex);
            if (watcher != null)
            {
                watcher.EnableRaisingEvents = false;
                watcher.Dispose();
            }
        }
        return isStarted;
    }

    public void StartWatcher()
    {
        try
        {
            // Ensure watched path exists
            if (!Directory.Exists(watcherPath))
            {
                _Log.Warning("Watcher path {watcherPath} does not exist. Attempting to create.", watcherPath);
                try
                {
                    // Attempt to create watcherPath
                    Directory.CreateDirectory(watcherPath);
                    _Log.Information("Created watcherPath: {watcherpath}", watcherPath);
                }
                catch (Exception ex)
                {
                    _Log.Error("StartWatcher : Error creating watcherPath: {@ex}", ex);
                }
            }
            CreateWatcher();
        }
        catch (Exception ex)
        {
            _Log.Error("StartWatcher : {@ex}", ex);
        }
    }

    private void OnError(object source, ErrorEventArgs e)
    {
        try
        {
            // If Buffer overflow
            if (e.GetException().GetType() == typeof(InternalBufferOverflowException))
            {
                // Too many events -- some of the file system events are being lost.
                // Nothing we can do about it so log the error and continue...
                _Log.Error("OnError : {@exception}", e.GetException());
                return;
            }
            //-----------------------------------------------------
            // If not buffer overflow must be a folder access issue
            //-----------------------------------------------------
            _Log.Error("OnError Error: Watched directory not accessible");
            // Stop raising events
            watcher.EnableRaisingEvents = false;
            // Dispose of current watcher
            watcher.Dispose();
            int counter = 0;
            Boolean isStarted = false;
            // continue to loop if path does not exist or watcher cannot start
            while (isStarted == false && counter < retryAttempts)
            {
                try
                {
                    // If watcher path exists
                    if (Directory.Exists(watcherPath))
                    {
                        _Log.Information("Folder Access Restored: {watcherPath}", watcherPath);
                        // Attempt to recreate watcher
                        isStarted = CreateWatcher();
                    }
                    // Path does not exist
                    else
                    {
                        _Log.Error("OnError Folder Inaccessible {watcherPath}", watcherPath);
                    }
                }
                catch (Exception ex)
                {
                    _Log.Error("OnError Error restarting filesystemwatcher {@ex}", ex);
                }
                // wait a little and try again
                System.Threading.Thread.Sleep(retryTimeOut);
                counter++;
            }
            // If not restarted, kill service
            if (!isStarted)
            {
                _Log.Error("OnError Exceeded the maximum number of attempts to reconnect to watcher path. Resolve path issue and restart Relay");
                return;
            }
        }
        catch (Exception ex)
        {
            _Log.Error("OnError Error attempting to recover file system watcher: {@ex}", ex);
        }
    }

Given that the tester handles large numbers of files at one file every 5 seconds but fails after just a few files at one file per hour, I don't think this is a resource issue.
When I auto-create the test files, I simply increment the file name (for example 1.xml, 2.xml, 3.xml), and I can see the files being created on disk. While the process is working, the logs show the command to create the file, immediately followed by a watcher event. After several hours, however, I see the command to create the file but no entry for the watcher event.
I did let the tester run for an entire day and through the night at 30-minute intervals, and it worked fine. I'm left to wonder whether something is timing out when the intervals are longer. But if that were the case, why would it handle 5 files (one per hour) and then stop?
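For anyone trying to reproduce this, a tester of the kind described can be sketched as follows. This is an illustrative reconstruction, not the actual test program: the class name TestFileGenerator, the XML payload and the console logging are all assumptions.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical stand-in for the test application described above.
public static class TestFileGenerator
{
    // Writes `count` files named 1.xml, 2.xml, ... into `directory`,
    // pausing for `interval` between consecutive files.
    public static void Run(string directory, TimeSpan interval, int count)
    {
        Directory.CreateDirectory(directory);
        for (int i = 1; i <= count; i++)
        {
            string path = Path.Combine(directory, i + ".xml");
            File.WriteAllText(path, "<test id=\"" + i + "\" />");
            // Log the creation time so it can be matched against watcher events.
            Console.WriteLine("{0:O} created {1}", DateTime.UtcNow, path);
            if (i < count)
            {
                Thread.Sleep(interval);
            }
        }
    }
}
```

With this sketch, the failing scenario would correspond to something like TestFileGenerator.Run(watchedPath, TimeSpan.FromHours(1), 24), where watchedPath is a placeholder for the monitored directory.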
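One way to narrow down when the watcher goes silent, without polling the monitored files themselves, would be to probe it with a single sentinel file. The sketch below is only a diagnostic idea under stated assumptions: WatcherProbe is a hypothetical helper, the sentinel file name must match the watcher's Filter, and the sentinel is placed directly in the watched directory.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper: checks whether a FileSystemWatcher still delivers events.
public sealed class WatcherProbe : IDisposable
{
    private readonly FileSystemWatcher _watcher;
    private readonly string _sentinelName;
    private readonly string _sentinelPath;
    private readonly ManualResetEventSlim _seen = new ManualResetEventSlim(false);

    // sentinelName must match watcher.Filter and is created directly in watcher.Path.
    public WatcherProbe(FileSystemWatcher watcher, string sentinelName)
    {
        _watcher = watcher;
        _sentinelName = sentinelName;
        _sentinelPath = Path.Combine(watcher.Path, sentinelName);
        _watcher.Created += OnSentinelEvent;
        _watcher.Changed += OnSentinelEvent;
    }

    private void OnSentinelEvent(object sender, FileSystemEventArgs e)
    {
        // Ignore events for ordinary files; only the sentinel counts.
        if (string.Equals(Path.GetFileName(e.Name), _sentinelName, StringComparison.Ordinal))
        {
            _seen.Set();
        }
    }

    // Recreates the sentinel file and waits up to `timeout` for the watcher to report it.
    public bool Probe(TimeSpan timeout)
    {
        _seen.Reset();
        File.Delete(_sentinelPath); // no-op if the file does not exist
        File.WriteAllText(_sentinelPath, DateTime.UtcNow.ToString("O"));
        return _seen.Wait(timeout);
    }

    public void Dispose()
    {
        _watcher.Created -= OnSentinelEvent;
        _watcher.Changed -= OnSentinelEvent;
        _seen.Dispose();
    }
}
```

Calling Probe from a timer every few minutes and logging a false result would at least timestamp the moment events stop arriving. The application's own event handlers would need to ignore the sentinel file.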
Eventually, I could have a million files to monitor, so polling is not an option.