FileSystemWatcher stops responding on Linux

Question

I have a FileSystemWatcher (C#, .NET Core 2.2) that runs reliably on Windows, but the same code stops working after several hours when running on Linux. I'm monitoring a local directory (and its subdirectories). The monitored directory is on the same Linux server where my application resides (not a network share).

I created a test application that creates test files every X seconds and I monitor for events. It will run for 3-5 hours and then the events just stop firing without errors.

I configured the test application to create a file every 5 seconds and let it run for 30 minutes without issue. I then increased the interval to 30 seconds, 10 minutes, and 30 minutes, and everything seemed to work fine. But when I increase the interval to one file per hour, I start seeing problems after 3-5 hours.

I tried increasing max_user_watches to 100000 and max_user_instances to 256. It didn't help, and I don't really think that is the problem anyway.
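
To rule out a sysctl change that never took effect, the application can log the limits the kernel actually reports at startup. The following is only a minimal sketch: the class name `InotifyLimits` and its `Read` method are hypothetical helpers, not part of .NET, and it assumes the standard `/proc/sys/fs/inotify` location on Linux.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper (not a .NET API): reads the inotify limits
// that the Linux kernel exposes as small text files.
public static class InotifyLimits
{
    // Returns whichever of the known limit files exist and parse as integers.
    // The directory parameter defaults to the standard procfs location.
    public static IDictionary<string, long> Read(string directory = "/proc/sys/fs/inotify")
    {
        var limits = new Dictionary<string, long>();
        var names = new[] { "max_user_watches", "max_user_instances", "max_queued_events" };

        foreach (var name in names)
        {
            var path = Path.Combine(directory, name);

            // Skip entries that are missing (for example on a non-Linux host)
            // or that do not contain a plain integer.
            if (File.Exists(path) &&
                long.TryParse(File.ReadAllText(path).Trim(), out var value))
            {
                limits[name] = value;
            }
        }

        return limits;
    }
}
```

Logging the result of `InotifyLimits.Read()` next to the "Watching path" message would show whether the process really sees 100000 and 256, or whether the change was lost (for example after a reboot).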

public bool CreateWatcher()
{
    bool isStarted = false;

    try
    {
        // Set criteria for files to watch
        watcher.Filter = watcherFilter;

        // folder(s) to watch
        watcher.Path = watcherPath;

        // Watch in subfolders
        watcher.IncludeSubdirectories = includeSubdirectories;

        // Add event handlers
        switch (watcherEventType)
        {
            case EventType.Created:
                watcher.Created += new FileSystemEventHandler(OnCreated);
                break;
            case EventType.Deleted:
                watcher.Deleted += new FileSystemEventHandler(OnDeleted);
                break;
            case EventType.Changed:
                watcher.Changed += new FileSystemEventHandler(OnChanged);
                break;
            default:
                break;
        }

        // Set error event handler
        watcher.Error += new ErrorEventHandler(OnError);

        // Start watching
        watcher.EnableRaisingEvents = true;
        isStarted = true;

        _Log.Information("Watching path {watcherPath} for new files of type {watcherFilter}", watcherPath, watcherFilter);

    }
    catch (Exception ex)
    {
        _Log.Error("CreateWatcher: {@ex}", ex);
        if (watcher != null)
        {
            watcher.EnableRaisingEvents = false;
            watcher.Dispose();
        }
    }
    return isStarted;
}

public void StartWatcher()
{
    try
    {
        // Ensure watched path exists
        if (!Directory.Exists(watcherPath))
        {
            _Log.Warning("Watcher path {watcherPath} does not exist.  Attempting to create.", watcherPath);

            try
            {
                // Attempt to create watcherPath
                Directory.CreateDirectory(watcherPath);
                _Log.Information("Created watcherPath: {watcherPath}", watcherPath);
            }
            catch (Exception ex)
            {
                _Log.Error("StartWatcher : Error creating watcherPath: {@ex}", ex);
            }
        }

        CreateWatcher();
    }
    catch (Exception ex)
    {
        _Log.Error("StartWatcher : {@ex}", ex);
    }
}


private void OnError(object source, ErrorEventArgs e)
{
    try
    {
        // If buffer overflow
        if (e.GetException() is InternalBufferOverflowException)
        {
            // Too many events -- some of the file system events are being lost.
            // Nothing we can do about it so log the error and continue...
            _Log.Error("OnError : {@exception}", e.GetException());
            return;
        }

        //-----------------------------------------------------
        // If not buffer overflow, assume a folder access issue,
        // but log the actual exception so the real cause is visible
        //-----------------------------------------------------
        _Log.Error("OnError Error: Watched directory not accessible: {@exception}", e.GetException());

        // Stop raising events
        watcher.EnableRaisingEvents = false;

        // Dispose of current watcher
        watcher.Dispose();

        int counter = 0;
        bool isStarted = false;

        // continue to loop if path does not exist or watcher cannot start
        while (!isStarted && counter < retryAttempts)
        {
            try
            {
                // If watcher path exists
                if (Directory.Exists(watcherPath))
                {
                    _Log.Information("Folder Access Restored: {watcherPath}", watcherPath);

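                    // Attempt to recreate watcher. The old instance was disposed above
                    // (and CreateWatcher disposes it again on failure), and a disposed
                    // FileSystemWatcher cannot be re-enabled, so start from a fresh one.
                    // Assumes the watcher field is not readonly.
                    watcher = new FileSystemWatcher();
                    isStarted = CreateWatcher();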
                }

                // Path does not exist
                else
                {
                    _Log.Error("OnError Folder Inaccessible {watcherPath}", watcherPath);
                }
            }
            catch (Exception ex)
            {
                _Log.Error("OnError Error restarting filesystemwatcher {@ex}", ex);
            }

            // wait a little and try again
            System.Threading.Thread.Sleep(retryTimeOut);
            counter++;
        }
        // If not restarted, log and give up (this does not stop the service)
        if (!isStarted)
        {
            _Log.Error("OnError Exceeded the maximum number of attempts to reconnect to watcher path.  Resolve path issue and restart Relay");
            return;
        }
    }
    catch (Exception ex)
    {
        _Log.Error("OnError Error attempting to recover file system watcher: {@ex}", ex);
    }
}

Given that the tester works for large numbers of files when run every 5 seconds, but fails with just a few files when spaced with one file every hour, I don't think this is a resource issue.

When I auto-create the test files I simply increment the file name (example: 1.xml, 2.xml, 3.xml) and I see the files being created on disk. When the process is working, in the logs, I see the command to create the file, then immediately after I see a watcher event fire. However, after several hours, I see the command to create the file, but no entry for the watcher event.
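
The "file created but no watcher event" check described above can be automated instead of read out of the logs. This is a rough sketch under assumptions: `EventGapTracker` and its method names are hypothetical, the harness is assumed to call `RecordCreated` just before writing each file, and the `Created` handler is assumed to call `RecordObserved` with the same name (for example `e.Name`).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical test-harness helper: tracks files the harness created
// and reports those for which no watcher event has arrived yet.
public sealed class EventGapTracker
{
    private readonly object _gate = new object();
    private readonly Dictionary<string, DateTime> _pending =
        new Dictionary<string, DateTime>();

    // Call just BEFORE the harness writes the file, so an event that
    // fires immediately cannot arrive ahead of the bookkeeping.
    public void RecordCreated(string fileName, DateTime createdUtc)
    {
        lock (_gate) { _pending[fileName] = createdUtc; }
    }

    // Call from the watcher's Created handler with the same name.
    public void RecordObserved(string fileName)
    {
        lock (_gate) { _pending.Remove(fileName); }
    }

    // Files created more than 'grace' ago that still have no event.
    public IReadOnlyList<string> Overdue(DateTime nowUtc, TimeSpan grace)
    {
        lock (_gate)
        {
            return _pending
                .Where(p => nowUtc - p.Value > grace)
                .Select(p => p.Key)
                .OrderBy(n => n, StringComparer.Ordinal)
                .ToList();
        }
    }
}
```

Checking `Overdue(DateTime.UtcNow, TimeSpan.FromSeconds(30))` after each file is written would log the exact file at which events stop. The 30-second grace period is an arbitrary illustrative value.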

I did allow the tester program to run for an entire day and through the night at 30-minute intervals, and it worked fine. I'm left to wonder whether something is timing out when the intervals are longer. But if that were the case, why would it create 5 files (one per hour) and then stop?

Eventually, I could have a million files to monitor, so polling is not an option.

What filesystem are you using on your Linux machine? And what kernel are you on? - Steffen Winkler, Aug 26 '19 at 09:05
Also look at [this](https://stackoverflow.com/questions/22767413/filesystemwatcher-events-raising-twice-despite-taking-measures-against-it/22768610#22768610) and [the answers under this question](https://stackoverflow.com/questions/239988/filesystemwatcher-vs-polling-to-watch-for-file-changes). In short: FSW may work on NTFS partitions under Windows with small amounts of file changes. In a directory with many changes it will almost definitely fail for one reason or another. Polling is required one way or the other. - Steffen Winkler, Aug 26 '19 at 09:10
The version of Linux is Debian 9 Stretch. Whatever the default filesystem is... - Matthew, Aug 26 '19 at 18:04
That would probably be ext4. You can confirm with `lsblk -f`, which lists all block devices (partitions) with their filesystems (`-f`). As for your original question: in .NET Core, FSW on Linux uses `inotify`, which you can also interact with directly. To figure out which layer breaks down (.NET Core's implementation or inotify itself), install `inotify-tools`, run this command in a shell, and see if/when it stops working: `inotifywait -r -m /dir/to/monitor/` - Steffen Winkler, Aug 27 '19 at 07:40
