I recently needed to enumerate an entire file system, looking for specific types of files for auditing purposes. Because I have limited permissions on the file system being scanned, I ran into several exceptions. The most prevalent have been UnauthorizedAccessException and, much to my chagrin, PathTooLongException.

These would not normally be an issue, except that they invalidate the IEnumerable, preventing me from completing the scan.

Matthew Brubaker
  • I just wanted to say I think the `FileSystemEnumerable` class as offered by Mr. Brubaker above is absolutely wonderful, and so I want to encourage further use/development of it. To that end, I posted it as a GitHub repo of one file only, along with a README.md. I linked back to this StackOverflow post. Please find the class posted at: http://github.com/astrohart/FileSystemEnumerable/ All are welcome to fork the repo/submit issues/create pull requests. Thanks everyone! Brian Hart – Dr. Brian Hart Jan 26 '19 at 15:59

2 Answers

In order to solve this problem, I have created a replacement File System Enumerator. Although it may not be perfect, it performs fairly quickly and traps the two exceptions that I have run into. It will find any directories or files that match the search pattern passed to it.

// This code is public domain
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using log4net;

public class FileSystemEnumerable : IEnumerable<FileSystemInfo>
{
    private readonly ILog _logger = LogManager.GetLogger(typeof(FileSystemEnumerable));

    private readonly DirectoryInfo _root;
    private readonly IList<string> _patterns;
    private readonly SearchOption _option;

    public FileSystemEnumerable(DirectoryInfo root, string pattern, SearchOption option)
    {
        _root = root;
        _patterns = new List<string> { pattern };
        _option = option;
    }

    public FileSystemEnumerable(DirectoryInfo root, IList<string> patterns, SearchOption option)
    {
        _root = root;
        _patterns = patterns;
        _option = option;
    }

    public IEnumerator<FileSystemInfo> GetEnumerator()
    {
        if (_root == null || !_root.Exists) yield break;

        IEnumerable<FileSystemInfo> matches = new List<FileSystemInfo>();
        try
        {
            _logger.DebugFormat("Attempting to enumerate '{0}'", _root.FullName);
            foreach (var pattern in _patterns)
            {
                _logger.DebugFormat("Using pattern '{0}'", pattern);
                matches = matches.Concat(_root.EnumerateDirectories(pattern, SearchOption.TopDirectoryOnly))
                                 .Concat(_root.EnumerateFiles(pattern, SearchOption.TopDirectoryOnly));
            }
        }
        catch (UnauthorizedAccessException)
        {
            _logger.WarnFormat("Unable to access '{0}'. Skipping...", _root.FullName);
            yield break;
        }
        catch (PathTooLongException ptle)
        {
            _logger.Warn(string.Format(@"Could not process path '{0}\{1}'.", _root.Parent.FullName, _root.Name), ptle);
            yield break;
        }
        catch (IOException e)
        {
            // "The symbolic link cannot be followed because its type is disabled."
            // "The specified network name is no longer available."
            _logger.Warn(string.Format(@"Could not process path '{0}\{1}' (check SymlinkEvaluation rules).", _root.Parent.FullName, _root.Name), e);
            yield break;
        }

        _logger.DebugFormat("Returning all objects that match the pattern(s) '{0}'", string.Join(",", _patterns));
        foreach (var file in matches)
        {
            yield return file;
        }

        if (_option == SearchOption.AllDirectories)
        {
            _logger.DebugFormat("Enumerating all child directories.");
            foreach (var dir in _root.EnumerateDirectories("*", SearchOption.TopDirectoryOnly))
            {
                _logger.DebugFormat("Enumerating '{0}'", dir.FullName);
                var fileSystemInfos = new FileSystemEnumerable(dir, _patterns, _option);
                foreach (var match in fileSystemInfos)
                {
                    yield return match;
                }
            }
        }
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

The usage is fairly simple.

// This code is public domain
var root = new DirectoryInfo(@"c:\wherever");
var searchPattern = @"*.txt";
var searchOption = SearchOption.AllDirectories;
var enumerable = new FileSystemEnumerable(root, searchPattern, searchOption);
foreach (var info in enumerable)
{
    Console.WriteLine(info.FullName);
}

People are free to use it if they find it useful.

RJFalconer
Matthew Brubaker
  • "People are free to use it if they find it useful.": that is not quite true. Your excerpt is under CC BY-SA 3.0 (Creative Commons), which makes it delicate to use, as a matter of fact. You could explicitly state "public domain" (copyleft) or the zlib license (weakest copyright license) in a comment in your code snippet. Thanks. – v.oddou Jul 10 '15 at 06:54
  • I assert that the code in the answer above as of July 14, 2015 is public domain with all rights and privileges that grants. – Matthew Brubaker Jul 14 '15 at 17:32
  • I found I needed to catch `System.IO.IOException` for situations where I'm walking a network drive that has a remote-to-local symlink directory with such symlink expansion rules [disabled on current machine](http://blogs.msdn.com/b/junfeng/archive/2012/05/07/the-symbolic-link-cannot-be-followed-because-its-type-is-disabled.aspx). I tweaked your answer accordingly. It also recurses infinitely if it encounters a symlink that points to an ancestor folder; in my case I just ignored dirs with attributes of `FileAttributes.ReparsePoint`, but this is probably not elegant enough for a general answer. – RJFalconer Sep 22 '15 at 13:44
  • This was very helpful for me. I did make a minor improvement, IMHO, by reducing the multiple constructors to a single version that is equally simple: `public FileSystemEnumerable(DirectoryInfo root, SearchOption option, params string[] patterns) { _root = root; _patterns = patterns.ToList(); _option = option; }` – DVS Jun 19 '17 at 11:59
  • By chaining `_root.EnumerateDirectories()` with `_root.EnumerateFiles()` (to force the output to be "grouped" together by type?) plus another call to `_root.EnumerateDirectories()` at the end of `GetEnumerator()`, at a low level I believe this will cause it to call [`FindNextFile()`](https://referencesource.microsoft.com/#mscorlib/system/io/filesystemenumerable.cs,466) _three times_ for every object in the directory, which would be bad for directories with a large number of children. This is because all three `Enumerate*s()` methods perform a full directory scan regardless of the search object type. – Lance U. Matthews May 15 '20 at 00:14
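
The two concerns raised in the comments above (symlinks that lead back to an ancestor folder, and each directory being scanned three times) could probably be handled together. What follows is only a minimal sketch, not the answer author's code: `SafeWalker` and `Walk` are hypothetical names, a caller-supplied predicate stands in for the wildcard patterns, and it assumes that buffering one directory's entries in memory at a time is acceptable.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answer.
public static class SafeWalker
{
    // Walks the tree below 'root', scanning each directory once and
    // yielding every entry for which 'predicate' returns true.
    public static IEnumerable<FileSystemInfo> Walk(DirectoryInfo root, Func<FileSystemInfo, bool> predicate)
    {
        if (root == null || !root.Exists) yield break;

        var pending = new Stack<DirectoryInfo>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            DirectoryInfo current = pending.Pop();
            List<FileSystemInfo> entries;
            try
            {
                // A single scan returns files and directories together.
                // ToList() forces the scan to run inside the try block.
                entries = current.EnumerateFileSystemInfos("*", SearchOption.TopDirectoryOnly).ToList();
            }
            catch (UnauthorizedAccessException)
            {
                continue; // no permission: skip this directory
            }
            catch (IOException)
            {
                continue; // also covers PathTooLongException, which derives from IOException
            }

            foreach (FileSystemInfo entry in entries)
            {
                if (predicate(entry)) yield return entry;

                // Queue real subdirectories only; reparse points (symlinks,
                // junctions) are not followed, which avoids endless recursion.
                var dir = entry as DirectoryInfo;
                if (dir != null && (dir.Attributes & FileAttributes.ReparsePoint) == 0)
                {
                    pending.Push(dir);
                }
            }
        }
    }
}
```

A call such as `SafeWalker.Walk(new DirectoryInfo(@"c:\wherever"), info => info.Extension == ".txt")` would then play the role of the `*.txt` search. Logging is left out to keep the sketch short; the `catch` blocks are where the `_logger.Warn` calls from the answer would go.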

Here's another way: manage the enumerator iteration yourself.

// Dispose the enumerator when done; it holds an open directory handle.
using (IEnumerator<string> errFiles = Directory.EnumerateFiles(baseDir, "_error.txt", SearchOption.AllDirectories).GetEnumerator())
{
    while (true)
    {
        try
        {
            if (!errFiles.MoveNext())
                break;
            string errFile = errFiles.Current;
            // processing
        }
        catch (Exception e)
        {
            // Log the failure and try to move on to the next entry.
            log.Warn("Ignoring error finding in: " + baseDir, e);
        }
    }
}
Andrew Taylor
  • It's been a while since I worked on this problem, but if I remember correctly, the problem is the Directory.EnumerateFiles(...) throwing the exception because it pre-checks everything in the directory before returning the Enumerator. – Matthew Brubaker Aug 09 '18 at 19:33
  • Not in this case, I'm quite sure. That would be the case with GetFiles(), since all files are iterated and collected up front, but with EnumerateFiles() the permission check occurs individually on each MoveNext(). Because the try sits inside the while loop, errors result in logging and moving on to the next item. – Andrew Taylor Aug 10 '18 at 22:17
  • Ahh yes, I believe you are correct in regards to solving the permission problem. Although, as near as I can tell, they do not trap all errors that can occur during the MoveNext(). PathTooLongException for instance is not caught and handled automatically. My understanding of the internal implementation of the File System Enumerator is that any exception that does occur and is not handled within the Enumerator instance causes it to become invalidated. That said, it has been several years since I have dug in and verified the behavior. – Matthew Brubaker Aug 28 '18 at 20:39
  • @AndrewTaylor: the problem here is that after the exception the enumerator doesn't advance on MoveNext() but instead returns false. – Lemmes Jul 13 '22 at 16:06
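
For readers on newer runtimes: .NET Core 2.1 and later (which postdate most of this thread) add an `EnumerationOptions` class that asks the built-in enumerator to skip inaccessible entries itself, so the enumerator is not left in a finished state after an access error. The sketch below is an illustration under assumptions, not something proposed by either answer: `TolerantSearch` and `FindFiles` are hypothetical names, and it assumes that skipping access-denied entries is enough for your scan. Other I/O errors may still surface as exceptions.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper; requires .NET Core 2.1+ / .NET 5+.
public static class TolerantSearch
{
    public static IEnumerable<string> FindFiles(string baseDir, string pattern)
    {
        var options = new EnumerationOptions
        {
            // Descend into subdirectories (the SearchOption.AllDirectories equivalent).
            RecurseSubdirectories = true,
            // Skip entries that cannot be opened because access is denied.
            IgnoreInaccessible = true,
            // Skip reparse points (symlinks, junctions). Setting this replaces
            // the default of Hidden | System, so hidden and system files are
            // included in the results.
            AttributesToSkip = FileAttributes.ReparsePoint
        };

        return Directory.EnumerateFiles(baseDir, pattern, options);
    }
}
```

With this in place, a plain `foreach (string errFile in TolerantSearch.FindFiles(baseDir, "_error.txt"))` would take the place of the hand-written `MoveNext()` loop.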