
I have a C# application where at one point it scans a folder that could potentially contain tens of thousands of files. It then filters that list by name and length and selects a relatively small number of files for processing.

Simplified code:

DirectoryInfo directoryInfo = new DirectoryInfo(path);
FileSystemInfo[] fileSystemInfos = directoryInfo.GetFileSystemInfos();
List<MyInfo> myInfoList = fileSystemInfos
    .Where(f => (f.Attributes & FileAttributes.Directory) != FileAttributes.Directory)
    .Select(f => new MyInfo {
        FilePath = f.FullName,
        FileSize = new FileInfo(f.FullName).Length, // the expensive per-file call
    })
    .ToList();

The logic later selects a handful of files and verifies a non-zero length.

The problem is that the individual calls to FileInfo(f.FullName).Length are killing performance. Under the covers, I see that FileInfo internally stores a WIN32_FILE_ATTRIBUTE_DATA struct that contains the length (fileSizeLow and fileSizeHigh), but it does not expose that as a property.

Question: Is there a simple alternative to the above that can retrieve file names and lengths efficiently without the extra FileInfo.Length call?

My alternative is to make the MyInfo.FileSize property a lazy load property, but I wanted to check for a more direct approach first.
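For reference, a minimal sketch of that lazy-load fallback (the exact MyInfo shape and names here are illustrative; the Select above would then set only FilePath):

using System.IO;

public class MyInfo
{
    private long? _fileSize;

    public string FilePath { get; set; }

    // Length is only fetched the first time FileSize is read, so only the
    // handful of files that survive the later filtering pay for the FileInfo call.
    public long FileSize
    {
        get
        {
            if (_fileSize == null)
                _fileSize = new FileInfo(FilePath).Length;
            return _fileSize.Value;
        }
    }
}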

  • Instead of calling GetFileSystemInfos, can't you just call GetFiles? GetFiles returns FileInfo objects which have a Length property. Doing this also means that you don't have to manually weed out the directory entries, as only files will be returned. – ajz Jun 21 '23 at 21:06
  • That sounds like the answer. I'll give it a try. I expect that will eliminate the is-not-directory check also. – T N Jun 21 '23 at 21:16
  • @ajz Thanks for the answer. Just experimenting in PowerShell yielded a several-fold efficiency increase for a local folder. I'll test in C# against a network folder at the next opportunity, where I am hoping for a much more dramatic improvement. FYI - The folder being processed normally has at most a handful of unprocessed files, but after a processing outage it was taking forever to clear out the backlog with the existing logic, and we had to resort to spoon-feeding files. This fix should allow for a much more speedy recovery. If you post as an answer, I will accept. – T N Jun 21 '23 at 23:15
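A minimal sketch of what the GetFiles suggestion from the comments might look like applied to the simplified code above (untested; it assumes the same path variable and MyInfo type):

DirectoryInfo directoryInfo = new DirectoryInfo(path);
List<MyInfo> myInfoList = directoryInfo.GetFiles()  // returns files only, so no directory check is needed
    .Select(f => new MyInfo {
        FilePath = f.FullName,
        FileSize = f.Length,  // Length is already populated on the returned FileInfo, no per-file round trip
    })
    .ToList();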

0 Answers