7

I have a program that records in a database which folders are full and which are empty. Currently I'm using

bool hasFiles=false;
(Directory.GetFiles(path).Length >0) ? hasFiles=true: hasFiles=false;

but it takes almost one hour, and I can't do anything in this time.

Is there a faster way to check whether a folder contains any files?

ChrisF
user278618
  • 2
    What is the "it" that takes an hour? This particular line of code, or using that in a loop over thousands of directories on your disk? – Hans Kesting Apr 26 '10 at 10:55
  • How many files in the folder? – Lasse V. Karlsen Apr 26 '10 at 11:05
  • @Karlsen In each folder is one file. – user278618 Apr 26 '10 at 11:10
  • 1
    @Hans Kesting about 30k folders – user278618 Apr 26 '10 at 11:11
  • 2
    @phenevo - have you checked what takes the time? Is it querying the file-system, or (more likely) writing to the database? – Marc Gravell Apr 26 '10 at 11:11
  • What's the structure you're searching through? Lots of directories dotted all over the place, or is (for example): "C:\FoldersToSearch\", "C:\FoldersToSearch\1\", "C:\FoldersToSearch\2\", "C:\FoldersToSearch\99" etc? – djdd87 Apr 26 '10 at 11:12
  • 1
    @Marc Gravell I checked. It's a problem with folders . Reading from database is fast. – user278618 Apr 26 '10 at 11:13
  • @ GenericTypeTea structure has from 7 to 12 nodes, but string path is always last folder. I check files only in one folder – user278618 Apr 26 '10 at 11:14
  • Sorry, I don't really understand you. Can you elaborate and post some examples like I did in my previous comment? – djdd87 Apr 26 '10 at 11:17
  • @GenericTypeTea \\192.168.1.100\MainFolder\Storage\Portugalia\Algarve\Faro\Vilamoura\Dom Pedro Marina\!Content\Videos\YouTube – user278618 Apr 26 '10 at 11:19
  • 1
    Ah, you're doing it over a network... That'll certainly cause some slow down with that amount of files. Is that a networked external drive, or a server? – djdd87 Apr 26 '10 at 11:20
  • 1
    @phenevo - one server? Or lots of servers? The obvious thing here is to reduce the network hops; for example, execute the file/folder access code *on the file server*, bringing the results back in one block that is then processed / persisted. – Marc Gravell Apr 26 '10 at 11:28
  • @Marc Gravell only one server – user278618 Apr 26 '10 at 11:38
  • 1
    `(Directory.GetFiles(path).Length >0) ? hasFiles=true: hasFiles=false;`? I don't think that's honestly what you have, since that line wouldn't compile. (If you're trying to be succinct, that would be `bool hasFiles = Directory.GetFiles(path).Length > 0;` Or am I missing something? – Dan Tao Apr 26 '10 at 11:40
  • @phenevo Why don't you post up the actual code and folder checking code that you're using? – djdd87 Apr 26 '10 at 11:43
  • sorry, I can't., but thank you for trying to help :) – user278618 Apr 27 '10 at 10:54
  • possible duplicate of [How to quickly check if folder is empty (.NET)?](http://stackoverflow.com/questions/755574/how-to-quickly-check-if-folder-is-empty-net) – bluish Mar 12 '14 at 10:11

6 Answers

10

To check whether any files exist inside a directory or its subdirectories in .NET 4, you can use the method below:

public bool isDirectoryContainFiles(string path) {
    if (!Directory.Exists(path)) return false;
    return Directory.EnumerateFiles(path, "*", SearchOption.AllDirectories).Any();
}
Leng Weh Seng
6

The key to speeding up such a cross-network search is to cut down the number of requests across the network. Rather than getting all the directories, and then checking each for files, try and get everything from one call.

In .NET 3.5 there is no single method that recursively gets all files and folders, so you have to build it yourself (see below). In .NET 4 new overloads exist to do this in one step.

Using DirectoryInfo one also gets information on whether the returned name is a file or directory, which cuts down calls as well.

This means splitting a list of all the directories and files becomes something like this:

// Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
struct AllDirectories {
  public List<string> DirectoriesWithoutFiles { get; set; }
  public List<string> DirectoriesWithFiles { get; set; }
}

static class FileSystemScanner {
  public static AllDirectories DivideDirectories(string startingPath) {
    var startingDir = new DirectoryInfo(startingPath);

    // allContent is an IList<FileSystemInfo> of every file and directory below startingDir
    var allContent = GetAllFileSystemObjects(startingDir);
    var allFiles = allContent.Where(f => (f.Attributes & FileAttributes.Directory) == 0)
                             .Cast<FileInfo>();
    var dirs = allContent.Where(f => (f.Attributes & FileAttributes.Directory) != 0)
                         .Cast<DirectoryInfo>();
    // Full paths of directories not yet known to contain a file (case-insensitive, as on Windows)
    var allDirs = new HashSet<string>(dirs.Select(d => d.FullName),
                                      StringComparer.OrdinalIgnoreCase);

    var res = new AllDirectories {
      DirectoriesWithFiles = new List<string>()
    };
    foreach (var file in allFiles) {
      // Full path of the directory that holds this file
      var dirName = file.DirectoryName;
      if (allDirs.Remove(dirName)) {
        // Was removed, so first time this dir name seen.
        res.DirectoriesWithFiles.Add(dirName);
      }
    }
    // allDirs now just contains directories without files
    res.DirectoriesWithoutFiles = new List<string>(allDirs);
    return res;
  }
}

Implementing GetAllFileSystemObjects depends on the .NET version. On .NET 4 it is very easy:

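// Member of FileSystemScanner; a single recursive call returns files and directories together
static IList<FileSystemInfo> GetAllFileSystemObjects(DirectoryInfo root) {
  return root.GetFileSystemInfos("*.*", SearchOption.AllDirectories);
}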

On earlier versions a little more work is needed:

// Member of FileSystemScanner; breadth-first walk, one GetFileSystemInfos call per directory
static IList<FileSystemInfo> GetAllFileSystemObjects(DirectoryInfo root) {
  var res = new List<FileSystemInfo>();
  var pending = new Queue<DirectoryInfo>(new [] { root });

  while (pending.Count > 0) {
    var dir = pending.Dequeue();
    var content = dir.GetFileSystemInfos();
    res.AddRange(content);
    // Queue each subdirectory so its content is read on a later pass
    foreach (var subDir in content.Where(f => (f.Attributes & FileAttributes.Directory) != 0)
                                  .Cast<DirectoryInfo>()) {
      pending.Enqueue(subDir);
    }
  }

  return res;
}

This approach calls into the filesystem as few times as possible, just once on .NET 4 or once per directory on earlier versions, allowing the network client and server to minimise the number of underlying filesystem calls and network round trips.

Getting FileSystemInfo instances has the disadvantage of needing multiple file system operations (I believe this is somewhat OS dependent). However, any solution needs to know whether each name is a file or a directory, so at some level this cannot be avoided (without resorting to P/Invoke of FindFirstFile/FindNextFile/FindClose).


As an aside, the above would be easier with a partition extension method:

static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(
                                                 this IEnumerable<T> input,
                                                 Func<T, bool> partition);

Writing that to be lazy would be an interesting exercise (only consuming input when something iterates over one of the outputs, while buffering the other).
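
For comparison, here is a minimal eager sketch of such a method. It is not the lazy version described above: it walks the input once and buffers both halves in lists. The class name PartitionExtensions and the parameter name predicate are placeholders chosen for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class PartitionExtensions {
    // Eager sketch: a single pass over the input, buffering both halves.
    // Item1 holds the elements that satisfy the predicate, Item2 the rest.
    public static Tuple<IEnumerable<T>, IEnumerable<T>> Partition<T>(
            this IEnumerable<T> input, Func<T, bool> predicate) {
        var matches = new List<T>();
        var rest = new List<T>();
        foreach (var item in input) {
            (predicate(item) ? matches : rest).Add(item);
        }
        return Tuple.Create<IEnumerable<T>, IEnumerable<T>>(matches, rest);
    }
}
```

With this in place, the directory/file split could be written as a single call such as allContent.Partition(f => (f.Attributes & FileAttributes.Directory) != 0), with directories in Item1 and files in Item2.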

Richard
  • Needed something similar to this but was just wondering. When you use the `addDirs` variable I guess you meant `allDirs`? Or am I missing something? – Niklas Nov 07 '16 at 09:30
  • @Niklas probably. (But it has been a few years...) Remember you don't need this code in .NET 4 because it can read files and directories recursively. – Richard Nov 07 '16 at 10:12
4

If you are using .NET 4.0, have a look at the EnumerateFiles method: http://msdn.microsoft.com/en-us/library/dd413232(v=VS.100).aspx

The EnumerateFiles and GetFiles methods differ as follows: When you use EnumerateFiles, you can start enumerating the collection of FileInfo objects before the whole collection is returned; when you use GetFiles, you must wait for the whole array of FileInfo objects to be returned before you can access the array. Therefore, when you are working with many files and directories, EnumerateFiles can be more efficient.

This way not all the files are retrieved from the folder: if the enumerator yields at least one file, the folder is not empty.
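
A minimal sketch of that check for a single folder, assuming .NET 4 or later. The names FolderCheck and HasAnyFile are made up for this example; only the top level of the folder is examined, which matches the question, where the path always points at the last folder.

```csharp
using System.IO;
using System.Linq;

static class FolderCheck {
    // True if the folder itself (not its subfolders) holds at least one file.
    // Any() stops after the first entry, so the full listing is never built.
    public static bool HasAnyFile(string path) {
        return Directory.Exists(path) && Directory.EnumerateFiles(path).Any();
    }
}
```

This still costs at least one request per folder, so over a network share the number of folders is likely to dominate the running time.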

Jasper
3

I'm assuming (although I don't know for definite) that because you're calling GetFiles() on a network drive it adds considerable time to retrieve all the files from all 30k folders and enumerate through them.

I've found an alternative Directory Enumerator here on CodeProject which looks promising.

Alternatively... you could create a WebService on the server that enumerates everything for you and returns the results after.

EDIT: I think your problem is more likely the folder access. Each time you access a directory on the network drive you're going to be hitting security and permission checks. Multiplied by 30k folders, that will be a big performance hit. I highly doubt using FindFirstFile will help much, as the actual number of files enumerated will only ever be 0 or 1.

djdd87
2

Might be worth mentioning:

but it takes almost one hour, and I can't do anything in this time. (emphasis added)

Are you doing this from a GUI app, on the main thread? If so, split this process off using a BackgroundWorker. At least then the app will continue to be responsive. You could also add checks for CancellationPending in the method and cancel it if it's taking too long.

Kind of tangential to your question--just something I noticed and thought I'd comment on.
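
A rough sketch of the BackgroundWorker idea, under the assumption that the folder paths are already in a list. FolderScan, Start and onDone are hypothetical names for this illustration, and the per-folder check is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.IO;
using System.Linq;

static class FolderScan {
    // Starts the scan on a background thread and returns the worker,
    // so the caller can request cancellation with worker.CancelAsync().
    public static BackgroundWorker Start(IList<string> paths,
                                         Action<Dictionary<string, bool>> onDone) {
        var worker = new BackgroundWorker { WorkerSupportsCancellation = true };
        worker.DoWork += (sender, e) => {
            var results = new Dictionary<string, bool>();
            foreach (var path in paths) {
                // Stop early if the caller asked for cancellation.
                if (worker.CancellationPending) { e.Cancel = true; return; }
                results[path] = Directory.Exists(path)
                                && Directory.EnumerateFiles(path).Any();
            }
            e.Result = results;
        };
        worker.RunWorkerCompleted += (sender, e) => {
            // Only hand back results if the scan finished normally.
            if (!e.Cancelled && e.Error == null) {
                onDone((Dictionary<string, bool>)e.Result);
            }
        };
        worker.RunWorkerAsync();
        return worker;
    }
}
```

In a Windows Forms or WPF app started from the UI thread, RunWorkerCompleted is raised back on that thread, so onDone can update controls or write the results to the database without blocking the window during the scan.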

Dan Tao
0

Your best bet is to use the API function FindFirstFile. It won't take nearly as long then.

logicnp
  • 1
    Each folder only has one file; the problem *looks* to be the vast number of remote *folders*, accessed sequentially. – Marc Gravell Apr 26 '10 at 11:27
  • +1 Here's a discussion where someone finds that FindFirstfile is a lot faster than Directories.GetFiles for checking for empty directories so worth trying: http://stackoverflow.com/questions/755574/how-to-quickly-check-if-folder-is-empty-net – Hans Olsson Apr 26 '10 at 11:33
  • 1
    I'm in agreement with Marc here. The problem isn't enumerating files, it's enumerating and stepping through all the folder structures. Each time .Net calls GetFiles() on a Directory, there's going to be a series of Security checks every time the Directory has access attempted on it. – djdd87 Apr 26 '10 at 11:35