I need some help. I have written a file search that searches my entire hard drive, and it works. Here are the two methods that do it:

    public void SearchFileRecursiveNonMultithreaded()
    {
        // Search files on multiple drives

        string[] drives = Environment.GetLogicalDrives();

        foreach (string drive in drives)
        {
            if (GetDriveType(drive).ToString().CompareTo("DRIVE_FIXED") == 0)
            {
                DriveInfo driveInfo = new DriveInfo(drive);

                if (driveInfo.IsReady)
                {
                    System.IO.DirectoryInfo rootDirectory = driveInfo.RootDirectory;
                    RecursiveFileSearch(rootDirectory);
                }
            }
        }
        MessageBox.Show(files.Count.ToString());
    }

    public void RecursiveFileSearch(DirectoryInfo root)
    {
        DirectoryInfo[] subDirectory;
        try
        {
            // private List<FileInfo> files = new List<FileInfo>() is declared above
            files.AddRange(root.GetFiles(searchString.Text, SearchOption.TopDirectoryOnly));
        }
        catch (Exception)
        {
        }

        try
        {
            // Now find all the subdirectories under this directory.
            subDirectory = root.GetDirectories();

            foreach (System.IO.DirectoryInfo dirInfo in subDirectory)
            {
                // A recursive call is performed for each subdirectory.
                RecursiveFileSearch(dirInfo);
            }
        }
        catch (Exception e)
        {
            MessageBox.Show(e.ToString());
        }
    }

Right now I am trying to implement a parallel search to make the search faster. I have tried several approaches to get this to work, using a BackgroundWorker as well as threads, but I ran into problems, and it is very difficult to debug and find out what is wrong. Can someone tell me the approach for implementing a parallel search? Just the steps will do; I will go and figure out the rest on my own. Any help will be greatly appreciated.

abduls85
  • Are we talking about multiple physical drives or just one? – Conrad Frix Jun 03 '11 at 16:39
  • Just one drive. I have a C: and a D: drive, and the code above works fine for searching files in the two partitions. – abduls85 Jun 03 '11 at 16:41
  • You should not forget that your HDD is "single-threaded": fetching data simultaneously from sectors that are far from each other may actually decrease performance. It probably only makes sense when different threads work on different physical drives, and that's not easy to determine from file-system info alone. – Dyppl Jun 03 '11 at 16:41
  • Dyppl, thanks for the input. Ultimately my code should work with multiple drives as well, but most importantly with a single drive. – abduls85 Jun 03 '11 at 16:44
  • It isn't going to. Sure, it can be multi-threaded, but as long as all the threads look at the same spinning disk, you have trouble. Even with a fast SSD, the CPU will be faster. – Marc Gravell Jun 03 '11 at 18:09

4 Answers

First, as somebody else pointed out, it's unlikely that using multiple threads will speed things up when you're searching just one drive. The vast majority of your time is spent waiting for the disk head to move to where it needs to be, and it can only be in one place at a time. Using multiple threads here is wasted effort, and has a high likelihood of actually making your program slower.

Second, you can simplify your code by just calling Directory.EnumerateFiles. If you want to search multiple drives concurrently, simply start multiple BackgroundWorker instances, each using EnumerateFiles to search a different drive.
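A minimal sketch of that simplification, assuming .NET Framework 4 or later (where Directory.EnumerateFiles exists). The class SimpleSearch and method FindFiles are hypothetical names. To search several drives at once, you would run one such call per drive, for example from separate BackgroundWorker instances.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class SimpleSearch
{
    // One library call replaces the hand-written recursion.
    // Caveat: with SearchOption.AllDirectories, a single unreadable directory
    // anywhere in the tree makes the enumeration throw UnauthorizedAccessException.
    public static List<string> FindFiles(string root, string pattern)
    {
        return Directory.EnumerateFiles(root, pattern, SearchOption.AllDirectories).ToList();
    }
}
```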

Note, however, that EnumerateFiles will throw an exception (as will your code) if it runs across directory permissions problems, which aren't uncommon when searching an entire drive. If that's a problem (and it likely will be), then you have to write your own directory searcher. One such is in the answer to this question.
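Here is a sketch of such a directory searcher. It is not the one from the linked answer, just one possible approach: it walks the tree with an explicit stack instead of recursion and skips any directory that throws UnauthorizedAccessException or IOException. SafeFileSearch is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SafeFileSearch
{
    // Lazily yields matching file paths under 'root', skipping directories
    // that cannot be read instead of aborting the whole search.
    public static IEnumerable<string> EnumerateFiles(string root, string pattern)
    {
        var pending = new Stack<string>();   // explicit stack: no deep recursion
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();

            string[] files;
            string[] subDirs;
            try
            {
                files = Directory.GetFiles(dir, pattern, SearchOption.TopDirectoryOnly);
                subDirs = Directory.GetDirectories(dir);
            }
            catch (UnauthorizedAccessException) { continue; }  // access denied: skip this folder
            catch (IOException) { continue; }                  // e.g. folder removed mid-search

            foreach (string file in files)
                yield return file;

            foreach (string sub in subDirs)
                pending.Push(sub);
        }
    }
}
```

On .NET Core 2.1 and later, passing an EnumerationOptions instance with RecurseSubdirectories = true and IgnoreInaccessible = true to Directory.EnumerateFiles should give similar behaviour without custom code.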

Jim Mischel
  • "it's unlikely that using multiple threads will speed things up when you're searching just one drive" shouldn't it be "one disk" instead of "one drive"? Because for one disk you can have multiple drives, however there is only one disk head for all the drives. Right? – Alan Deep Mar 28 '18 at 09:44
  • @AlanDeep In this context, "disk" and "drive" are used interchangeably. See https://en.wikipedia.org/wiki/Hard_disk_drive You can have multiple physical disk drives and do simultaneous reads and writes. That is, you can be reading from drive 1, and writing to drive 2. You can also have *logical* drives, all of which exist on a single physical drive. In that case, you cannot do simultaneous reads and writes. – Jim Mischel Mar 28 '18 at 13:57
  • So you meant physical drive in your answer and not logical drive. Thanks for clarifying – Alan Deep Mar 28 '18 at 14:23

Your outer loop, foreach (string drive in drives), could benefit from being changed into a Parallel.ForEach().

Your inner loop (RecursiveFileSearch()) should not be made parallel; you'll just lose performance. But from .NET Framework 4 on, you can replace GetFiles() with EnumerateFiles() to get somewhat better results on very large folders.

And that solves most of your thread-safety issues: the outer loop should give each drive its own List to fill (non-async). Afterwards, merge those lists after the ForEach().

The exact answer is more difficult: searching logical disks in parallel won't help much; the gains come from independent 'axles' (separate physical disks). But on a big RAID volume, searching the files could benefit from a few extra threads.
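A sketch of that layout, assuming .NET 4 or later: Parallel.ForEach over the roots, a private List<string> per root filled by a plain sequential walk, and a merge once the loop has returned. PerDriveSearch and its methods are hypothetical names, and the per-root walk skips unreadable folders rather than throwing.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

public static class PerDriveSearch
{
    // Searches each root concurrently. Each root fills its own List<string>,
    // so nothing is shared (and nothing needs locking) while the loop runs.
    public static List<string> Search(IList<string> roots, string pattern)
    {
        var perRoot = new List<string>[roots.Count];

        Parallel.ForEach(roots, (root, state, index) =>
        {
            perRoot[index] = SearchOneRoot(root, pattern);
        });

        // Merge only after Parallel.ForEach has returned, i.e. all roots are done.
        return perRoot.SelectMany(list => list).ToList();
    }

    // Sequential walk of one root: the inner search is deliberately not parallel.
    private static List<string> SearchOneRoot(string root, string pattern)
    {
        var result = new List<string>();
        var pending = new Stack<string>();
        pending.Push(root);

        while (pending.Count > 0)
        {
            string dir = pending.Pop();
            try
            {
                // EnumerateFiles yields names lazily instead of building an array first.
                result.AddRange(Directory.EnumerateFiles(dir, pattern));
                foreach (string sub in Directory.EnumerateDirectories(dir))
                    pending.Push(sub);
            }
            catch (UnauthorizedAccessException) { }  // skip folders we cannot read
            catch (IOException) { }                  // e.g. folder deleted mid-search
        }
        return result;
    }
}
```

For real drives, the roots could come from DriveInfo.GetDrives(), keeping drives whose DriveType is DriveType.Fixed and whose IsReady is true, and passing each RootDirectory.FullName.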

H H

While searching logical drives simultaneously could help or hurt performance, here's how you might manage the threads:

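    using System;
    using System.Collections.Generic;
    using System.IO;
    using System.Threading;
    ...

    string[] drives = Environment.GetLogicalDrives();
    List<Thread> threads = new List<Thread>();
    foreach (string drive in drives)
    {
        if (GetDriveType(drive).ToString().CompareTo("DRIVE_FIXED") == 0)
        {
            DriveInfo driveInfo = new DriveInfo(drive);

            if (driveInfo.IsReady)
            {
                System.IO.DirectoryInfo rootDirectory = driveInfo.RootDirectory;
                // One thread per drive. RecursiveFileSearch now runs off the UI thread,
                // so it should not read controls such as searchString.Text directly.
                var thread = new Thread((dir) => RecursiveFileSearch((DirectoryInfo)dir));
                threads.Add(thread);
                thread.Start(rootDirectory);
            }
        }
    }
    // Wait for every drive to finish; Join blocks the calling (e.g. UI) thread.
    foreach (var t in threads) t.Join();
    MessageBox.Show(files.Count.ToString());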

Don't forget to lock any shared collection used by RecursiveFileSearch. You should try to avoid such access because it creates contention.
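One way to apply that advice, sketched as a hypothetical LockedFileSearch class: the search pattern is passed in as a parameter instead of being read from the searchString control on a worker thread, and the shared list is guarded by a private lock object that is taken once per directory rather than once per file, which keeps contention down.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Threading;

public class LockedFileSearch
{
    private readonly List<FileInfo> files = new List<FileInfo>();
    private readonly object filesLock = new object();   // guards 'files'

    public int Count
    {
        get { lock (filesLock) { return files.Count; } }
    }

    // Same shape as the question's RecursiveFileSearch, but with no UI access
    // and with the shared list only touched while holding the lock.
    public void RecursiveFileSearch(DirectoryInfo root, string pattern)
    {
        FileInfo[] found;
        DirectoryInfo[] subDirs;
        try
        {
            found = root.GetFiles(pattern, SearchOption.TopDirectoryOnly);
            subDirs = root.GetDirectories();
        }
        catch (UnauthorizedAccessException) { return; }  // skip unreadable folders
        catch (IOException) { return; }

        // One lock per directory (not per file) keeps contention between threads low.
        lock (filesLock)
        {
            files.AddRange(found);
        }

        foreach (DirectoryInfo sub in subDirs)
            RecursiveFileSearch(sub, pattern);
    }

    // One thread per root; blocks until all of them have finished.
    public void SearchRoots(IEnumerable<DirectoryInfo> roots, string pattern)
    {
        var threads = new List<Thread>();
        foreach (DirectoryInfo root in roots)
        {
            var thread = new Thread(() => RecursiveFileSearch(root, pattern));
            threads.Add(thread);
            thread.Start();
        }
        foreach (Thread t in threads)
            t.Join();
    }
}
```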

Eric Mickelsen
  • I have been using SemaphoreSlim before accessing the critical section, which is the List. Thanks, man. I hope the above works and the performance will be better. I will be trying this out later. – abduls85 Jun 03 '11 at 17:08

One way to make it multi-threaded is to queue each call to RecursiveFileSearch with ThreadPool.QueueUserWorkItem so that the calls run on multiple threads.

Be cautious with this approach, however, for the following reasons:

1) As Dyppl stated, access to the drive is effectively single-threaded, so this could really hurt performance.

2) List<T> is not thread-safe, so you would need to lock/synchronize on it before adding to the list. This could also hurt performance a lot. Consider using System.Collections.Concurrent.ConcurrentBag<T> (in .NET 4.0) to have it handle synchronization for you, since you are only doing additions.

3) Adding every file you encounter to the list can result in an overflow if you have more than Int32.MaxValue files.

4) The file collection could become huge and may result in an OutOfMemoryException.
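A sketch of the thread-pool approach with a ConcurrentBag, assuming .NET 4 or later. Each directory becomes one QueueUserWorkItem call; a CountdownEvent (an assumption, since the answer does not say how to detect completion) counts directories that are queued but not yet finished, so the caller knows when the search is over. ThreadPoolSearch is a hypothetical name, and points 1, 3 and 4 above still apply.

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;

public static class ThreadPoolSearch
{
    public static ConcurrentBag<string> Search(string root, string pattern)
    {
        var results = new ConcurrentBag<string>();

        // Counts queued-but-unfinished directories; starts at 1 for the root.
        using (var pending = new CountdownEvent(1))
        {
            ThreadPool.QueueUserWorkItem(_ => Visit(root, pattern, results, pending));
            pending.Wait();   // returns once every queued directory has signalled
        }
        return results;
    }

    private static void Visit(string dir, string pattern,
                              ConcurrentBag<string> results, CountdownEvent pending)
    {
        try
        {
            foreach (string file in Directory.GetFiles(dir, pattern))
                results.Add(file);   // ConcurrentBag does its own synchronization

            foreach (string sub in Directory.GetDirectories(dir))
            {
                pending.AddCount();  // register the child before queuing it
                ThreadPool.QueueUserWorkItem(_ => Visit(sub, pattern, results, pending));
            }
        }
        catch (UnauthorizedAccessException) { }  // skip unreadable folders
        catch (IOException) { }
        finally
        {
            pending.Signal();        // this directory is done
        }
    }
}
```

Because a parent calls AddCount for each child before it signals for itself, the count cannot reach zero while any directory is still waiting to be visited.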

JMcCarty