I'm trying to speed up calculating the total size of all files in all folders, recursively, under a given path.

Let's say I choose "E:\" as the folder. I can get the entire recursive file list via "SafeFileEnumerator" into an IEnumerable in milliseconds (works like a charm).

Now I would like to sum the sizes of all files in this enumerable. Right now I loop over them via foreach and read new FileInfo(oFileInfo.FullName).Length for each file.

This works, but it is slow: it takes about 30 seconds. If I look up the space consumption via right-click > Properties on the same selected folders in Windows Explorer, I get the result in about 6 seconds (~1600 files, 26 gigabytes of data, on an SSD).

So my first thought was to speed up the gathering by using threads, but I don't get any speedup there.

The code without threads is below:

public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;

    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }

        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }

    return FolderSize;
}

The multithreaded code is below:

public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;

    int iCountTasks = 0;

    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories);
    foreach (FileSystemInfo oFileInfo in aFiles)
    {
        // check if we will cancel now
        if (oCancelToken.Token.IsCancellationRequested)
        {
            throw new OperationCanceledException();
        }

        if (iCountTasks < 10)
        {
            Interlocked.Increment(ref iCountTasks);
            FileSystemInfo oCurrent = oFileInfo; // local copy for the thread closure
            Thread oThread = new Thread(delegate()
            {
                try
                {
                    // Interlocked.Add avoids the data race of "FolderSize +=" across threads
                    Interlocked.Add(ref FolderSize, new FileInfo(oCurrent.FullName).Length);
                }
                catch (Exception oException)
                {
                    Debug.WriteLine(oException.Message);
                }

                Interlocked.Decrement(ref iCountTasks);
            });
            oThread.Start();
            continue;
        }

        try
        {
            FolderSize += new FileInfo(oFileInfo.FullName).Length;
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    }

    // note: worker threads may still be running here; without joining them
    // the returned total can miss their contributions
    return FolderSize;
}

Could someone please give me advice on how I could speed up the folder size calculation?

Kind regards

Edit 1 (Parallel.ForEach suggestion; see comments)

public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
{
    long FolderSize = 0;

    ParallelOptions oParallelOptions = new ParallelOptions();
    oParallelOptions.CancellationToken = oCancelToken.Token;
    oParallelOptions.MaxDegreeOfParallelism = System.Environment.ProcessorCount;

    IEnumerable<FileSystemInfo> aFiles = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray();

    Parallel.ForEach(aFiles, oParallelOptions, oFileInfo =>
    {
        try
        {
            // Interlocked.Add avoids a data race on FolderSize across parallel iterations
            Interlocked.Add(ref FolderSize, new FileInfo(oFileInfo.FullName).Length);
        }
        catch (Exception oException)
        {
            Debug.WriteLine(oException.Message);
        }
    });

    return FolderSize;
}

2 Answers


Side-note about SafeFileEnumerator performance:

Getting an IEnumerable doesn't mean you have the entire collection yet, because it is a lazy proxy. Try the snippet below; I'm sure you'll see the performance difference (sorry if it doesn't compile, it's just to illustrate the idea):

var tmp = new SafeFileEnumerator(Folder, "*", SearchOption.AllDirectories).ToArray(); // fetch all records explicitly to populate the array
IEnumerable<FileSystemInfo> aFiles = tmp;

Now, about the actual result you want to achieve:

  1. If you need just the file sizes, it's better to ask the OS about the file system rather than querying files one by one. I'd start with the DirectoryInfo class (see for instance http://www.tutorialspoint.com/csharp/csharp_windows_file_system.htm); see the sketch after this list.
  2. If you need to calculate a checksum for each file, it will definitely be a slow task, because you have to load each file first (a lot of memory transfers). Threads are no booster here, because they'll be limited by the OS file system throughput, not by your CPU power.
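To illustrate point 1, here is a minimal sketch (my illustration, not tested against the question's setup) that walks the tree with DirectoryInfo. The FileInfo objects yielded by EnumerateFiles are populated from the directory enumeration itself, so reading Length should not trigger a separate metadata query per file; the path in Main is a placeholder.

using System;
using System.IO;

class FolderSizeSketch
{
    // Recursively sums file sizes. The FileInfo instances yielded by
    // EnumerateFiles already carry their Length, so no extra per-file
    // FileInfo construction is needed.
    public static long GetFolderSize(DirectoryInfo oFolder)
    {
        long lSize = 0;
        try
        {
            foreach (FileInfo oFile in oFolder.EnumerateFiles())
            {
                lSize += oFile.Length;
            }
            foreach (DirectoryInfo oSubFolder in oFolder.EnumerateDirectories())
            {
                // skip reparse points (junctions/symlinks) to avoid cycles
                if ((oSubFolder.Attributes & FileAttributes.ReparsePoint) == 0)
                {
                    lSize += GetFolderSize(oSubFolder);
                }
            }
        }
        catch (UnauthorizedAccessException)
        {
            // e.g. "System Volume Information": skip folders we cannot read
        }
        return lSize;
    }

    static void Main()
    {
        // placeholder path
        Console.WriteLine(GetFolderSize(new DirectoryInfo(@"C:\Test")));
    }
}

Recursing manually rather than passing SearchOption.AllDirectories lets an UnauthorizedAccessException skip a single subtree instead of aborting the whole enumeration.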
– Yury Schkatula
  • thanks for answering fast, but I can't see any change in speed when I try your snippet. The list of files is collected in milliseconds. BTW: as far as I can see from the disk activity in the Windows performance monitor, there is no "strong" HDD/SSD access while my code is running and gathering the sum of the files/folders :-/ – eXe Nov 13 '14 at 15:38
using System;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApplication3
{
    class Program
    {
        static void Main(string[] args)
        {
            long size = fetchFolderSize(@"C:\Test", new CancellationTokenSource());
            Console.WriteLine(size);
        }

        public static long fetchFolderSize(string Folder, CancellationTokenSource oCancelToken)
        {
            ParallelOptions po = new ParallelOptions();
            po.CancellationToken = oCancelToken.Token;
            po.MaxDegreeOfParallelism = System.Environment.ProcessorCount;

            long folderSize = 0;
            string[] files = Directory.GetFiles(Folder);

            // sum the sizes of the files in this folder, using one
            // thread-local subtotal per partition
            Parallel.ForEach<string, long>(files,
                                           po,
                                           () => 0,
                                           (fileName, loop, fileSize) =>
                                           {
                                               po.CancellationToken.ThrowIfCancellationRequested();
                                               // "+=" keeps the running subtotal; plain "=" would
                                               // drop all but the last file of each partition
                                               fileSize += new FileInfo(fileName).Length;
                                               return fileSize;
                                           },
                                           (finalResult) => Interlocked.Add(ref folderSize, finalResult));

            // recurse into subdirectories, skipping reparse points (junctions/symlinks)
            string[] subdirEntries = Directory.GetDirectories(Folder);

            Parallel.For<long>(0, subdirEntries.Length, () => 0, (i, loop, subtotal) =>
            {
                if ((File.GetAttributes(subdirEntries[i]) & FileAttributes.ReparsePoint) !=
                    FileAttributes.ReparsePoint)
                {
                    subtotal += fetchFolderSize(subdirEntries[i], oCancelToken);
                }
                // always return the running subtotal, even for skipped entries
                return subtotal;
            },
            (finalResult) => Interlocked.Add(ref folderSize, finalResult));

            return folderSize;
        }
    }
}
  • Parallel task with recursion – Brian McLeod Nov 13 '14 at 17:39
  • Performing the speed test outside of debug mode, I get results equivalent to Windows Explorer on large directories. – Brian McLeod Nov 13 '14 at 17:58
  • thank you for your contribution. I just tested your code, but I have several issues with it :( If I choose "E:\" as the starting folder, I get an exception from "Directory.GetDirectories(Folder)" because it will (internally) try to get the folder information of the "System Volume Information" folder, which I do not have access to. That is, by the way, one of the reasons I switched to "SafeFileEnumerator". On the other hand, I get only 64 MB as the total size in my test with the C:\users folder, which contains 33 GB of data :-/ – eXe Nov 14 '14 at 08:24
  • I didn't debug your code, because of the Directory.GetDirectories(Folder) call, which I cannot use (exceptions on system dirs). But I tried to combine my code with your Parallel.ForEach loop suggestion, which worked, but I didn't get any speedup :-/ – eXe Nov 14 '14 at 08:26