As we all know, Directory.GetFiles() isn't very fast. I was trying to find some files as fast as possible and came across some weird results.
I started by using Parallel.ForEach and iterating over all directories on my C:\ drive.
I implemented three methods and displayed the result of each. They differ in the number of nested Parallel.ForEach calls I use to descend into the directory tree.
Here's the result.
I don't get why using a single Parallel.ForEach is faster than using two. What I don't get at all is why they differ in the number of files they found.
Long story short, here's my code:
Caller:
    private void Start(object sender, EventArgs e)
    {
        // Each variant runs on its own thread and reports its own timing.
        new Thread(() =>
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            List<FileInfo> files = HighPerformanceFileGettter.GetFilesInDirectory_DoubleParallel(@"C:\", "*.txt", SearchOption.AllDirectories);
            watch.Stop();
            MessageBox.Show($"Found [{files.Count}] files in [{watch.Elapsed.TotalMilliseconds}ms] => [{watch.Elapsed.TotalSeconds}]s", "Double Parallel");
        }).Start();
        new Thread(() =>
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            List<FileInfo> files = HighPerformanceFileGettter.GetFilesInDirectory_SingleParallell(@"C:\", "*.txt", SearchOption.AllDirectories);
            watch.Stop();
            MessageBox.Show($"Found [{files.Count}] files in [{watch.Elapsed.TotalMilliseconds}ms] => [{watch.Elapsed.TotalSeconds}]s", "Single Parallel");
        }).Start();
        new Thread(() =>
        {
            Stopwatch watch = new Stopwatch();
            watch.Start();
            List<FileInfo> files = HighPerformanceFileGettter.GetFilesInDirectory_TripleParallel(@"C:\", "*.txt", SearchOption.AllDirectories);
            watch.Stop();
            MessageBox.Show($"Found [{files.Count}] files in [{watch.Elapsed.TotalMilliseconds}ms] => [{watch.Elapsed.TotalSeconds}]s", "Triple Parallel");
        }).Start();
    }
Searching:
    public static List<FileInfo> GetFilesInDirectory_TripleParallel(string rootDirectory, string pattern, System.IO.SearchOption option)
    {
        List<FileInfo> resultFiles = new List<FileInfo>();
        // Search:
        DirectoryInfo root = new DirectoryInfo(rootDirectory);
        if (root.Exists)
        {
            // Performance:
            Parallel.ForEach(root.GetDirectories(), (dir) =>
            {
                try
                {
                    Parallel.ForEach(dir.GetDirectories(), (dir_1) =>
                    {
                        try
                        {
                            Parallel.ForEach(dir_1.GetDirectories(), (dir_2) =>
                            {
                                try
                                {
                                    resultFiles.AddRange(dir_2.GetFiles(pattern, option));
                                }
                                catch (Exception) { }
                            });
                        }
                        catch (Exception) { }
                    });
                }
                catch (Exception) { }
            });
            return resultFiles;
        }
        Debug.Fail($"Root [{root.FullName}] does not exist");
        return null;
    }
NOTE: I posted only one of the three methods, but you should be able to see what differs. It's only the number of nested Parallel.ForEach calls I used.
Does anyone have an idea what the best approach would be in terms of performance, and why the file count differs?
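
For comparison, here is a minimal sketch of a variant that should return a stable count. I have not measured it against the three methods above. It rests on three observations about the posted code: List<T>.AddRange is not safe to call from several threads at once, the nested versions never look at files that sit directly in the root or in the upper directory levels, and a single access-denied folder makes GetFiles(pattern, SearchOption.AllDirectories) throw and drop the whole subtree. The class SafeFileSearch and its helper names are hypothetical and not part of my original code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper class, shown only as a sketch.
public static class SafeFileSearch
{
    public static List<FileInfo> GetFiles(string rootDirectory, string pattern)
    {
        var root = new DirectoryInfo(rootDirectory);
        if (!root.Exists)
            return new List<FileInfo>();

        // ConcurrentBag can be filled from many threads without locking.
        var results = new ConcurrentBag<FileInfo>();

        // Files directly in the root are collected as well.
        CollectTopLevel(root, pattern, results);

        // One level of parallelism: each first-level subtree is walked by one task.
        Parallel.ForEach(SafeSubDirectories(root),
            dir => CollectRecursive(dir, pattern, results));

        return results.ToList();
    }

    private static void CollectRecursive(DirectoryInfo dir, string pattern,
        ConcurrentBag<FileInfo> results)
    {
        CollectTopLevel(dir, pattern, results);
        foreach (var sub in SafeSubDirectories(dir))
            CollectRecursive(sub, pattern, results);
    }

    private static void CollectTopLevel(DirectoryInfo dir, string pattern,
        ConcurrentBag<FileInfo> results)
    {
        try
        {
            foreach (var file in dir.GetFiles(pattern, SearchOption.TopDirectoryOnly))
                results.Add(file);
        }
        catch (UnauthorizedAccessException) { } // skip only this folder
        catch (IOException) { }                 // folder vanished, path too long, ...
    }

    private static DirectoryInfo[] SafeSubDirectories(DirectoryInfo dir)
    {
        try
        {
            // Junctions and symlinks are skipped to avoid cycles and duplicates.
            return dir.GetDirectories()
                .Where(d => (d.Attributes & FileAttributes.ReparsePoint) == 0)
                .ToArray();
        }
        catch (UnauthorizedAccessException) { return new DirectoryInfo[0]; }
        catch (IOException) { return new DirectoryInfo[0]; }
    }
}
```

It would be called as SafeFileSearch.GetFiles(@"C:\", "*.txt"). The three timing threads in my caller also run at the same time and compete for the same disk, so for a fair comparison each variant would probably have to be timed on its own.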