I would like to speed up the process of traversing a tree. Here is an example of a node:
class Node
{
    public List<Node> Children { get; set; }
    public int SomeProperty { get; set; }
    public string SomeOtherProperty { get; set; }
}
The way I traverse the tree is like this:
static void TraverseTree(Node ParentNode)
{
    if (ParentNode.Children == null)
        return;

    foreach (var child in ParentNode.Children)
    {
        TraverseTree(child);
    }
}
The ParentNode.Children property takes about 1 millisecond to evaluate because a Node represents a file or a directory. I just used this example of a node to better illustrate my point.
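For context, a file-system-backed node might look roughly like the sketch below. This is only my assumption about why reading Children is slow (the getter has to hit the disk on every access); FileSystemNode and FullPath are hypothetical names, not part of my real code.

using System.Collections.Generic;
using System.IO;
using System.Linq;

class FileSystemNode
{
    public string FullPath { get; set; }

    // Hypothetical lazy getter: every access enumerates the directory on disk,
    // which is where the roughly one millisecond per node goes.
    public List<FileSystemNode> Children
    {
        get
        {
            if (!Directory.Exists(FullPath))
                return null; // a plain file has no children

            return Directory.EnumerateFileSystemEntries(FullPath)
                            .Select(path => new FileSystemNode { FullPath = path })
                            .ToList();
        }
    }
}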
If you think about it, if the first node has 4 children and each of those children has 10,000,000 descendants, we could increase the speed of this traversal by traversing each of those 4 children on a separate thread, taking advantage of parallel programming. If that were the scenario, I would have taken that approach. But if I don't know the structure of the tree in advance, how could I do it?
I have been thinking about:
1) Start traversing the tree, place the first 10 nodes that have children on a stack, then start the traversal of each on a separate thread.
2) Do something like:
static void TraverseTree(Node ParentNode)
{
    if (ParentNode.Children == null)
        return;

    foreach (var child in ParentNode.Children)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback((x) =>
        {
            TraverseTree(child);
        }), null);
    }
}
This often gives me strange results, but it is significantly faster.
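Part of the strangeness is probably that QueueUserWorkItem is fire-and-forget, so the outer call can return while pool threads are still walking subtrees. Below is a minimal sketch of a variant that still traverses the immediate children in parallel but blocks until every subtree is done; ParallelTraverseTree is my own name, and it reuses the sequential TraverseTree from above for each subtree.

using System.Linq;
using System.Threading.Tasks;

static void ParallelTraverseTree(Node ParentNode)
{
    if (ParentNode.Children == null)
        return;

    // One task per immediate child; each task traverses its whole subtree
    // sequentially, and WaitAll blocks until every subtree has been visited.
    var tasks = ParentNode.Children
                          .Select(child => Task.Run(() => TraverseTree(child)))
                          .ToArray();

    Task.WaitAll(tasks);
}

Parallel.ForEach(ParentNode.Children, TraverseTree) would be an equivalent one-liner inside the method, and it lets the runtime decide how many threads to use.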
Results
Using tasks improved the speed of the algorithm by about 40%; here are the results:
Scanning my whole C:\ drive took about 5.81 seconds with the following algorithm:
// directoryPath = @"C:\"
var now = DateTime.Now;

Task<List<ScanItem>> t1 = new Task<List<ScanItem>>(() =>
{
    return GetAllFilesInDirectory(directoryPath);
});

t1.Start();
t1.Wait();

var done = DateTime.Now - now; // done = 5.81 seconds on average
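As a side note, System.Diagnostics.Stopwatch is generally a better fit than subtracting DateTime.Now values for timings like this; a minimal sketch (the variable names are mine):

using System;
using System.Diagnostics;

var sw = Stopwatch.StartNew();
var items = GetAllFilesInDirectory(directoryPath);
sw.Stop();

// Wall-clock time for the whole scan.
Console.WriteLine(sw.Elapsed.TotalSeconds);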
Scanning my whole C:\ drive took about 3.01 seconds with the following algorithm:
// directoryPath = @"C:\"
var now = DateTime.Now;

// Get all the top-level directories on my C: drive (GetDirectories only returns directories).
var directories = Directory.GetDirectories(directoryPath);
// directories = 17 directories: inetpub, MSOCache, PerfLogs, Program Files, Program Files (x86), etc...

Task<List<ScanItem>>[] myTasks = new Task<List<ScanItem>>[directories.Length];

// Create a task for each directory on the C:\ drive.
for (int k = 0; k < myTasks.Length; k++)
{
    var currentDir = directories[k];
    myTasks[k] = new Task<List<ScanItem>>(() =>
    {
        return GetAllFilesInDirectory(currentDir);
    });
}

// Start all the tasks.
for (int k = 0; k < myTasks.Length; k++)
    myTasks[k].Start();

Task.WaitAll(myTasks); // wait for all tasks to finish

var done = DateTime.Now - now; // average about 3.01 seconds
If I traverse the resulting list, the first algorithm returns 318,222 files and directories (which is the correct number). The second algorithm returns 318,195, which is very close, but I don't understand why the counts differ...
I am testing this on a computer that has 8 cores. Maybe if I were to run this on a computer that had only 2 cores, using one task could be faster than creating all these 17 tasks.
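If the concern is over-subscribing a machine with fewer cores, one option is to cap the parallelism at the core count and let the runtime schedule the 17 directories across it. A minimal sketch, assuming the same GetAllFilesInDirectory helper; the results collection and its name are mine:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

var results = new ConcurrentBag<List<ScanItem>>();

var options = new ParallelOptions
{
    // Never run more concurrent scans than there are logical cores,
    // so a 2-core machine is not asked to juggle 17 scans at once.
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.ForEach(Directory.GetDirectories(directoryPath), options, dir =>
{
    results.Add(GetAllFilesInDirectory(dir));
});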
If you want to know what algorithm I use to get the files that fast, look at https://stackoverflow.com/a/724184/637142