I'm scanning some directory for items. I've just read Multithreaded Directory Looping in C# question but I still want to make it multithreated. Even though everyone says the drive will be the bottleneck I have some points:
- The drives may mostly be "single threaded" but how you know what they going to bring up in the future?
- How you know the different sub-paths you are scanning are one the same physical drive?
- I using an abstraction layer (even two) over the
System.IO
so that I can later reuse the code in different scenarios.
So, my first idea was to use Task and first dummy implementation was this:
public async Task Scan(bool recursive = false) {
var t = new Task(() => {
foreach (var p in path.scan) Add(p);
if (!recursive) return;
var tks = new Task[subs.Count]; var i = 0;
foreach (var s in subs) tks[i++] = s.Scan(true);
Task.WaitAll(tks);
}); t.Start();
await t;
}
I don't like the idea of creating a Task
for each item and generally this doesn't seem ideal, but this was just for a test as Tasks are advertised to automatically manage the threads...
This method works but it's very slow. It takes above 5s to complete, while the single threated version below takes around 0.5s to complete the whole program on the same data set:
public void Scan2(bool recursive = false) {
foreach (var p in path.scan) Add(p);
if (!recursive) return;
foreach (var s in subs) s.Scan2(true);
}
I wander what really goes wrong with fist method. The machine is not on load, CUP usage is insignificant, drive is fine... I've tried profiling it with NProfiler it don't tell me much besides the program sits on Task.WaitAll(tks)
all the time.
I also wrote a thread-locked counting mechanism that is invoked during addition of each item. Maybe it's the problem with it?
#region SubCouting
public Dictionary<Type, int> counters = new Dictionary<Type, int>();
private object cLock = new object();
private int _sc = 0;
public int subCount => _sc;
private void inCounter(Type t) {
lock (cLock) {
if (!counters.ContainsKey(t)) counters.Add(t, 1);
counters[t]++;
_sc++;
}
if (parent) parent.inCounter(t);
}
#endregion
But even if threads are waiting here, wouldn't the execution time be similar to single threaded version as opposed to 10x slower?
I'm not sure how to approach this. If I don't want to use tasks, do I need to manage threads manually or is there already some library that would fit nicely for the job?