I am creating a Windows service in C# that periodically checks a specified directory for old files and deletes them. I tried to implement this with the thread pool and with the Task Parallel Library. I also tried to make it asynchronous without locks or reset events, but that did not work: some operations were skipped. My aim is to use all cores for this task. Since the work is an I/O operation (file deletion), which of the two approaches is better? Suggestions for a more efficient alternative are also welcome.

Threadpool code:

    foreach (string file in files)
    {
        using (AutoResetEvent signal = new AutoResetEvent(false))
        {
            ThreadPool.QueueUserWorkItem(delegate (object o)
            {
                if (File.GetLastWriteTime(file) <= DateTime.Today.AddDays(MaintainDuration))
                {
                    File.Delete(file);
                    TestSuccessLog(file + " is deleted from thread " + Thread.CurrentThread.ManagedThreadId);
                }
                signal.Set();
            });
            signal.WaitOne();
        }
    }

TPL code:

    object sync = new Object();
    Parallel.ForEach(files, file =>
    {
        lock (sync)
        {
            if (File.GetLastWriteTime(file) <= DateTime.Today.AddDays(MaintainDuration))
            {
                File.Delete(file);
                TestSuccessLog(file + " is deleted from thread " + Thread.CurrentThread.ManagedThreadId);
            }
        }
    });

Devharsh Trivedi
  • The `Parallel.ForEach` with the `lock` will make it synchronized again. Looks useless. Same with the `ThreadPool` and `AutoResetEvent` variant. Why use threading if no 'job' may run simultaneously? – Jeroen van Langen Oct 11 '16 at 12:25
  • Adding to Jeroen's comment, you need to learn _how_ to use threading/concurrency before worrying _which_ scheme is better –  Oct 11 '16 at 12:28
  • You use the threadpool in both cases. The first snippet will work *much* better than the second since it (accidentally) does not make the mistake of using more than one thread at the same time. The critical resource here is not CPU cores, it is the disk drive. You have only one. Especially a spindle drive does *not* like to be commandeered by more than one thread, disk seeks are the most expensive thing you can ever do with a drive. Just remove the threading, it is not useful. – Hans Passant Oct 11 '16 at 12:33
  • I have already mentioned in my description that I am not able to use it without locks or resets, as it skips some operations in between (not thread safe) – Devharsh Trivedi Oct 11 '16 at 12:33
  • @HansPassant if threading is useless for I/O according to you, then what is the way to make it execute faster (in parallel)? – Devharsh Trivedi Oct 11 '16 at 12:37
  • It takes money, not software. – Hans Passant Oct 11 '16 at 12:41
  • To be more specific, what Hans means by spending money is buying a faster disk drive, (or, if possible, storing the data on multiple drives so that each drive can work on a portion of the data in parallel to the other drives). But somehow you need to have more and/or better *hardware* if you want to speed up this operation. Also note this isn't necessarily true of *all* IO operations, just specifically disk drive access. Some other forms of IO (such as, say, network requests) can be parallelized. – Servy Oct 11 '16 at 13:25
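
For reference, the "just remove the threading" suggestion from the comments could look roughly like the sketch below. This is only an illustration, not code from the question: the class name `OldFileCleaner`, the method `DeleteOldFiles`, and the parameters `directory` and `maintainDays` are hypothetical, and it assumes the retention period is a positive number of days that is subtracted from today (the question's `MaintainDuration` may already be negative). Logging is left out.

```csharp
using System;
using System.IO;

// Hypothetical helper: a plain sequential cleanup with no thread pool, locks or events.
static class OldFileCleaner
{
    // Deletes files directly inside 'directory' whose last write time is at or
    // before (today - maintainDays). Returns the number of files deleted.
    public static int DeleteOldFiles(string directory, int maintainDays)
    {
        // Assumption: maintainDays is positive, so the cutoff lies in the past.
        DateTime cutoff = DateTime.Today.AddDays(-maintainDays);
        int deleted = 0;

        // GetFiles returns a snapshot array, so deleting while looping is safe.
        foreach (string file in Directory.GetFiles(directory))
        {
            try
            {
                if (File.GetLastWriteTime(file) <= cutoff)
                {
                    File.Delete(file);
                    deleted++;
                }
            }
            catch (IOException)
            {
                // File is in use or vanished in the meantime; skip it.
            }
            catch (UnauthorizedAccessException)
            {
                // Read-only file or missing permission; skip it.
            }
        }

        return deleted;
    }
}
```

A service could call something like `OldFileCleaner.DeleteOldFiles(path, 7)` from a timer callback. Since each delete goes to the same disk, one thread working through the list in order is probably as fast as this gets on a single drive, which is the point made in the comments above.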
