
My program has a list of 200k files, and I have to import each one into the database. It takes a long time, so I started researching multithreading as a way to speed up the import process. I finally got an implementation working, but I'm not sure it's actually doing anything in parallel.

Using the question "Workaround for the WaitHandle.WaitAll 64 handle limit?" as a model for my C# code, I came up with:

 int threadCount = 0;

 for (int i = 0; i < this.Total; i++)
 {
     Finished = new ManualResetEvent(false);
     threadCount = this.ThreadCount;
     Interlocked.Increment(ref threadCount);

     FileHandler fh = new FileHandler(finished, sorted[i], this.PicturesFeatures, this.Outcome, this.SiteIds, this.LastId, this.Order, this.ThreadCount);
     Console.Write(i + " ");
     ThreadPool.QueueUserWorkItem(new WaitCallback(HandleFile), fh);
     Console.Write(i + " ");
     Finished.WaitOne();
 }

And HandleFile() looks like this:

 private void HandleFile(object s)
 {
     try
     {
         //code
     }
     finally
     {
         if (Interlocked.Decrement(ref threadCount) == 0)
         {
             Finished.Set();
         }
     }
 }

I added those Console.Write calls expecting that an item that takes longer would finish after some other item ("0 0 1 2 2 1 3 3 ..."), but the output is always in order ("0 0 1 1 2 2 3 3 4 4 ...").

Eduardo Mello
  • One remark: are you sure your code spends its time computing, rather than doing file I/O and database operations? If it's the latter, be aware that doing things in parallel may not speed up your code much (i.e. reading files in parallel is not faster than reading them one by one, unless the files are stored on different hard disks). – Arseni Mourzenko Jul 26 '10 at 15:22
  • Yes, but what about the database? Each thread opens a different connection to the database, so it should speed up if I use parallelism. – Eduardo Mello Jul 26 '10 at 16:22
  • Not so sure. Test and see what happens. In any case, if the bottleneck is disk I/O (locally or at the database level), adding parallelism will just slow things down. – Arseni Mourzenko Jul 27 '10 at 00:16
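One way to "test and see" is to time the same batch of files at several degrees of parallelism. The following is a minimal sketch, assuming .NET 4.0 or later: `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism` is swapped in here only as a convenient way to cap concurrency, and `import` stands for your per-file database work (the class and method names are hypothetical).

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ImportTiming
{
    // Runs 'import' once per file with at most 'maxParallel' concurrent calls
    // and returns the wall-clock time taken. maxParallel must be positive
    // (or -1 for "no limit"); 1 gives an effectively sequential baseline.
    public static TimeSpan Measure(string[] files, Action<string> import, int maxParallel)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };
        Stopwatch stopwatch = Stopwatch.StartNew();
        Parallel.ForEach(files, options, file => import(file));
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

For example, timing the same subset of files with `maxParallel` set to 1, 2, 4 and 8 (passing a hypothetical `ImportOneFile` method as `import`) shows whether extra concurrency helps or only adds disk and database contention.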

5 Answers


Your output is to be expected. You're writing the output from the main thread. QueueUserWorkItem does not block; it registers your HandleFile function to be executed on a separate (thread-pool) thread. So regardless of how long the work items take, your prints will always appear in order, because they all come from the main thread.

Additionally, you're not getting the benefit of parallelism with this code, because you wait after every item you submit. You're essentially saying, "I won't submit my next work item until the last one has finished." This is just a more complicated way of writing normal serial code. To introduce parallelism, you need to add multiple items to the queue without waiting in between.
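Applied to the question's loop, that means queuing every file first and waiting exactly once at the end. A minimal sketch, assuming .NET 4.0 or later: `CountdownEvent` is swapped in for the hand-rolled `Interlocked` counter plus `ManualResetEvent`, and `ImportFile`/`ImportAll` are hypothetical names standing in for the body of `HandleFile` and the outer loop.

```csharp
using System.Collections.Concurrent;
using System.Threading;

public static class QueueAllThenWait
{
    // Hypothetical placeholder for the per-file work done in HandleFile.
    private static void ImportFile(string path, ConcurrentBag<string> imported)
    {
        Thread.Sleep(10); // simulate file/database latency
        imported.Add(path);
    }

    // Queues every file first, then blocks once until all work items have finished.
    public static int ImportAll(string[] files)
    {
        var imported = new ConcurrentBag<string>();
        using (var pending = new CountdownEvent(files.Length))
        {
            foreach (string file in files)
            {
                // Per-iteration copy for the closure: required for foreach in
                // C# 4 and earlier, harmless in C# 5 and later.
                string captured = file;
                ThreadPool.QueueUserWorkItem(state =>
                {
                    try
                    {
                        ImportFile(captured, imported);
                    }
                    finally
                    {
                        pending.Signal(); // always count down, mirroring the question's try/finally
                    }
                });
            }
            pending.Wait(); // the only wait, after everything has been queued
        }
        return imported.Count;
    }
}
```

Because nothing waits inside the loop, the thread pool is free to run several imports at once; whether that actually makes the import faster still depends on where the bottleneck is.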

bshields

Break into execution while it's running (Ctrl+Alt+Break) and then look at the Threads window (Debug -> Windows -> Threads).

Phil Gan
  • Ok. I have the Main Thread, 5 Worker Threads with , 1 worker thread called .NET SystemEvents, and another worker thread called vshost.RunParkingWindow. I compared it with a Hello World program, and there doesn't seem to be much multithreading going on. – Eduardo Mello Jul 26 '10 at 18:19
  • How quickly do your threads execute? Try putting a breakpoint where you're fairly sure multiple threads will be running concurrently. You can also double-click an item in the Threads window to jump to that thread's point of execution. – Phil Gan Jul 27 '10 at 08:27

You have a couple of problems.

  • The work items are effectively serialized, since you are waiting for each one to complete before starting the next.
  • The Console.Write calls are on the main thread, so it is natural for them to report i incrementing in order.

Here is the canonical pattern for doing this correctly.

int count = TOTAL_ITERATIONS;
var finished = new ManualResetEvent(false);
for (int i = 0; i < TOTAL_ITERATIONS; i++) 
{ 
  int captured = i; // Use this for variable capturing in the anonymous method.
  ThreadPool.QueueUserWorkItem(
    delegate(object state)
    {
      try
      {
        Console.WriteLine(captured.ToString());
        // Your task goes here.
        // Refer to 'captured' instead of 'i' if you need the loop variable.
        Console.WriteLine(captured.ToString());
      }
      finally
      {
        if (Interlocked.Decrement(ref count) == 0)
        {
          finished.Set();
        }
      }
    });
}
finished.WaitOne();

Edit: To demonstrate that multiple threads are actually used, run the following code.

public static void Main()
{
    const int WORK_ITEMS = 100;
    int count = WORK_ITEMS;
    var finished = new ManualResetEvent(false);
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":Begin queuing...");
    for (int i = 0; i < WORK_ITEMS; i++)
    {
        int captured = i; // Use this for variable capturing in the anonymous method. 
        ThreadPool.QueueUserWorkItem(
          delegate(object state)
          {
              try
              {
                  Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":" + captured.ToString());
                  for (int j = 0; j < 100; j++) Thread.Sleep(1);
                  Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":" + captured.ToString());
              }
              finally
              {
                  if (Interlocked.Decrement(ref count) == 0)
                  {
                      finished.Set();
                  }
              }
          });
    }
    Console.WriteLine(Thread.CurrentThread.ManagedThreadId.ToString() + ":...end queueing");
    finished.WaitOne();
    Console.ReadLine();
}
Brian Gideon
  • I've made the changes you proposed. Then I used the Threads window (as Phil suggested in his answer) and, as I told Phil, it doesn't seem to have much effect. – Eduardo Mello Jul 26 '10 at 18:20
  • @EduardoMello: It may be because the `ThreadPool` is choosing to execute the work items serially on only one thread from the pool. I edited my answer to include code that coerces it to assign the work items to different threads and demonstrates that it is indeed doing so by writing the thread id to the console. – Brian Gideon Jul 26 '10 at 18:43

First off, threads spawned by a multithreaded application are NOT guaranteed to finish in any particular order. You may have started one thread first, but it won't necessarily finish first.

With that said, you can use Process Explorer: http://technet.microsoft.com/en-us/sysinternals/bb896653.aspx

Process Explorer will show you which threads your program is spawning.

Icemanind

The information you're outputting all comes from the same thread (the one running your loop). If you want to see evidence of multiple threads, output the thread name, its ID, or some other per-thread value from your HandleFile function.

msergeant