2

I have daily reports I'm trying to collect en masse from a remote web service my code looks like this:

    public static void ProcessEnMasse(System.DateTime fromDate, DateTime endDate)
    {
        System.Threading.ThreadPool.SetMaxThreads(10, 10);
        for (System.DateTime d = fromDate; d <= endDate; d = d.AddDays(1))
        {
            System.Threading.ThreadPool.QueueUserWorkItem(new System.Threading.WaitCallback(day => ProcessOneDay(d)));
        }
    }

    public static void ProcessOneDay(System.DateTime theDate)
    {
        Log.Debug(string.Format("Processing {0:yyyy-MM-dd}...", theDate));
        var thePackager = new DataPackager();
        thePackager.CreateDatabaseImportPackage(theDate, theDate, true, false);
    }

... when I look at the logs, I notice that several of the threads are processing the same date. Why is this, and what do I need to do to prevent that from happening?

Gray
  • 115,027
  • 24
  • 293
  • 354
Jeremy Holovacs
  • 22,480
  • 33
  • 117
  • 254

3 Answers3

3

You need to be mindful of closures (I will give a fuller explanation shortly)

Basically, your code should be this:

public static void ProcessEnMasse(System.DateTime fromDate, DateTime endDate)
    {
        System.Threading.ThreadPool.SetMaxThreads(10, 10);
        for (System.DateTime d = fromDate; d <= endDate; d = d.AddDays(1))
        {
            System.DateTime newD = d;
            System.Threading.ThreadPool.QueueUserWorkItem
                (new System.Threading.WaitCallback(day => ProcessOneDay(newD)));
        }
    }

Here is Jon Skeet's chapter on closures

The high level idea is that (in your original code) when you pass in d, it actually traps that variable and the compiled code will make that variable act as though it is a global shared variable. So, what happens is when you hit your next for step, d is updated in not only your loop, but in the function that you just passed it in to.

That is the high level, but I really advocate reading Jon Skeet's article as it is very well written.

Here is another article if you are looking for more :)

Justin Pihony
  • 66,056
  • 18
  • 147
  • 180
  • Excellent. I knew it had to be something simple. I don't use threading that much, and thread pools less often, so this is definitely a "good-to-know". I'm marking this as the answer as it is the most complete. – Jeremy Holovacs Mar 30 '12 at 18:51
2

All of your lambda expressions share the same d variable.

If one of the work items only starts after the initial thread ran d = d.AddDays(1), it will use that date.

To fix this, declare a separate local variable inside the loop body and use that instead.

SLaks
  • 868,454
  • 176
  • 1,908
  • 1,964
  • Regarding the fix - this is variable "closure" http://stackoverflow.com/questions/512166/c-sharp-the-foreach-identifier-and-closures – Reddog Mar 30 '12 at 18:41
  • @Reddog I will see if I can find the article, but I am pretty sure that either in 5.0, or maybe it is just slated for the next release after, but the foreach will not act the same – Justin Pihony Mar 30 '12 at 18:50
  • 1
    @JustinPihony: Yes; Eric Lippert said this was fixed in 5. (but for for `foreach`, not `for`) – SLaks Mar 30 '12 at 18:51
2

You are "capturing the loop variable".

To solve it:

for (System.DateTime d = fromDate; d <= endDate; d = d.AddDays(1))
{
     DateTime copy = d;
     System.Threading.ThreadPool.QueueUserWorkItem(
            new System.Threading.WaitCallback(day => ProcessOneDay(copy)));
}

Or you can write it like this (note that day isn't used in the code above):

for (System.DateTime d = fromDate; d <= endDate; d = d.AddDays(1))
{         
     ThreadPool.QueueUserWorkItem(day => ProcessOneDay((DateTime) day), d);
}
H H
  • 263,252
  • 30
  • 330
  • 514