
I have a Windows Service that needs to pick up jobs from a database and process them.

Each job is a scanning process that takes approximately 10 minutes to complete.

I am very new to the Task Parallel Library. I have implemented the following sample logic:

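using System;
using System.Collections;
using System.Threading;
using System.Threading.Tasks;

// Fill a (non-generic, non-thread-safe) Queue with 10,000 sample job IDs
Queue queue = new Queue();

for (int i = 0; i < 10000; i++)
{
    queue.Enqueue(i);
}

// Start 100 tasks; LongRunning asks the scheduler for a dedicated thread per task
for (int i = 0; i < 100; i++)
{
    Task.Factory.StartNew((Object data) =>
    {
        // Each task dequeues and prints a single item from the shared queue
        var Objdata = (Queue)data;
        Console.WriteLine(Objdata.Dequeue());
        Console.WriteLine(
            "The current thread is " + Thread.CurrentThread.ManagedThreadId);
    }, queue, TaskCreationOptions.LongRunning);
}

Console.ReadLine();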

But this creates a lot of threads. Since the loop runs 100 times, it creates 100 threads.

Is it the right approach to create that many parallel threads?

Is there any way to limit the number of threads to 10 (concurrency level)?

Akram Shahda
Sai Avinash
  • What is it you want to do? Have one thread that processes the queue? Or have one for every item in the queue? Why don't you loop through the queue in your task rather than start 100 tasks? – Patrick Allwood Jun 24 '14 at 07:19
  • This question is about enhancing working code, and it's a better fit for http://codereview.stackexchange.com; you might want to post it there for a more targeted audience – Alex Jun 24 '14 at 07:20
  • I receive the jobs into a DataSet, then I need to process each job in parallel – Sai Avinash Jun 24 '14 at 07:20
  • If you're doing CPU-bound work, use `Parallel.For`. If you're limited by I/O, use asynchronous I/O. And of course, you could also use a thread-safe queue, and consume it (`ConcurrentQueue` or `BlockingCollection`) in a cycle inside the task you're creating. – Luaan Jun 24 '14 at 07:21
  • @patchandthat, I need each item in the queue to be processed in parallel – Sai Avinash Jun 24 '14 at 07:23
  • For the "Right approach" on working code, please go to CodeReview. To limit the number of threads created by tasks, you can follow the example in [this MSDN article](http://msdn.microsoft.com/en-us/library/ee789351%28v=vs.110%29.aspx?cs-save-lang=1&cs-lang=csharp#code-snippet-1) – Pierre-Luc Pineault Jun 24 '14 at 07:23
  • You get a new thread for each task, because you asked for a new thread for each task by specifying `TaskCreationOptions.LongRunning`. – Kris Vandermotten Jun 24 '14 at 07:31
  • @YuvalItzchakov backticks are for `inline code`, not highlighting random terms. – CodeCaster Jun 24 '14 at 07:36
  • @SaiAvinash if you're accessing the `Queue` from multiple threads, you'll need one that is thread-safe. You may look into `ConcurrentQueue` or see my answer below. – Yuval Itzchakov Jun 24 '14 at 08:03
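Luaan's suggestion of a thread-safe queue drained in a loop by the tasks you create can be sketched as follows. This is a minimal sketch, assuming job IDs are plain `int`s; the class name `FixedConsumers`, the method `Run`, and the `processJob` callback are hypothetical placeholders, not part of the original code. Instead of starting one task per job, it starts a fixed number of consumer tasks, and each consumer pulls jobs from a `BlockingCollection<int>` until the producer signals that there is no more work.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FixedConsumers
{
    // Hypothetical helper: runs processJob for job IDs 0..jobCount-1 on exactly
    // consumerCount worker tasks and returns how many jobs were processed.
    public static int Run(int jobCount, int consumerCount, Action<int> processJob)
    {
        int processed = 0;

        // BlockingCollection wraps a ConcurrentQueue by default, so it is safe
        // to add to and take from multiple threads.
        using (var jobs = new BlockingCollection<int>())
        {
            // Start a fixed number of consumers. LongRunning gives each one a
            // dedicated thread, so at most consumerCount threads do the work.
            Task[] consumers = Enumerable.Range(0, consumerCount)
                .Select(_ => Task.Factory.StartNew(() =>
                {
                    // Blocks while the collection is empty and ends once
                    // CompleteAdding has been called and the queue is drained.
                    foreach (int job in jobs.GetConsumingEnumerable())
                    {
                        processJob(job);
                        Interlocked.Increment(ref processed);
                    }
                }, TaskCreationOptions.LongRunning))
                .ToArray();

            // Producer: in the service, these IDs would come from the database.
            for (int i = 0; i < jobCount; i++)
                jobs.Add(i);
            jobs.CompleteAdding();

            Task.WaitAll(consumers);
        }

        return processed;
    }
}
```

For example, `FixedConsumers.Run(10000, 10, id => Console.WriteLine(id));` would process all 10,000 sample IDs with 10 consumers. Here `LongRunning` gives each of the 10 consumers its own thread for its whole lifetime, which matches the "10 threads" limit the question asks for.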

2 Answers


An important factor to remember when allocating new threads is that the OS has to allocate a number of logical entities for each thread to run:

  1. Thread kernel object - an object describing the thread, including its context (CPU registers, etc.)
  2. Thread environment block - for exception handling and thread-local storage
  3. User-mode stack - 1 MB of stack by default
  4. Kernel-mode stack - for passing arguments from user mode to kernel mode

Beyond that, the number of threads that can actually run concurrently depends on the number of cores in your machine. Creating more threads than you have cores causes context switching, which in the long run may slow your work down.

So, after the long intro, on to the good stuff. What we actually want is to limit the number of running threads and reuse them as much as possible.

For this kind of job, I would go with TPL Dataflow, which is based on the producer-consumer pattern. Here is a small example of what can be done:

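using System.Threading.Tasks.Dataflow; // namespace from the TPL Dataflow NuGet package

// A BufferBlock is roughly the Dataflow equivalent of a ConcurrentQueue: it buffers your objects
var bufferBlock = new BufferBlock<object>();

// An ActionBlock processes each object it receives
var actionBlock = new ActionBlock<object>(obj =>
{
    // Do stuff with the objects from the BufferBlock
});

// Forward everything posted to bufferBlock on to actionBlock
bufferBlock.LinkTo(actionBlock);
// When bufferBlock completes, tell actionBlock that no more data is coming
bufferBlock.Completion.ContinueWith(t => actionBlock.Complete());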

You can pass each block an options object: a DataflowBlockOptions can set BoundedCapacity (the maximum number of items the block will hold), and for execution blocks such as ActionBlock, an ExecutionDataflowBlockOptions can also set MaxDegreeOfParallelism (the maximum number of items the block processes concurrently).
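Applied to the question's limit of 10 concurrent jobs, those options might be wired up as in the following minimal sketch, assuming integer job IDs and the TPL Dataflow package. The class `DataflowPipeline`, the method `RunAsync`, the `processJob` callback, and the buffer size of 100 are illustrative choices, not part of the answer above. It uses `PropagateCompletion` on the link instead of the manual `ContinueWith`, and it bounds the ActionBlock as well, because an unbounded ActionBlock would immediately pull every item out of the BufferBlock and make the buffer's bound meaningless.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // TPL Dataflow NuGet package

public static class DataflowPipeline
{
    // Hypothetical helper: pushes job IDs 0..jobCount-1 through a BufferBlock into
    // an ActionBlock that runs at most maxParallel jobs at once.
    public static async Task<int> RunAsync(int jobCount, int maxParallel, Action<int> processJob)
    {
        int processed = 0;

        // Bounded buffer: SendAsync waits while 100 items are already queued,
        // so the producer cannot run arbitrarily far ahead of the workers.
        var bufferBlock = new BufferBlock<int>(
            new DataflowBlockOptions { BoundedCapacity = 100 });

        var actionBlock = new ActionBlock<int>(job =>
        {
            processJob(job);
            Interlocked.Increment(ref processed);
        },
        new ExecutionDataflowBlockOptions
        {
            MaxDegreeOfParallelism = maxParallel, // e.g. 10 concurrent scans
            BoundedCapacity = maxParallel         // only accept what it can run
        });

        // PropagateCompletion completes actionBlock once bufferBlock completes.
        bufferBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < jobCount; i++)
            await bufferBlock.SendAsync(i);

        bufferBlock.Complete();
        await actionBlock.Completion;
        return processed;
    }
}
```

In the Windows Service, the loop that calls `SendAsync` is where the jobs read from the database would be posted.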

There is a good example here to get you started.

Yuval Itzchakov

Glad you asked, because you're right in the sense that this is not the best approach.

A Task should not be confused with a Thread. A Thread can be compared to a chef in a kitchen, while a Task is a dish ordered by a customer. You have a bunch of chefs, and they process the dish orders in some order (usually FIFO). A chef finishes a dish, then moves on to the next. A thread pool works the same way: you create a bunch of Tasks to be completed, but you do not need to assign a new thread to each task.
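In code, the analogy corresponds to handing each "dish" to the pool instead of hiring a new "chef" per dish. The following minimal sketch does this with `ThreadPool.QueueUserWorkItem` and a `CountdownEvent` to wait for completion. The class `PoolWorkItems`, the method `Run`, and the `processJob` callback are hypothetical names used for illustration.

```csharp
using System;
using System.Threading;

public static class PoolWorkItems
{
    // Hypothetical helper: queues one thread-pool work item per job and blocks
    // until all of them have finished; returns how many jobs were processed.
    public static int Run(int jobCount, Action<int> processJob)
    {
        int processed = 0;

        // Starts at jobCount; each finished work item signals it once.
        using (var done = new CountdownEvent(jobCount))
        {
            for (int i = 0; i < jobCount; i++)
            {
                int job = i; // copy: the for-loop variable is shared by all closures
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try
                    {
                        processJob(job);
                        Interlocked.Increment(ref processed);
                    }
                    finally
                    {
                        done.Signal(); // always signal, even if processJob throws
                    }
                });
            }

            done.Wait(); // returns once the count reaches zero
        }

        return processed;
    }
}
```

The pool itself does not impose a fixed limit such as 10. It can inject additional threads over time when work items block for long periods, such as 10-minute scans. If you need a hard cap, the pool alone does not provide a per-workload one.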

OK, so the actual ways to do it. There are a few. The first one is ThreadPool.QueueUserWorkItem (http://msdn.microsoft.com/en-us/library/system.threading.threadpool.queueuserworkitem(v=vs.110).aspx). From the Task Parallel Library, Parallel.For can also be used; it schedules the iterations onto the calling thread and thread-pool threads, scaling roughly with the number of CPU cores available in the system.

Parallel.For(0, 100, i =>
{
    // This body is called 100 times, with i running from 0 to 99
    WaitForGrassToGrow(); // placeholder for the long-running job
    Console.WriteLine("The {0}-th task has completed!", i);
});

Note that there is no guarantee that Parallel.For invokes the body in order (0, 1, 2, 3, 4, 5...). The actual order depends on the execution.
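For the question's "at most 10 at a time" requirement, Parallel.For accepts a `ParallelOptions` whose `MaxDegreeOfParallelism` caps how many iterations run concurrently. The following is a minimal sketch. The class `CappedParallelFor`, the method `Run`, and the `processJob` callback are hypothetical, and `processJob` stands in for the scanning job.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CappedParallelFor
{
    // Hypothetical helper: runs processJob for 0..jobCount-1 with at most
    // maxDegree iterations in flight at once; returns the iteration count.
    public static int Run(int jobCount, int maxDegree, Action<int> processJob)
    {
        int processed = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        // Blocks the calling thread (which also runs iterations) until every
        // iteration has completed.
        Parallel.For(0, jobCount, options, i =>
        {
            processJob(i);
            Interlocked.Increment(ref processed);
        });

        return processed;
    }
}
```

The cap is an upper bound, not a promise that exactly 10 iterations will run at once. With long blocking jobs, the thread pool may ramp up gradually, so fewer may run at first.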

Brad Rem
kevin
  • Each job is a long-running process, so I guess I should not be using the thread pool? – Sai Avinash Jun 24 '14 at 07:27
  • The ThreadPool is managed automatically by .NET. Say you create a lot of (say 100) long-running tasks, and there are 4 threads in the ThreadPool. Those 4 threads will each pick up a task from the queue. Now there are no more available threads in the pool, so the new tasks just sit there; they will not be processed yet. After one task is done, its thread is returned to the pool, and since there are unassigned tasks, that thread is assigned a new one. This process repeats until there are no more tasks left. – kevin Jun 24 '14 at 07:31
  • OK. Can you correct my code where I am going wrong? That would help me understand better. – Sai Avinash Jun 24 '14 at 07:35
  • I've included a code snippet which illustrates `Parallel.For`. – kevin Jun 24 '14 at 08:11
  • Parallel.For may or may not use a ThreadPool thread. Try this: `private void DoStuff() { Parallel.For(0, 100, i => { Console.WriteLine($"The {i}-th task has completed. Is ThreadPool: {Thread.CurrentThread.IsThreadPoolThread} ThreadId:{Thread.CurrentThread.ManagedThreadId}"); }); }` – Nikhil Apr 01 '17 at 07:30
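kevin's comment describes tasks waiting until a pool thread frees up. If you want that limit to be explicit at 10, independent of how many threads the pool has, one common pattern is to queue one task per job and gate the actual work with a `SemaphoreSlim`. This is a minimal sketch, assuming the job is synchronous (blocking) work. The class `ThrottledTasks`, the method `RunAsync`, and the `processJob` callback are hypothetical names.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledTasks
{
    // Hypothetical helper: creates one task per job but lets at most
    // maxConcurrency of them run processJob at the same time.
    public static async Task<int> RunAsync(int jobCount, int maxConcurrency, Action<int> processJob)
    {
        int completed = 0;

        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            Task[] tasks = Enumerable.Range(0, jobCount).Select(async job =>
            {
                // Waiting asynchronously does not tie up a thread-pool thread.
                await gate.WaitAsync();
                try
                {
                    // Task.Run moves the blocking job onto a pool thread.
                    await Task.Run(() => processJob(job));
                    Interlocked.Increment(ref completed);
                }
                finally
                {
                    gate.Release(); // let the next waiting job start
                }
            }).ToArray();

            await Task.WhenAll(tasks);
        }

        return completed;
    }
}
```

Unlike a fixed set of consumers, this creates all the task objects up front, which is cheap, but only `maxConcurrency` of them occupy a thread at any moment.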