Possible Duplicate:
What is the difference between task and thread?

I understand the title itself may appear to be a duplicate question but I've really read all the previous posts related to this topic and still don't quite understand the program behavior.

I'm currently writing a small program that checks around 1,000 e-mail accounts. Multithreading or multitasking seems like the right approach, since each thread or task is not computationally expensive and its duration depends heavily on network I/O.

In such a scenario, I think it is also reasonable to set the number of threads or tasks much higher than the number of cores (four on an i5-750), so I've set it to 100.

The code snippet written using Tasks:

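        // Requires: using System; using System.Threading.Tasks;
        const int taskCount = 100;
        var tasks = new Task[taskCount];
        // Accounts per task, rounded up so that every account is covered.
        var loopVal = (int) Math.Ceiling(1.0 * EmailAddress.Count / taskCount);

        for (int i = 0; i < taskCount; i++)
        {
            // Each AutoCheck instance is given its own slice of the account list.
            var objContainer = new AutoCheck(i * loopVal, i * loopVal + loopVal);
            // Create the task and queue it to the default scheduler (the thread pool).
            tasks[i] = new Task(objContainer.CheckMail);
            tasks[i].Start();
        }
        // Block until every task has completed.
        Task.WaitAll(tasks);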

The same code snippet written using Threads:

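        // Requires: using System; using System.Diagnostics; using System.Threading;
        const int threadCount = 100;
        var threads = new Thread[threadCount];
        var loopVal = (int)Math.Ceiling(1.0 * EmailAddress.Count / threadCount);
        // Assumed declaration: the original snippet only shows the Stop() call.
        var runningTime = Stopwatch.StartNew();

        for (int i = 0; i < threadCount; i++)
        {
            // Each AutoCheck instance is given its own slice of the account list.
            var objContainer = new AutoCheck(i * loopVal, i * loopVal + loopVal);
            // Every iteration creates and starts a dedicated thread.
            threads[i] = new Thread(objContainer.CheckMail);
            threads[i].Start();
        }
        // Wait for every thread to finish before stopping the clock.
        foreach (Thread t in threads)
            t.Join();
        runningTime.Stop();
        Console.WriteLine(runningTime.Elapsed);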

So what are the essential differences between these two?

derekhh
  • From what I understand, the TaskScheduler will pause tasks that need more threads than the hardware can provide until enough threads are available. The idea, if I understand correctly, is that a huge number of threads started at roughly the same time will fight for hardware resources, a fight that wouldn't happen with tasks. – LightStriker Oct 16 '12 at 20:06
  • @Marc-AndréJutras That's just the behavior of tasks started using the default task scheduler, which just runs them all on the thread pool. You can have tasks that never create a thread, or never execute any code at all in any other thread. – Servy Oct 16 '12 at 20:08
  • You've written both, why not test both? – Martin James Oct 16 '12 at 20:34
  • @MartinJames Well, they'll both *work*. If you don't know what to look for, it may be hard to actually tell how they run differently. – Servy Oct 16 '12 at 20:36

2 Answers

Tasks do not necessarily correspond to threads. They will be scheduled by the task library to threadpool threads in a much more efficient manner than your thread code.

Threads are fairly expensive to create. Tasks queue up and reuse thread-pool threads as they become available, so once a task finishes (or, with asynchronous I/O, while it is waiting on the network), its thread can be reused to execute another task. Threads that sit idle are wasted resources. Only as many threads as you have processor cores can execute simultaneously, so running 100 threads on four cores means each core has to context-switch among roughly 25 threads.

If using tasks, just queue up all 1000 email processing tasks instead of batching them up and let it rip. The task library will handle how many threads to run it on.
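
A minimal sketch of that advice, assuming .NET 4.5 or later for `Task.Run`. `PerAccountTasks` and `CheckAccount` are hypothetical stand-ins for the question's `AutoCheck.CheckMail` logic, and the `Thread.Sleep` only simulates a blocking network call:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class PerAccountTasks
{
    // Hypothetical stand-in for the real mailbox check;
    // the sleep simulates blocking network I/O.
    static bool CheckAccount(string address)
    {
        Thread.Sleep(5);
        return address.Contains("@");
    }

    // Starts one task per address on the default scheduler
    // and blocks until all of them have finished.
    public static bool[] CheckAll(string[] addresses)
    {
        Task<bool>[] tasks = addresses
            .Select(a => Task.Run(() => CheckAccount(a)))
            .ToArray();
        Task.WaitAll(tasks);
        return tasks.Select(t => t.Result).ToArray();
    }
}
```

Calling `PerAccountTasks.CheckAll(addresses)` with all 1,000 addresses queues 1,000 tasks at once; how many threads actually run them is left to the thread pool rather than fixed by a `taskCount` constant.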

Mike Marynowski
  • This is all applying to the default task scheduler. You can also use custom task schedulers, have tasks that were generated from TaskCompletionSource objects, etc. – Servy Oct 16 '12 at 20:09
  • "If using tasks, just queue up all 1000 email processing tasks instead of batching them up and let it rip. The task library will handle how many threads to run it on." means I can just change the value "taskCount" to 1000 and let it go? – derekhh Oct 16 '12 at 20:10
  • Yes, and he is using the default task scheduler in his example :) – Mike Marynowski Oct 16 '12 at 20:10
  • @derekhh Yes, you can do just that, and the task library will schedule those 1000 tasks for you into a suitable number of threads. – Mike Marynowski Oct 16 '12 at 20:12

The short answer is that a Task does not equal a Thread. A task is queued up in a Task Scheduler and then executed on a Thread, but queuing up 100 Tasks does not mean you will have 100 threads running.

Usually a task will run on a thread from the thread pool, which has a finite size. Once all of those threads are busy, then your tasks will have to wait for a thread to become available to execute your task. It's also possible to queue them up on things like the UI thread using the appropriate task scheduler, in which case they end up being run synchronously by the UI thread's message loop (this is useful when you're interacting with a UI).

In your task example, there is no need to actually limit the number of tasks started, since the tasks will already get queued up and will have to wait until there is a thread available to run them. This is probably the best approach, because you're letting the system determine the maximum number of threads based on what the system can handle, rather than assuming you have enough CPU/Memory to handle it.

Your example using threads, by contrast, means that you will definitely spin up exactly the number of threads you're allotting.
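
The difference can be observed with a rough sketch like the one below, which counts how many distinct threads end up running the same set of short jobs. `ThreadUsage` and its method names are made up for illustration, `Task.Run` assumes .NET 4.5 or later, and the sleep stands in for blocking I/O:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadUsage
{
    // Runs `count` short blocking jobs as tasks and returns
    // how many distinct threads executed them.
    public static int DistinctThreadsUsedByTasks(int count)
    {
        var ids = new ConcurrentDictionary<int, bool>();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() =>
            {
                ids.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
                Thread.Sleep(10); // simulated blocking I/O
            }))
            .ToArray();
        Task.WaitAll(tasks);
        return ids.Count;
    }

    // Runs the same jobs on dedicated threads; every job gets its own thread.
    public static int DistinctThreadsUsedByThreads(int count)
    {
        var ids = new ConcurrentDictionary<int, bool>();
        Thread[] threads = Enumerable.Range(0, count)
            .Select(_ => new Thread(() =>
            {
                ids.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
                Thread.Sleep(10); // simulated blocking I/O
            }))
            .ToArray();
        foreach (Thread t in threads) t.Start();
        foreach (Thread t in threads) t.Join();
        return ids.Count;
    }
}
```

With `count` set to 100, the thread version uses 100 threads, while the task version typically reports far fewer. The exact task-side number depends on the machine and on the state of the thread pool, so it is not shown here.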

CodingGorilla
  • `"A task is queued up in a Task Scheduler and then executed on a Thread"` Possibly, or possibly not. It may not execute on another thread at all. A task is fundamentally nothing more than something saying, "there is some *task* that will be finished at a later time. I'll tell you when it's done, and what its result was (if any)." – Servy Oct 16 '12 at 20:11
  • @Servy I said "executed on **a Thread**", not another thread. It will always be executed on a thread, be that the UI thread, a task pool thread, or some other thread. – CodingGorilla Oct 16 '12 at 20:14
  • Nope. You're assuming the task is started using a delegate, which isn't always the case. Imagine a Task created using a `TaskCompletionSource`. It doesn't have anything to execute at all, technically. – Servy Oct 16 '12 at 20:15
  • @Servy I think you're being overly literal; code has to be executed on a thread somewhere. If there is no code to execute then I think it's a given that nothing is happening anywhere. My answer is meant to be a general overview of the difference between running a task and running a thread, not an in-depth tutorial that explains every nuance of the TPL. – CodingGorilla Oct 16 '12 at 20:19
  • Even if you don't explain every nuance, you are fundamentally explaining a task as something you execute; it's some method or delegate that is *run*. That's not always true. It *can* be true, and it's one of the more common examples, but there are other kinds of tasks entirely. For example, imagine the `Task` that is returned by `Task.WhenAll`. That task isn't being *run* anywhere. Its result is dependent on *other* tasks running somewhere, but it has no code to run. Even if you don't explain the ramifications of the difference there, your summary of what a Task **is** is flawed. – Servy Oct 16 '12 at 20:26
  • My explanation is not flawed in the context of the OP's question. Also, even in the example you cited, there *is* code to be run. In the case of a `Task.WhenAll`, for starters it checks that none of the tasks are null (those checks have to run on a thread), and eventually it tracks all the tasks and which ones have completed, all of which requires code to execute *on a thread*. Granted, this is not code that you have written, but it's still code that is executing on a thread. – CodingGorilla Oct 16 '12 at 20:40
  • Your answer isn't really placed in the context of the question. If you said something along the lines of "in this case" or "in your example" or similar then sure, it would be fine. Instead, you make several broad statements about Task (which I assert aren't true) and then apply those statements to this specific question (which is a good approach, so long as your assertions are correct). Anyway, you say that I'm wrong, and your statement is true globally, and also that it is true in the context of this question ??? – Servy Oct 16 '12 at 20:43
  • As for the example of `Task.WhenAll`, you need to separate out the `WhenAll` method (which is just a regular old synchronous method running in the current thread) from the `Task` that it returns. The null checking of the input tasks, the wiring up of continuations to those tasks, etc. is all a part of the *method*, not the `Task` that it returns. Fundamentally, a task isn't "some block of code being executed somewhere"; it's more of a way of executing code (a continuation) when something happens (much like an event) or waiting until that "something happens" happens (through `Wait`). – Servy Oct 16 '12 at 20:48
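
The `TaskCompletionSource` case raised in these comments can be illustrated with a small sketch. The class and method names are invented for illustration; only `TaskCompletionSource<TResult>`, its `Task` property, and `SetResult` come from the framework. The task below has no delegate behind it and completes only when some other code says so:

```csharp
using System;
using System.Threading.Tasks;

public static class PromiseStyleTask
{
    public static string Demo()
    {
        // No delegate is attached: this task never "runs" any code itself.
        var source = new TaskCompletionSource<int>();
        Task<int> task = source.Task;

        // Nothing has completed the task yet.
        bool completedBefore = task.IsCompleted;

        // Any code, on any thread, may complete it later.
        source.SetResult(42);

        return completedBefore + " -> " + task.IsCompleted + ", result " + task.Result;
    }
}
```

`PromiseStyleTask.Demo()` returns `False -> True, result 42`: the task moved from pending to completed without a thread ever being assigned to execute it.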