I'm looking into creating tasks that need to be executed on multiple threads. A large number of tasks could be created, e.g. 2000. I want to limit the number of tasks queued and executed simultaneously. Is there a way to create a certain number of tasks and then create new ones as they complete? I'm trying to work out whether the task scheduler helps with this.

EDIT:

To phrase this a different way: is there a reason I should want to limit the number of tasks created/queued/executed simultaneously, given that I could have a really large number, e.g. 2000? Does the task scheduler schedule tasks optimally? I can't seem to find any info on how it actually works.

EDIT:

I'm not using Parallel.ForEach. I've decided to use a counter instead: based on a maximum count and the current number of running tasks, I either create a new task or wait until the maximum is no longer exceeded.

newbie_86
  • Search for topics by `MaxDegreeOfParallelism`. Seems like it was discussed many times – Gennady Vanin Геннадий Ванин Apr 15 '13 at 10:11
  • Thanks. I did come across those posts but it seems you need to specify the number of cores, I don't know explicitly how many cores or how many tasks can run simultaneously without causing a bottleneck...and it seems parallel runs use blocking as well? – newbie_86 Apr 15 '13 at 10:16
  • Could you explain what exactly you are trying to do? Why do you have so many `Task`s? Are you processing some collection? If you are, would `Parallel.ForEach()` or PLINQ work for you? – svick Apr 15 '13 at 11:32
  • “I'm not using Parallel.Foreach.” Could you explain why not? It *looks* like it's exactly what you need, but it's hard to say if you don't tell us what that actually is. – svick Apr 15 '13 at 15:01
  • @svick From what I've read about Parallel.ForEach (and I could be wrong), it blocks all processing until every task is complete. I need to create tasks to send out around 2000 emails, and I really don't want it to block any further processing. Also, I'm trying to control the number of threads that get spawned. I want some sort of polling to happen, so it only creates x number of threads; if the max is exceeded, it waits for a free one. – newbie_86 Apr 15 '13 at 16:21
  • Can you send the emails asynchronously? That way your thread can be used for other things and won't be blocked on the network I/O. – Wouter de Kort Apr 15 '13 at 19:43
  • @WouterdeKort I think it makes sense to go one step at a time, so I probably wouldn't recommend going async now (whether using the new `async`-`await` or “the old way”). While async tends to be more efficient, it makes things like limiting the degree of parallelism harder. – svick Apr 15 '13 at 20:06
  • @svick A truly async solution would require fewer threads. Wouldn't this make the degree of parallelism less important? – Wouter de Kort Apr 16 '13 at 11:08
  • @WouterdeKort Not at all, at least in this case. The degree of parallelism here isn't important because of memory consumption (which async can decrease). It's important to get the highest throughput from the network and decreasing the number of threads won't help you with that. – svick Apr 16 '13 at 11:27
  • @svick An async solution uses an I/O completion port so that the kernel notifies it when a network I/O operation finishes. In the meantime, the thread can do other things (like sending another email). 2000 threads would involve so much context switching, and the only thing the threads are doing is waiting on the I/O to finish. So you would have % CPU with 2000 threads, which is a waste of resources and performance. Parallelism isn't the solution for dealing with I/O-bound tasks, only for CPU-bound tasks. – Wouter de Kort Apr 16 '13 at 11:50
  • @WouterdeKort I certainly didn't mean having 2000 threads, more like 10 or even less (depending on the network connection; see my answer). And even if you had 2000 threads, there wouldn't be much context switching, because the threads would be blocked most of the time. So there wouldn't be much CPU wasted (but there would be a lot of memory wasted). – svick Apr 16 '13 at 12:35
  • I think my question is related to yours: http://stackoverflow.com/questions/21052853/keep-running-a-specific-number-of-tasks – Lu1zZz Jan 17 '14 at 18:15

2 Answers

If you want to send 2000 emails, you don't want to block the current thread and you want to use only a limited number of threads, then I think you should use a single Task that calls Parallel.ForEach(). Something like:

List<Email> emails = …;

var task = Task.Factory.StartNew(() =>
{
    Parallel.ForEach(
        emails,
        new ParallelOptions { MaxDegreeOfParallelism = maxNumberOfThreads },
        email => email.Send());
});

// handle the completion of task here
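
One way to handle the completion without blocking the caller is a continuation. The following is only a rough sketch under my own assumptions: the Email class here is a hypothetical stand-in whose Send() just sleeps to simulate network latency, and EmailSender, SendAllInBackground and the onDone callback are names made up for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the Email type; Send() only simulates network latency.
public sealed class Email
{
    public string To { get; }
    public Email(string to) { To = to; }
    public void Send() { Thread.Sleep(20); }
}

public static class EmailSender
{
    // Starts sending on a background task and returns immediately.
    // onDone receives null on success, or the AggregateException if any Send() threw.
    public static Task SendAllInBackground(
        IEnumerable<Email> emails, int maxNumberOfThreads, Action<Exception> onDone)
    {
        Task sending = Task.Factory.StartNew(() =>
        {
            // Blocks only this background task, not the caller.
            Parallel.ForEach(
                emails,
                new ParallelOptions { MaxDegreeOfParallelism = maxNumberOfThreads },
                email => email.Send());
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);

        // The continuation runs once the whole loop has finished or failed.
        return sending.ContinueWith(t => onDone(t.Exception));
    }
}
```

The caller would do something like EmailSender.SendAllInBackground(emails, 10, ex => { /* log or report */ }) and carry on with other work. The value 10 is arbitrary, and LongRunning is my own choice because the outer task spends its whole life waiting for the loop.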

If you don't know the best number of threads, then you will have to find it out experimentally. TPL is able to guess the best number of threads for CPU-bound computations reasonably well. But your code is not CPU-bound, it's network-bound. That means the optimal number of threads has nothing to do with the number of cores your CPU has; it depends on the bandwidth and latency of your network connection (and possibly also of your mail server).
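
To make "find it out experimentally" a bit more concrete, here is a minimal sketch that times the same batch at several degrees of parallelism. ParallelismProbe, Measure and FindFastest are hypothetical names, and in a real test the work delegate would have to send to a test mailbox or something with comparable latency, which is an assumption on my part.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ParallelismProbe
{
    // Runs `work` once per item at the given degree of parallelism
    // and returns how long the whole batch took.
    public static TimeSpan Measure<T>(IList<T> items, int degree, Action<T> work)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = degree };
        Stopwatch watch = Stopwatch.StartNew();
        Parallel.ForEach(items, options, work);
        watch.Stop();
        return watch.Elapsed;
    }

    // Tries each candidate degree in turn and returns the one with the shortest run.
    public static int FindFastest<T>(IList<T> items, IEnumerable<int> candidates, Action<T> work)
    {
        int best = -1;
        TimeSpan bestTime = TimeSpan.MaxValue;
        foreach (int degree in candidates)
        {
            TimeSpan elapsed = Measure(items, degree, work);
            Console.WriteLine("degree {0}: {1:F0} ms", degree, elapsed.TotalMilliseconds);
            if (elapsed < bestTime)
            {
                bestTime = elapsed;
                best = degree;
            }
        }
        return best;
    }
}
```

For example, ParallelismProbe.FindFastest(batch, new[] { 1, 2, 4, 8, 16 }, item => Send(item)), where batch and Send are placeholders. Treat a single run with caution: the thread pool adds threads gradually, so higher degrees may look slower in a short run than they would once the pool has warmed up, and network conditions vary between runs.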

svick

Replying to question in comment:

"I don't know explicitly how many cores..."

You can get the number of cores with Environment.ProcessorCount and use it to set MaxDegreeOfParallelism:

ParallelOptions po = new ParallelOptions
{ 
    MaxDegreeOfParallelism = Environment.ProcessorCount
};  
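
If you want to convince yourself that the cap is honoured, a small sketch like the one below records the highest number of loop bodies that ever ran at once. ConcurrencyCheck and PeakConcurrency are made-up names, and the Thread.Sleep only stands in for real work.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ConcurrencyCheck
{
    // Runs itemCount simulated work items and returns the highest number
    // that were ever executing at the same time.
    public static int PeakConcurrency(int itemCount, int maxDegree)
    {
        int running = 0;
        int peak = 0;
        var po = new ParallelOptions { MaxDegreeOfParallelism = maxDegree };

        Parallel.ForEach(Enumerable.Range(0, itemCount), po, _ =>
        {
            int now = Interlocked.Increment(ref running);

            // Record a new peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref peak)))
            {
                Interlocked.CompareExchange(ref peak, now, seen);
            }

            Thread.Sleep(10); // simulated work
            Interlocked.Decrement(ref running);
        });

        return peak;
    }
}
```

Calling ConcurrencyCheck.PeakConcurrency(100, Environment.ProcessorCount) should never return more than the processor count, though it may return less if the thread pool does not hand out that many threads during the run.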

Yes, one might frequently want to limit or tune the default spawning of new threads provided by parallel tasks (.NET thread pool). There are a lot of such discussions:

Community