I am about to implement a pool of Tasks in which at most n Tasks run in parallel.

New jobs can come in from any thread and use the next free task. If no tasks are available it should wait until one is free. When all work is done, the tasks should "sleep" to not use CPU time.

Is there something like this already available in C# / .NET?

Thanks!

oleole
  • That's the job of the TaskScheduler, not a Task Pool. Tasks aren't threads, they run on threads that come from a thread pool. The TaskScheduler controls how tasks run on threads. There *are* examples of custom TaskSchedulers in the docs and example projects like the [Parallel Extensions Extras](https://blogs.msdn.microsoft.com/pfxteam/2010/04/04/a-tour-of-parallelextensionsextras/), but you don't need them to do what you describe. That's how tasks work in the first place - they don't waste resources if they aren't running. There's no need for "free" tasks – Panagiotis Kanavos Feb 12 '19 at 09:39
  • The simplest solution would be to use a semaphore to control how tasks are spawned. I don't think you need to handle pooling yourself, as it is already provided for you... I agree with @PanagiotisKanavos that `TaskScheduler` could be a solution as well. – Johnny Feb 12 '19 at 09:41
  • @Johnny the simplest solution would be to do nothing. Tasks aren't threads. They don't need to be pooled or throttled, it's the *threads* that need throttling. That's what the TaskScheduler does. Higher-level libraries like PLINQ and Dataflow provide explicit options for throttling/DOP – Panagiotis Kanavos Feb 12 '19 at 09:41
  • @PanagiotisKanavos the part of my application that should use this pool of tasks/threads loads resources from disk or network, and I would like to limit that to a certain number of parallel tasks/threads. – oleole Feb 12 '19 at 09:45
  • @oleole no, tasks are *not threads*. They *don't* execute by themselves, they *don't* need to be pooled. It's the *threads* that need pooling. Explain what you want to do. Most likely it's already available. – Panagiotis Kanavos Feb 12 '19 at 09:45
  • I mean the one **built in to .NET**. The one you have always used. The `ThreadPool`. What makes you think it won't work for your needs? What profiling did you do to identify the need to write your own implementation and not just trust the existing (well tested) one? – mjwills Feb 12 '19 at 09:46
  • @oleole that doesn't need any kind of pool. What *exactly* do you want to do? The easiest way to process 1000 URLs or paths, for example, would be to create an ActionBlock with a DOP of e.g. 10 and start posting URLs to it. It will download the URLs concurrently, 10 at a time – Panagiotis Kanavos Feb 12 '19 at 09:46
  • @oleole if you wanted to process a lot of data with PLINQ but *not* use all cores, you could use `AsParallel().WithDegreeOfParallelism(..)` and specify how many cores you want – Panagiotis Kanavos Feb 12 '19 at 09:47
  • `is loading resources from disk or network.` Please show us some sample code in your question. You likely **don't** need to spin up threads / tasks to do that. – mjwills Feb 12 '19 at 09:48
  • @oleole if you wanted to generate thumbnails from 1000 big images you could use a TransformBlock to load the contents of each path and an ActionBlock that takes the buffer, converts it and saves it. Each block could have its own DOP – Panagiotis Kanavos Feb 12 '19 at 09:48
  • @oleole if you don't want to block during IO though, you don't want parallelism in the first place. You need to use async IO methods that won't block the calling thread until they complete – Panagiotis Kanavos Feb 12 '19 at 09:49
  • Loading resources from disk or network is **I/O**. You shouldn't be dedicating Tasks/Threads to this sort of work - arrange for it to happen asynchronously and *don't* block threads whilst waiting for it to finish – Damien_The_Unbeliever Feb 12 '19 at 09:49
  • @PanagiotisKanavos yes, that is exactly what i am looking for. that ActionBlock you mentioned looks promising. I will give it a try. – oleole Feb 12 '19 at 09:50
  • Check how https://stackoverflow.com/a/35691331/34092 uses `SemaphoreSlim`. That is likely what you should do. – mjwills Feb 12 '19 at 09:53
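The `SemaphoreSlim` approach suggested in the comments could look roughly like the sketch below. It is only an illustration: `ThrottledLoader`, `LoadAllAsync`, and the `Task.Delay` stand-in for real disk or network I/O are hypothetical names and placeholders, not part of .NET or of the linked answer. The semaphore is created with `maxParallel` slots, each operation waits asynchronously for a slot, and no thread is blocked while waiting.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper, named for illustration only.
public static class ThrottledLoader
{
    // Runs `load` for every item, with at most `maxParallel` operations in flight.
    public static async Task<TResult[]> LoadAllAsync<TItem, TResult>(
        IEnumerable<TItem> items, int maxParallel, Func<TItem, Task<TResult>> load)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            // ToList() starts every wrapper immediately; each one then waits for a free slot.
            var tasks = items.Select(async item =>
            {
                await gate.WaitAsync();          // asynchronous wait, no blocked thread
                try { return await load(item); }
                finally { gate.Release(); }      // free the slot even if load throws
            }).ToList();

            // Results come back in the same order as the input items.
            return await Task.WhenAll(tasks);
        }
    }
}

public static class SemaphoreDemo
{
    public static async Task Main()
    {
        // Task.Delay is a placeholder for a real async read or download.
        int[] squares = await ThrottledLoader.LoadAllAsync(
            Enumerable.Range(1, 5), 3,
            async n => { await Task.Delay(100); return n * n; });

        Console.WriteLine(string.Join(", ", squares)); // prints "1, 4, 9, 16, 25"
    }
}
```

Because the wait is asynchronous, this limits the number of concurrent operations rather than the number of threads, which fits I/O-bound loading.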

2 Answers

I would take a look at `ActionBlock` in the TPL Dataflow library, which gives you the functionality you want. I use it all the time in production environments with even more complex requirements than the ones you have specified.
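As a rough sketch of how that might look (not code from any production system): the example below assumes the `System.Threading.Tasks.Dataflow` library is referenced, which on .NET Framework means installing the NuGet package of the same name. `ActionBlockDemo`, `LoadAllAsync`, the `resource-N` names, and the `Task.Delay` placeholder are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class, named for illustration only.
public static class ActionBlockDemo
{
    // Posts every path to an ActionBlock that handles at most `maxParallel`
    // items at a time, and returns how many items were processed.
    public static async Task<int> LoadAllAsync(IEnumerable<string> paths, int maxParallel)
    {
        int loaded = 0;
        var options = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel };

        var loader = new ActionBlock<string>(async path =>
        {
            await Task.Delay(100);               // placeholder for real async I/O
            Interlocked.Increment(ref loaded);
        }, options);

        // Post is thread-safe, so a long-lived block can also receive jobs from any thread.
        foreach (var path in paths)
            loader.Post(path);

        loader.Complete();                       // no more input will arrive
        await loader.Completion;                 // wait until all queued items are done
        return loaded;
    }

    public static async Task Main()
    {
        var paths = Enumerable.Range(0, 10).Select(i => $"resource-{i}");
        int count = await LoadAllAsync(paths, 3);
        Console.WriteLine($"Loaded {count} resources"); // prints "Loaded 10 resources"
    }
}
```

The block queues items that arrive while all slots are busy and consumes no CPU while its queue is empty, which covers the "wait until one is free" and "sleep" requirements from the question.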

Bigtoe
You can use `ConcurrentExclusiveSchedulerPair` for this. Here is some example code:

// max 3 parallel tasks
var schedulerPair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, 3);
var factory = new TaskFactory(schedulerPair.ConcurrentScheduler);

Starting a new Task should be done with the following code:

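// DoWorkAsync is a placeholder for your own Task-returning method.
// StartNew returns a Task<Task> here; Unwrap() turns it into a Task for the inner work.
factory.StartNew(() => DoWorkAsync()).Unwrap();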

The full example:

static async Task Main(string[] args)
{
    var schedulerPair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, 3);
    var factory = new TaskFactory(schedulerPair.ConcurrentScheduler);
    var tasks = new List<Task>();
    Random r = new Random();
    for (int i = 0; i < 5; i++)
    {
        var localI = i;
        var ts = TimeSpan.FromMilliseconds(1000 + r.Next(0, 10) * 100);
        var task = factory.StartNew(() => RunTask(localI, ts)).Unwrap();
        tasks.Add(task);
    }
    await Task.WhenAll(tasks);
}

static object mutex = new object();
static int numberOfParallelWorkers;

static async Task RunTask(int n, TimeSpan delay)
{
    for (int i = 0; i < 2; i++)
    {
        int nw;
        lock (mutex) { nw = numberOfParallelWorkers = numberOfParallelWorkers + 1; }
        var start = DateTime.Now;
        Console.WriteLine($"Started task #{n} part {i} at {start:ss.ff}, parallel: {nw}");
        Thread.Sleep(delay); // simulate CPU-bound work
        lock (mutex) { nw = numberOfParallelWorkers = numberOfParallelWorkers - 1; }
        var end = DateTime.Now;
        Console.WriteLine($"Finished task #{n} part {i} at {end:ss.ff}, parallel: {nw}");
    }
    await Task.Yield();
}

This produces output like the following (the exact timings vary because the delays are random):

Started task #1 part 0 at 43.98, parallel: 2
Started task #0 part 0 at 43.98, parallel: 1
Started task #2 part 0 at 43.98, parallel: 3
Finished task #0 part 0 at 45.09, parallel: 2
Started task #0 part 1 at 45.09, parallel: 3
Finished task #1 part 0 at 45.29, parallel: 2
Started task #1 part 1 at 45.29, parallel: 3
Finished task #2 part 0 at 45.59, parallel: 2
Started task #2 part 1 at 45.59, parallel: 3
Finished task #0 part 1 at 46.19, parallel: 2
Started task #3 part 0 at 46.19, parallel: 3
Finished task #1 part 1 at 46.59, parallel: 2
Started task #4 part 0 at 46.59, parallel: 3
Finished task #2 part 1 at 47.19, parallel: 2
Finished task #4 part 0 at 47.59, parallel: 1
Started task #4 part 1 at 47.59, parallel: 2
Finished task #3 part 0 at 47.69, parallel: 1
Started task #3 part 1 at 47.69, parallel: 2
Finished task #4 part 1 at 48.59, parallel: 1
Finished task #3 part 1 at 49.19, parallel: 0
Vlad