
For a project, we need to code a Windows service that will process a job queue, do the necessary processing, and make calls to different APIs. It will run on a stand-alone machine.

Context: this service will poll the SQL database every X (e.g. 5) seconds, get the top Y jobs by priority and creation date, and start processing them. We expect a huge volume for this Windows service, so we want to make it multi-threaded or asynchronous. If the maximum amount of parallel processing is reached, we do not want it to launch more threads. But at the same time, if we have 7 jobs taking 30 seconds and one taking 5 minutes, we don't want it to wait for the 5-minute job to finish before looping and starting another batch of 8.

The first option we looked at was BackgroundWorker. Every iteration of the timer would check the status of each BackgroundWorker and, if one was available, instruct it to process a new job. But in newer versions of the .NET Framework, BackgroundWorker is effectively obsolete, superseded by async/await.

The second option we looked at was Parallel.ForEach with Task.WaitAll. But the problem is that if 7 of the 8 threads take 1 minute and the last one takes 6 minutes, we do not want to wait for the last one to finish before starting 7 new job processes.

The most appropriate way to do it seems to be Tasks.

My questions are: is there a way to track the status of multiple running tasks? And to limit the number of tasks running simultaneously? Should I instantiate all my tasks in the OnStart() of the service, and on each OnElapsed of my timer check the status of each task and, if it is available, launch it again with a new job? Or am I completely wrong about how tasks work?

The number of allowed parallel processing will be defined in an app config file and initialized in the OnStart of the Windows service.
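To make the question concrete, here is one common way to cap concurrency without blocking on the slowest job: a SemaphoreSlim gate sized from configuration. This is only a sketch; MaxParallelJobs, DispatchAsync and ProcessJobAsync are illustrative names, not anything from the actual service.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class JobThrottleSketch
{
    // In the real service this would be read from app.config in OnStart.
    static readonly int MaxParallelJobs = 3;
    static readonly SemaphoreSlim Gate = new SemaphoreSlim(MaxParallelJobs);

    static async Task Main()
    {
        await DispatchAsync(new[] { 1, 2, 3, 4, 5 });
        Console.WriteLine("dispatched");
    }

    // Called on each timer tick with the batch fetched from SQL.
    static async Task DispatchAsync(IEnumerable<int> jobIds)
    {
        foreach (var id in jobIds)
        {
            await Gate.WaitAsync();           // waits only when all slots are busy
            _ = ProcessJobAsync(id)           // fire and track; do NOT await here,
                .ContinueWith(_ => Gate.Release()); // so a slow job never blocks the loop
        }
    }

    static Task ProcessJobAsync(int id)
    {
        // stand-in for the real processing and API calls
        return Task.Delay(50);
    }
}
```

Because the loop awaits only the semaphore and not the jobs themselves, seven fast jobs free their slots immediately while the 5-minute job keeps exactly one slot occupied.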

Fynnen
  • You need to look into the Task Parallel Library and/or semaphores. What you are wanting is very possible but you will have to write the heavy lifting. I would advise against background workers. With TPL you can chain continuation tasks which are very powerful. – Botonomous Oct 07 '16 at 19:15
  • I'd look at QuartzScheduler for this. I have had to solve a similar problem, but because we were kicking off a lot of different external programs, I ended up rolling a library for it. Quartz should do everything you need to, and is configurable using CRON expressions making it very flexible. http://www.quartz-scheduler.org/documentation/quartz-2.1.x/examples/ – Alex Oct 07 '16 at 19:16
  • Something like this https://msdn.microsoft.com/en-us/library/ee789351(v=vs.100).aspx ? [This one](http://stackoverflow.com/questions/18771524/limit-the-number-of-tasks-in-task-factory-start-by-second) is not the same but may help you to write your own *TaskScheduler* .... – L.B Oct 07 '16 at 19:16
  • Confirming what @Botonomous said, BackgroundWorker will give you no end of hell doing this. Also, never use a C# `Thread` in a windows service. You will see a 30% increase in CPU usage unless you have some careful pooling logic to regulate the thread polling. – Alex Oct 07 '16 at 19:17
  • @L.B Very interesting. I haven't seen a use of a custom scheduler before and I really like this. – Botonomous Oct 07 '16 at 19:21
  • @Botonomous I will look into it! By chain continuation do you mean that we wouldn't need to wait for all the task to end before launching another set of tasks? – Fynnen Oct 07 '16 at 19:23
  • @Fynnen I mean that each finished task can have logic that runs directly after it finishes. It's just a great feature. To answer your question: yes, you can start a task and you don't have to wait for the result before starting a new task. – Botonomous Oct 07 '16 at 19:26
  • @Botonomous Perfect, thank you. The syntax for tasks is for some reason not intuitive for me, and when I saw the examples where they loop x times and start a process every time, I assumed that at the end it was waiting for all the tasks to finish, kind of like "await" for async. So the best approach would be that on each loop of my timer, I check whether I have fewer than the max number of tasks allowed, and if so I create them, and if not do nothing? – Fynnen Oct 07 '16 at 19:30
  • @Fynnen That is an approach you can use; I would just use a while loop that checks my running-task count (you would need to maintain that #). Also, to get comfortable with Tasks I recommend learning Action, Func and Lambdas really well. They look scary but are fairly straightforward. – Botonomous Oct 07 '16 at 19:33
  • @Botonomous Yes, I will read up on those subjects. I found this link that looks very useful: https://msdn.microsoft.com/en-us/library/dd537609(v=vs.110).aspx – Fynnen Oct 07 '16 at 19:39
  • @Fynnen That is a great article, one I've read many times. Good luck. – Botonomous Oct 07 '16 at 19:40
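The "maintain a running-task count" idea from the comments can be sketched with Task.WhenAny, which frees a slot as soon as any single task finishes rather than waiting for the whole batch. The job queue, maxTasks value and ProcessJobAsync below are illustrative stand-ins, not the asker's real code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

class WhenAnyThrottle
{
    static async Task Main()
    {
        const int maxTasks = 3;                  // the app.config value in the real service
        var jobs = new Queue<int>(Enumerable.Range(1, 8));
        var running = new List<Task>();

        while (jobs.Count > 0)
        {
            if (running.Count >= maxTasks)
            {
                // as soon as ANY task finishes, a slot frees up --
                // no waiting for the entire batch of maxTasks
                var finished = await Task.WhenAny(running);
                running.Remove(finished);
            }
            running.Add(ProcessJobAsync(jobs.Dequeue()));
        }

        await Task.WhenAll(running);             // drain the remaining tasks
        Console.WriteLine("all jobs done");
    }

    static Task ProcessJobAsync(int id) => Task.Delay(20 * id);
}
```

This is exactly the behavior the question asks for: seven 30-second jobs cycle through their slots while the 5-minute job keeps only its own slot occupied.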

1 Answer


One possible way to do this would be to use the TPL Dataflow library. You can configure a block (an ActionBlock in this case) to handle the processing and send items into the block asynchronously (or synchronously if you prefer). You can limit the maximum degree of parallelism, as well as set a bounded capacity to throttle the number of items in the buffer.

By using SendAsync, you can have your process asynchronously wait until there's room in the buffer for new items. With the TPL Dataflow library, this is all fairly simple:

var processBlock = new ActionBlock<DataItem>(
    item => ProcessItemAsync(item),
    new ExecutionDataflowBlockOptions
    {
        BoundedCapacity = 10,   // your configured limit
        MaxDegreeOfParallelism = 2 // your configured max
    });

To use your block, you would do something like:

using (var connection = new SqlConnection(...))
{
    await connection.OpenAsync().ConfigureAwait(false);

    using (var command = new SqlCommand(..., connection))
    using (var reader = await command.ExecuteReaderAsync().ConfigureAwait(false))
    {
        while (await reader.ReadAsync().ConfigureAwait(false))
        {
            var item = ...; // map the current row to a DataItem

            // offer a message to the processing block, but allow postponement
            // in case we've hit capacity
            await processBlock.SendAsync(item);
        }
    }
}
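One detail the snippet above leaves out is shutdown: when the service stops, the block should be told no more items are coming so in-flight work can drain. Here is a minimal, self-contained sketch of the whole pattern (the int job type and Task.Delay work are placeholders for the real DataItem and processing; requires the System.Threading.Tasks.Dataflow NuGet package):

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class DataflowDemo
{
    static async Task Main()
    {
        var processBlock = new ActionBlock<int>(
            async item =>
            {
                await Task.Delay(50);              // stand-in for the real work
                Console.WriteLine($"done {item}");
            },
            new ExecutionDataflowBlockOptions
            {
                BoundedCapacity = 10,              // your configured buffer limit
                MaxDegreeOfParallelism = 2         // your configured max parallelism
            });

        for (var i = 0; i < 5; i++)
            await processBlock.SendAsync(i);       // waits only if the buffer is full

        processBlock.Complete();                   // signal: no more items (e.g. in OnStop)
        await processBlock.Completion;             // wait for in-flight items to drain
    }
}
```

Completion order is nondeterministic with parallelism above 1, but awaiting Completion guarantees every accepted item has been processed before the service exits.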
Eugene Pawlik