2

I have ~500 tasks, each of which takes ~5 seconds, with most of that time spent waiting for the remote resource to reply. I would like to choose the number of threads to spawn myself (after some testing) and run the tasks on those threads. When one task finishes, I would like to start another task on the thread that became available.

I found System.Threading.Tasks the easiest way to achieve what I want, but I think it is impossible to specify the number of tasks that should be executed in parallel. On my machine it's always around 8 (quad-core CPU). Is it possible to specify how many tasks should be executed in parallel? If not, what would be the easiest way to achieve what I want? (I tried with threads, but the code is much more complex.) I tried increasing the MaxDegreeOfParallelism parameter, but it only sets an upper limit, so no luck there...

This is the code that I have currently:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
    class Program
    {
        private static List<string> _list = new List<string>();
        private static int _toProcess = 0;

        static void Main(string[] args)
        {   
            for (int i = 0; i < 1000; ++i)
            {
                _list.Add("parameter" + i);
            }

            var w = new Worker();
            var w2 = new StringAnalyzer();

            Parallel.ForEach(_list, new ParallelOptions() { MaxDegreeOfParallelism = 32 }, item =>
            {
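                // The loop body runs on several threads at once, so update the counter atomically.
                System.Threading.Interlocked.Increment(ref _toProcess);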
                string data = w.DoWork(item);
                w2.AnalyzeProcessedString(data);
            });

            Console.WriteLine("Finished");           
            Console.ReadKey();
        }

        static void Done(Task<string> t)
        {            
            Console.WriteLine(t.Result);
            System.Threading.Interlocked.Decrement(ref _toProcess);
        }
    }

    class Worker
    {
        public string DoWork(string par)
        {
            // It's a long running but not CPU heavy task (downloading stuff from the internet)
            System.Threading.Thread.Sleep(5000);            
            return par + " processed";
        }
    }

    class StringAnalyzer
    {
        public void AnalyzeProcessedString(string data)
        {
            // Rather short, not CPU heavy
            System.Threading.Thread.Sleep(1000);
            Console.WriteLine(data + " and analyzed");
        }
    }
}
Michal B.
  • 3
    Using more tasks than you have processor cores will usually SLOW IT DOWN. That's why it's being limited. I guess it's limited to 8 because you have quad core + hyperthreading for 8 logical cores. – Matthew Watson Feb 26 '14 at 08:50
  • 4
    IMHO, you have a design problem. It's a bad idea to solve this problem by creating more threads. You should make use of I/O completion ports. – Fedor Feb 26 '14 at 08:53
  • 5
    @MatthewWatson *"most of the time is wasted on waiting for the remote resource to reply"*, So I don't think using more threads than number of cores would be problem here (as long as remote server is happy with it :) ) – L.B Feb 26 '14 at 08:53
  • @MatthewWatson, I agree with what you are saying, but the keyword in your sentence is **usually**. Just check my code and write it in the way that it performs the fastest. I am quite sure that I can run `DoWork` and `AnalyzeProcessedString` for all 1000 tasks under 10 seconds. Can you do that with 8 threads? I do not think so. Who upvotes comments like that? :S – Michal B. Feb 26 '14 at 08:53
  • @Fyodor: that's an interesting finding! I will look into that, thanks and have an upvote! – Michal B. Feb 26 '14 at 08:56
  • 2
    @MichalB. Creating tasks that spend most of their time waiting is a big no-no. You should never do that. A different design is required - as Fyodor says, IO Completion Ports is a likely solution. But creating a whole load of threads is really a very bad idea. – Matthew Watson Feb 26 '14 at 08:56
  • @MatthewWatson: what should I do in your opinion? I only see Fyodor here adding something useful to the discussion. – Michal B. Feb 26 '14 at 08:57
  • 1
    @MichalB. It's useful to tell you not to do something that is wrong, is it not? It will at least steer you away from a bad solution. – Matthew Watson Feb 26 '14 at 08:58
  • It is useful, but since you are so smart, then suggest a solution. Otherwise it looks like trying to show off based on what other people say. Nice editing btw. ;-) Cheerios! – Michal B. Feb 26 '14 at 08:59
  • 1
    @MichalB. I'm not sure why you are being so defensive... My first comment was merely pointing out why .Net limits the maximum number of threads... As for the solution, well I already said that I agree with Fyodor - you should use IO Completion Ports. But without further details, it's hard to be more specific. – Matthew Watson Feb 26 '14 at 09:03
  • Here's a useful link, though - might be of some help: http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198%28v=vs.85%29.aspx – Matthew Watson Feb 26 '14 at 09:05
  • @MatthewWatson: because I don't like people like that. They come, they say stuff without even suggesting a solution. Such information is of no use to the poster. At least not to me. Fyodor mentioned IO Completion Ports, which is an interesting finding, but now I google and there is almost nothing for .NET. I found some 3rd party library, but if it is so much more useful to use the concept of IO Completion Ports instead of threads, then why isn't it a standard thing in .NET? Any examples of how to implement them and an explanation of what makes them better than normal threads? – Michal B. Feb 26 '14 at 09:09
  • @MatthewWatson: Thanks, I found this link too. Those are native WIN32 API functions. Would you seriously go that far to implement what I need? If I go down this path my code will become quite complex and possibly buggy. – Michal B. Feb 26 '14 at 09:10
  • @MichalB. No, that's background information. Hang on a sec while I try to find a link showing what to use in C#. In current C# you would use the `await` keyword with some high-level constructs. Which version of C#/.Net are you using? – Matthew Watson Feb 26 '14 at 09:14
  • @MatthewWatson: the only thing I find on the Internet is that threads are used for completing multiple tasks whose execution time is not entirely dependent on the CPU. So I have a good reason to use threads... – Michal B. Feb 26 '14 at 09:15
  • "Deliberately creating more threads than processors is a standard technique used to make use of "spare cycles" where a thread is blocked waiting for something, whether that's I/O, a mutex, or something else by providing some other useful work for the processor to do." from http://stackoverflow.com/questions/5987376/why-is-having-more-threads-than-cores-faster – Michal B. Feb 26 '14 at 09:17
  • Here's a better link: http://marcgravell.blogspot.co.uk/2009/02/async-without-pain.html Also have a look at this article by Jeffrey Richter (especially his comments in the "Performance" section): http://msdn.microsoft.com/en-us/magazine/cc163726.aspx – Matthew Watson Feb 26 '14 at 09:18

2 Answers

6

Assuming you can use native async methods like HttpClient.GetStringAsync to fetch your resource:

int numTasks = 20;
SemaphoreSlim semaphore = new SemaphoreSlim(numTasks);
HttpClient client = new HttpClient();

List<string> result = new List<string>();
foreach(var url in urls)
{
    semaphore.Wait();

    client.GetStringAsync(url)
          .ContinueWith(t => {
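              try
              {
                  // t.Result rethrows if the download failed
                  lock (result) result.Add(t.Result);
              }
              finally
              {
                  // Always free the slot; otherwise a failed download leaks it and the final drain loop never returns
                  semaphore.Release();
              }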
          });
}

for (int i = 0; i < numTasks; i++) semaphore.Wait();

Since GetStringAsync uses I/O completion ports internally (like most other async I/O methods) instead of creating new threads, this may be the solution you are after.

See also http://blog.stephencleary.com/2013/11/there-is-no-thread.html
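
For comparison, the same throttling idea can be written with `await` instead of `ContinueWith`. This is only a minimal sketch under some assumptions: `ThrottledDownloads`, `RunAsync` and `FakeDownloadAsync` are hypothetical names, and `FakeDownloadAsync` simulates the remote call with `Task.Delay` so the sample runs without a network. In real code you would pass something like `client.GetStringAsync` as the `work` delegate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledDownloads
{
    // Hypothetical stand-in for a real async I/O call such as HttpClient.GetStringAsync.
    static async Task<string> FakeDownloadAsync(string url)
    {
        await Task.Delay(100); // simulates waiting for the remote resource
        return url + " processed";
    }

    // Runs 'work' for every item with at most 'maxConcurrency' operations in flight.
    public static async Task<string[]> RunAsync(
        IEnumerable<string> items, int maxConcurrency, Func<string, Task<string>> work)
    {
        using (var semaphore = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                await semaphore.WaitAsync(); // waits for a free slot without blocking a thread
                try { return await work(item); }
                finally { semaphore.Release(); } // free the slot even if 'work' throws
            }).ToList(); // ToList forces the query, so every task is created here

            return await Task.WhenAll(tasks); // results come back in input order
        }
    }

    static void Main()
    {
        var urls = Enumerable.Range(0, 50).Select(i => "parameter" + i);
        string[] results = RunAsync(urls, 20, FakeDownloadAsync).GetAwaiter().GetResult();
        Console.WriteLine(results.Length + " items processed");
    }
}
```

The `maxConcurrency` argument plays the role of `numTasks` above, so it can be tuned after some testing; no thread is blocked while a slot or a download is pending.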

L.B
  • Internally I am using HttpWebRequest, but I now see that it has asynchronous methods too (BeginGetRequestStream and BeginGetResponse). Again something I missed :-). Thanks! – Michal B. Feb 26 '14 at 09:20
  • 1
    @MichalB. You can also use `await request.GetResponseAsync();` which makes it easier. Here's a useful sample from Microsoft: http://msdn.microsoft.com/en-us/library/hh300224.aspx These use IO-completion ports as their implementation (but it is not exposed to you, which is why you don't see much explicit stuff about completion ports in the C# documentation). Also note that "overlapped IO" uses completion ports. – Matthew Watson Feb 26 '14 at 09:31
  • 1
    @MatthewWatson In my answer, `await` will defeat the purpose of running `numTasks` simultaneously. See the code: I am not waiting/awaiting while there are fewer tasks (downloads) than `numTasks` in progress. – L.B Feb 26 '14 at 09:37
  • @L.B Yes, but I think it's better to use the await as per the example in the Microsoft link that I posted. Then you avoid creating a load of threads. – Matthew Watson Feb 26 '14 at 09:38
  • 1
    @MatthewWatson See the link in the answer. No threads... – L.B Feb 26 '14 at 09:39
  • 1
    Ah yes, good point. But I think the Microsoft examples do show a good general approach. – Matthew Watson Feb 26 '14 at 09:40
0

As L.B mentioned, the .NET Framework has methods that perform I/O operations (requests to databases, web services, etc.) using IOCP internally. They can be recognized by their names, which end with Async by convention. So you can just use them to build robust, scalable applications that process multiple requests simultaneously.

EDIT: I've completely rewritten the code example using modern best practices, so it is much more readable, shorter, and easier to use.

For .NET 4.5 we can use the following approach:

class Program
{
    static void Main(string[] args)
    {
        var task = Worker.DoWorkAsync();
        task.Wait(); //stop and wait until our async method completed

        foreach (var item in task.Result)
        {
            Console.WriteLine(item);
        }

        Console.ReadLine();
    }
}

static class Worker
{
    public async static Task<IEnumerable<string>> DoWorkAsync()
    {
        List<string> results = new List<string>();

        for (int i = 0; i < 10; i++)
        {
            var request = (HttpWebRequest)WebRequest.Create("http://microsoft.com");
            using (var response = await request.GetResponseAsync())
            {
                results.Add(response.ContentType);
            }
        }

        return results;
    }
}

Here is a nice MSDN tutorial about async programming using async-await.
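
Note that the loop in the example above awaits each request before issuing the next one, so the ten requests run one after another. If the goal is to overlap the waits, one option is to start all requests first and then await them together with `Task.WhenAll`. The following is a rough sketch of that difference, not a drop-in replacement: `SequentialVsConcurrent` and `FakeRequestAsync` are hypothetical names, and `FakeRequestAsync` uses `Task.Delay` as a stand-in for `request.GetResponseAsync()`.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

static class SequentialVsConcurrent
{
    // Hypothetical stand-in for request.GetResponseAsync(): waits without occupying a thread.
    public static async Task<string> FakeRequestAsync(int id)
    {
        await Task.Delay(200);
        return "response " + id;
    }

    // Awaits each request before starting the next: total time is roughly the sum of the waits.
    public static async Task<string[]> SequentialAsync(int count)
    {
        var results = new string[count];
        for (int i = 0; i < count; i++)
            results[i] = await FakeRequestAsync(i);
        return results;
    }

    // Starts every request, then awaits them together: the waits overlap.
    public static Task<string[]> ConcurrentAsync(int count)
    {
        return Task.WhenAll(Enumerable.Range(0, count).Select(i => FakeRequestAsync(i)));
    }

    static void Main()
    {
        var sw = Stopwatch.StartNew();
        SequentialAsync(10).GetAwaiter().GetResult();
        Console.WriteLine("sequential: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        ConcurrentAsync(10).GetAwaiter().GetResult();
        Console.WriteLine("concurrent: " + sw.ElapsedMilliseconds + " ms");
    }
}
```

With hundreds of real requests you would probably still want to cap how many are in flight at once (for example with a SemaphoreSlim), so the remote server is not flooded.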

Fedor