
I have an HTTP client that invokes multiple web requests against an HTTP server. I execute each HTTP request on a thread pool thread (a synchronous call), and by default the client uses 30 TCP connections (configured through HttpWebRequest.ServicePoint, see http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.servicepoint.aspx ). Depending on the system I am managing, there can be ~500 to ~1000 thread pool threads waiting for I/O (the HTTP response).

Now I am wondering: do I need to limit the number of threads I use as well? (For example, as in http://msdn.microsoft.com/en-us/library/ee789351(v=vs.110).aspx, or System.Threading.Tasks - Limit the number of concurrent Tasks.)

EDIT

Yes, I think I need to limit the number of threads I use: even though these threads are in a wait state, they still take up resources. Limiting them lets me control how many resources/threads I consume, which makes it easier to integrate my component with others without causing starvation of, or contention for, resources/threads.
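If the blocking design were kept, one way to cap how many thread-pool threads it ties up is `Parallel.ForEach` with `ParallelOptions.MaxDegreeOfParallelism`. This is a swapped-in technique, not what the question's own library does. A minimal sketch, assuming the blocking call can be passed in as an `Action<T>`; the class name `BoundedBlockingWorkers` is hypothetical:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedBlockingWorkers
{
    // Runs blockingAction once per item (e.g. a synchronous WebClient.UploadData
    // call), but on at most maxThreads threads at a time. Remaining items wait to
    // be picked up instead of each occupying its own blocked thread.
    // Returns the peak number of actions observed running concurrently.
    public static int Run<T>(IEnumerable<T> items, int maxThreads, Action<T> blockingAction)
    {
        int running = 0;
        int peak = 0;
        object peakLock = new object();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxThreads };

        Parallel.ForEach(items, options, item =>
        {
            int now = Interlocked.Increment(ref running);
            lock (peakLock)
            {
                if (now > peak) peak = now;
            }
            try
            {
                blockingAction(item);
            }
            finally
            {
                Interlocked.Decrement(ref running);
            }
        });

        return peak;
    }
}
```

The trade-off is that each in-flight request still holds a thread for its whole duration; the cap only bounds how many threads are held at once.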

EDIT 2

I have decided to embrace the async model completely, so that I won't use thread pool threads to execute HTTP requests. Instead, I can rely on the collaboration between the OS kernel and the I/O completion port thread(s), which ensures that the response is delivered in a callback once the request completes (this makes the best use of both CPU and other resources). I am currently planning to use WebClient.UploadDataTaskAsync (http://msdn.microsoft.com/en-us/library/system.net.webclient.uploaddatataskasync(v=vs.110).aspx) and update the code accordingly. (A couple of references for details: HttpWebRequest and I/O completion ports, How does .NET make use of IO Threads or IO Completion Ports?)
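A minimal sketch of the async version, assuming each request can be expressed as a `Func<TPayload, Task<TResult>>`. `ThrottledAsyncSender`, `UploadAllAsync`, the `url` parameter and the limit of 50 are hypothetical. Throttling moves from a blocking `Semaphore.WaitOne` to `SemaphoreSlim.WaitAsync`, so a request waiting for its turn does not occupy a thread:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledAsyncSender
{
    // Starts one asynchronous send per payload but lets at most maxInFlight of
    // them be outstanding at once. Waiting for a slot uses WaitAsync, so no thread
    // is blocked while a request waits its turn or waits for its response.
    // Results come back in the same order as the payloads.
    public static async Task<TResult[]> SendAllAsync<TPayload, TResult>(
        IEnumerable<TPayload> payloads,
        int maxInFlight,
        Func<TPayload, Task<TResult>> sendAsync)
    {
        using (var gate = new SemaphoreSlim(maxInFlight, maxInFlight))
        {
            var tasks = payloads.Select(async payload =>
            {
                await gate.WaitAsync().ConfigureAwait(false);
                try
                {
                    return await sendAsync(payload).ConfigureAwait(false);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList();

            return await Task.WhenAll(tasks).ConfigureAwait(false);
        }
    }

    // Hypothetical wiring for the question's scenario: one WebClient per request,
    // at most 50 uploads in flight.
    public static Task<byte[][]> UploadAllAsync(IEnumerable<byte[]> bodies, string url) =>
        SendAllAsync(bodies, 50, async body =>
        {
            using (var client = new WebClient())
            {
                return await client.UploadDataTaskAsync(url, body).ConfigureAwait(false);
            }
        });
}
```

Keeping the send operation as a delegate means the throttling can be exercised with a fake sender (for example one built on `Task.Delay`) without touching the network.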

EDIT 3

In short, I switched to the async network I/O .NET APIs mentioned above, which removed the need for my parallel library entirely. For details, please see my answer below (I added it in case anyone is interested).

Pseudocode to give an idea of how I am invoking web requests using WebClient:

//pseudocode: there can be a variable number of requests (~500 to ~1000)
//one semaphore shared by all requests; creating it inside the action would
//give every request its own semaphore and throttle nothing
var sem = new Semaphore(initialCount: 50, maximumCount: 50);
foreach (var request in requests)
{
    //pseudocode: executes the web request on a thread pool thread
    //MY QUESTION: is it OK to queue as many work items as there are requests
    //and simply let them wait on the semaphore, or should I limit the concurrency?
    MyThreadPoolConcurrentLibrary.ExecuteAction(() =>
        {
            //the HTTP server I am talking to recommends sending at most '50'
            //parallel requests over '30' TCP connections, so wait for a slot first;
            //WaitOne is outside the try so Release only runs after a successful wait
            sem.WaitOne();
            try
            {
                //custom WebClient, so that I can configure the TCP connections
                //(ServicePoint connection limit), SSL validation, etc.
                using (MyCustomWebClient client = new MyCustomWebClient())
                {
                    //http://msdn.microsoft.com/en-us/library/tdbbwh0a(v=vs.110).aspx
                    //synchronous call: the worker thread simply blocks here
                    client.UploadData(address: "urladdress", data: bytesdata);
                }
            }
            finally
            {
                sem.Release(1);
            }
        });
}
MyThreadPoolConcurrentLibrary.WaitAll(/*...*/);

So, should I do something to limit the number of threads I consume, or let the thread pool take care of it? (If my app reaches the thread pool's maximum thread limit, the pool queues the work anyway, so can I simply rely on that?)
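To see what "let the thread pool take care of it" relies on, the pool's current bounds can be read with `ThreadPool.GetMinThreads` and `ThreadPool.GetMaxThreads`. Work items queued beyond the available threads wait in the pool's queue rather than being rejected, but above the minimum the pool adds threads gradually rather than immediately, so hundreds of blocked synchronous calls can delay unrelated work queued to the same pool. A small sketch; the class name `ThreadPoolLimits` is hypothetical:

```csharp
using System.Threading;

public static class ThreadPoolLimits
{
    // Returns the current minimum and maximum worker and I/O completion threads.
    // Below the minimum, the pool creates threads on demand; above it, new threads
    // are injected gradually, which is where many blocked calls start to hurt.
    public static (int minWorker, int minIo, int maxWorker, int maxIo) Read()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        return (minWorker, minIo, maxWorker, maxIo);
    }
}
```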

Pseudocode showing my custom WebClient, where I configure TCP connections, SSL validation, etc.:

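using System;
using System.Net;
using System.Net.Security;
using System.Security.Cryptography.X509Certificates;

class MyCustomWebClient : WebClient
{
    //number of persistent TCP connections recommended by the HTTP server team
    private const int TCPConnectionsLimit = 30;

    protected override WebRequest GetWebRequest(Uri address)
    {
        HttpWebRequest request = (HttpWebRequest)base.GetWebRequest(address);
        request.KeepAlive = true;
        //Timeout is in milliseconds (300 = 0.3 s) and applies only to synchronous calls
        request.Timeout = 300;
        //the ServicePoint is shared by all requests to the same host,
        //so this caps the number of TCP connections opened to the server
        request.ServicePoint.ConnectionLimit = TCPConnectionsLimit;
        request.ServerCertificateValidationCallback = this.ServerCertificateValidationCallback;
        return request;
    }

    private bool ServerCertificateValidationCallback(object sender, X509Certificate certificate, X509Chain chain, SslPolicyErrors sslPolicyErrors)
    {
        //placeholder policy: accept only certificates that pass the default checks
        return sslPolicyErrors == SslPolicyErrors.None;
    }
}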

Best Regards.

Dreamer
  • Why are you not using asynchronous IO instead of thread-pool threads? – Peter Ritchie Jun 18 '14 at 19:54
  • Stated another way, convert to UploadDataAsync and scale much better. http://msdn.microsoft.com/en-us/library/ms144225(v=vs.110).aspx – EricLaw Jun 18 '14 at 21:22
  • The odds that this web server is just as likely as you to start a thousand threads to service your needs are fairly low. If it does then don't expect it to last very long, web server admins are pretty good at shutting out resource hogs. Clearly you need to throttle *connections* instead of threads. And web server admins are typically pretty adamant that the perfect number is *one*. – Hans Passant Jun 18 '14 at 21:35
  • Why not use Tasks instead? `var task = Task.Run(() => { /* do work */ });`. Then you can await task or return task.Result. Threads are so unnecessary these days with all the work in async/await unless it falls under a few special cases. – Tacoman667 Jun 18 '14 at 22:19
  • @Hans: Please see the updated code. I won't bombard the server with all the requests; I limit them using a semaphore (75 parallel requests is OK according to the HTTP server team; I initially didn't show this for brevity). I also use persistent TCP connections, 30 by default; all these default numbers are recommended by the HTTP server team. My question is simple: is it OK to dedicate as many worker (thread pool) threads as there are requests and simply let them wait on the semaphore, OR should I limit the concurrency (the number of threads I use), for example using a task scheduler as mentioned in my post... – Dreamer Jun 18 '14 at 22:41
  • @Tacoman, my concurrent library uses thread pool threads; I don't create threads. I can easily update my library to use tasks as well, but I think once a task is started a worker thread will be used, which is essentially the same problem I mentioned (w.r.t. the number of threads): the number of tasks is directly proportional to the number of worker threads, and I can limit the concurrency using a TaskScheduler. So the QUESTION is: when should I limit the number of threads I use? Or is it OK to simply rely on the thread pool to queue my actions/tasks... – Dreamer Jun 18 '14 at 22:46
  • @EricLaw - wow, is it you :)? I use your Fiddler a lot, it's a really nice tool!! Let me think through your suggestion; it looks like I can use the async/completed variation to improve what I am doing here. By the way, do you see any issue with the way I am doing it (synchronously), given that my code makes a lot of threads block? – Dreamer Jun 18 '14 at 22:53
  • @EricLaw, by the way, doesn't UploadDataAsync just use a separate worker thread and return immediately? If so, it basically defeats the whole purpose. I think at this stage I should take a closer look at the existing async models, and may need to refactor my code... Regards. – Dreamer Jun 18 '14 at 23:15
  • The Task Parallel Library will work through the current thread pool on your behalf. `Parallel.ForEach()` or `Parallel.Invoke()` might make things easier and cleaner. http://msdn.microsoft.com/en-us/library/dd992001(v=vs.110).aspx – Tacoman667 Jun 19 '14 at 12:29
  • If you think that properly using async "defeats the whole purpose" then you need to restate what you think your "purpose" actually is. Using async is the right way to code this sort of thing. – EricLaw Jun 19 '14 at 15:32
  • @Eric: no, I meant that if WebClient.UploadDataAsync(...) returns immediately but internally blocks a worker thread, then it defeats my purpose, as my whole intention is to reduce the number of threads. I haven't had a chance to look yet, but my hope/guess is that the request is simply sent to a buffer, no thread will be waiting, and upon response a worker thread will return the result. I am still curious how the timeout will happen; I will look closely soon to get more details. The gist is: if calling the async method internally dedicates another thread, it defeats the purpose of this question. – Dreamer Jun 19 '14 at 18:51
  • You should read this: http://msdn.microsoft.com/en-us/library/hh191443.aspx – EricLaw Jun 19 '14 at 20:03
  • @Eric, Thanks Eric, I will definitely go through it. Regards. – Dreamer Jun 19 '14 at 20:15
  • Understood: an I/O completion port thread delivers the response in a callback, and no worker thread waits during async calls. So I decided to embrace the async API completely in all layers of my component to take full advantage of it. – Dreamer Jun 22 '14 at 04:53
  • @Peter, thanks for the comment, now I get it. When I first saw your comment I was confused, thinking you were referring to I/O like file or disk access. Now I understand it is used generically (disk, network, database, TCP socket, named pipe, etc.). http://msdn.microsoft.com/en-us/library/windows/desktop/aa365198(v=vs.85).aspx (I will be changing my code as per your and Eric's comments.) Thank you. – Dreamer Jun 22 '14 at 05:07
  • @Dreamer the use of completion ports is hidden/abstracted by .NET, so it's common to describe that as "asynchronous IO" in .NET. – Peter Ritchie Jun 22 '14 at 13:43

2 Answers


Since I am performing network I/O (HTTP web requests), it is not a good idea to use synchronous HttpWebRequests and let thread pool threads block in synchronous calls. So I switched to async network I/O operations (WebClient's async Task methods), as mentioned in the question, following the suggestions in the comments. This removed my component's dependence on a large number of threads. For details, please see the pseudocode snippet below.
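A small sketch of why the async version needs no thread per pending request, with `Task.Delay` standing in for the network wait (an assumption; real HTTP latency varies). Many overlapping waits finish in roughly the time of one wait, because a pending timer is not parked on a thread while it waits. The class name `NoThreadPerWait` is hypothetical:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class NoThreadPerWait
{
    // Starts `count` operations that each wait `delay` asynchronously and returns
    // the total elapsed time. The waits overlap instead of queuing behind a limited
    // set of blocked threads, so the total stays close to a single `delay`.
    public static async Task<TimeSpan> WaitManyAsync(int count, TimeSpan delay)
    {
        var stopwatch = Stopwatch.StartNew();
        await Task.WhenAll(Enumerable.Range(0, count).Select(_ => Task.Delay(delay)))
                  .ConfigureAwait(false);
        return stopwatch.Elapsed;
    }
}
```

With `Thread.Sleep` on thread-pool threads instead, the same 1000 waits would have to wait for the pool to inject threads, which is the situation the question started from.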

Here are some useful links that helped me get comfortable with a few C# 5.0 async concepts (async/await):

Deep dive video (a good explanation of the async/await state machine): http://channel9.msdn.com/events/TechDays/Techdays-2014-the-Netherlands/Async-programming-deep-dive

http://blog.stephencleary.com/2013/11/there-is-no-thread.html

Async/await error handling: http://www.interact-sw.co.uk/iangblog/2010/11/01/csharp5-async-exceptions , http://msdn.microsoft.com/en-us/library/0yd65esw.aspx , How to better understand the code/statements from "Async - Handling multiple Exceptions" article?

A good book: http://www.amazon.com/Asynchronous-Programming-NET-Richard-Blewett/dp/1430259205
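The error-handling material linked above covers a behavior that matters once many requests are awaited together: awaiting `Task.WhenAll` rethrows only the first exception, while the combined task's `Exception` property (an `AggregateException`) keeps all of them. A minimal sketch; `WhenAllErrors` and `FailAsync` are hypothetical, with `FailAsync` standing in for a failing request:

```csharp
using System;
using System.Threading.Tasks;

public static class WhenAllErrors
{
    // Hypothetical stand-in for a request that fails asynchronously.
    static async Task FailAsync(string message)
    {
        await Task.Yield();
        throw new InvalidOperationException(message);
    }

    // Returns the message of the exception that `await` rethrew and the number of
    // exceptions preserved on the combined task.
    public static async Task<(string firstCaught, int totalFaults)> RunAsync()
    {
        Task all = Task.WhenAll(FailAsync("first"), FailAsync("second"));
        try
        {
            await all;
            return (null, 0);
        }
        catch (InvalidOperationException ex)
        {
            // `await` surfaces only the first failure; the rest are still on all.Exception.
            return (ex.Message, all.Exception.InnerExceptions.Count);
        }
    }
}
```

Keeping a reference to the `Task.WhenAll` task, as above, is what makes the remaining failures reachable after the `catch`.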

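using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    //at most 90 requests in flight at once
    static SemaphoreSlim s_sem = new SemaphoreSlim(90, 90);
    static List<Task> s_tasks = new List<Task>();

    public static void Main()
    {
        for (int request = 1; request <= 1000; request++)
        {
            var task = FetchData();
            s_tasks.Add(task);
        }
        //blocking here is acceptable only because this is a console app's Main
        Task.WaitAll(s_tasks.ToArray());
    }

    private static async Task<string> FetchData()
    {
        //WaitAsync (not Wait) so that no thread blocks while waiting for a slot;
        //acquired outside the try so Release only runs after a successful wait
        await s_sem.WaitAsync().ConfigureAwait(false);
        try
        {
            using (var wc = new MyCustomWebClient())
            {
                string content = await wc.DownloadStringTaskAsync(
                    new Uri("http://www.interact-sw.co.uk/oops/")).ConfigureAwait(continueOnCapturedContext: false);
                return content;
            }
        }
        finally
        {
            s_sem.Release(1);
        }
    }

    private class MyCustomWebClient : WebClient
    {
        protected override WebRequest GetWebRequest(Uri address)
        {
            var req = (HttpWebRequest)base.GetWebRequest(address);
            //at most 30 TCP connections to the server
            req.ServicePoint.ConnectionLimit = 30;
            return req;
        }
    }
}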

Regards.

Dreamer

You could always simply aim for the same limit that browsers run under. That way the server admins can't really hate on you too much.

Now, the RFC (RFC 2616) says that you should limit connections to 2 per server, but according to http://www.stevesouders.com/blog/2008/03/20/roundup-on-parallel-connections/ many browsers go as high as 6 or 8 parallel connections (and this was in 2008).

Maximum parallel connections per server:

Browser        HTTP/1.1   HTTP/1.0
IE 6,7         2          4
IE 8           6          6
Firefox 2      2          8
Firefox 3      6          6
Safari 3,4     4          4
Chrome 1,2     6          ?
Chrome 3       4          4
Chrome 4+      6          ?
iPhone 2       4          ?
iPhone 3       6          ?
iPhone 4       4          ?
Opera 9.63,    4          4
Opera 10.51+   8          ?
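If you follow this advice with the question's HttpWebRequest/WebClient setup, the process-wide default can be set through `ServicePointManager.DefaultConnectionLimit`. A sketch, assuming 6 (the HTTP/1.1 figure for IE 8, Firefox 3 and Chrome 4+ in the table) is the limit you want; the class name `BrowserLikeConnectionLimit` is hypothetical:

```csharp
using System.Net;

public static class BrowserLikeConnectionLimit
{
    // Assumed value: the HTTP/1.1 limit shared by IE 8, Firefox 3 and Chrome 4+.
    public const int PerServerConnections = 6;

    // Sets the default used by ServicePoint objects created after this call.
    // A per-host ServicePoint.ConnectionLimit (as set in MyCustomWebClient)
    // still overrides it for that host.
    public static int Apply()
    {
        ServicePointManager.DefaultConnectionLimit = PerServerConnections;
        return ServicePointManager.DefaultConnectionLimit;
    }
}
```

Call `Apply()` once at startup, before the first request to the server, so the server's ServicePoint picks up the value.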
Soraz
  • Thanks for the reply. Please note that I have already taken care of the server side: I don't issue all the requests straight away and expect results. I limit the number of parallel requests using a semaphore, and I also use ~30 default TCP connections. My question is simple: in my usage of WebClient, can I simply let 1000 thread pool threads sit in a waiting/suspended state, or should I limit the number of worker threads I use with a task scheduler? – Dreamer Jun 18 '14 at 22:56