
I am working on a project that uses a timed web client. The class structure is as follows.

Controller => main supervisor of the classes Form1, SourceReader, ReportWriter, UrlFileReader, HTTPWorker, and TimedWebClient.

HTTPWorker is the class that fetches the page source for a given URL. TimedWebClient is the class that handles the timeout of the WebClient. Here is the code.

class TimedWebClient : WebClient
{
    private int Timeout;

    public TimedWebClient()
    {
        this.Timeout = 5000;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var objWebRequest = base.GetWebRequest(address);
        objWebRequest.Timeout = this.Timeout;
        return objWebRequest;
    }
}

In HTTPWorker I have:

 TimedWebClient wclient = new TimedWebClient();
 wclient.Proxy = WebRequest.GetSystemWebProxy();
 wclient.Headers["Accept"] = "application/x-ms-application, image/jpeg, application/xaml+xml, image/gif, image/pjpeg, application/x-ms-xbap, application/x-shockwave-flash, application/vnd.ms-excel, application/vnd.ms-powerpoint, application/msword, */*";
 wclient.Headers["User-Agent"] = "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; MDDC)";
 byte[] pagesource = wclient.DownloadData(requestUrl);
 UTF8Encoding objUTF8 = new UTF8Encoding();
 responseData = objUTF8.GetString(pagesource);

I have handled exceptions there. In Form1 I have a BackgroundWorker and a URL list.

First implementation:

First I took one URL at a time and gave it to the single Controller object to process. That worked fine, but being sequential it took a long time when the list was large.

Second implementation:

Then, in the DoWork handler of the BackgroundWorker, I made seven controllers and seven threads, each controller with its own HTTPWorker object. But now it throws exceptions saying "timed out".

Below is the code of backgroundWorker1_DoWork in Form1.cs.

private void backgroundWorker1_DoWork(object sender, DoWorkEventArgs e)
{
    bool done = false;

    while (!backgroundWorker1.CancellationPending && !done)
    {
        int iterator = 1;
        int tempiterator = iterator;
        Controller[] cntrlrarray = new Controller[numofcontrollers];
        Thread[] threadarray = new Thread[numofcontrollers];
        int cntrlcntr = 0;
        for (cntrlcntr = 0; cntrlcntr < numofcontrollers; cntrlcntr++)
        {
            cntrlrarray[cntrlcntr] = new Controller();
        }
        cntrlcntr = 0;
        for (iterator = 1; iterator <= this.urlList.Count; iterator++)
        {
            int assignedthreads = 0;

            for (int threadcounter = 0; threadcounter < numofcontrollers; threadcounter++)
            {
                cntrlcntr = threadcounter;
                threadarray[threadcounter] = new Thread(() => cntrlrarray[cntrlcntr].Process(iterator - 1));
                threadarray[threadcounter].Name = this.urlList[iterator - 1];
                threadarray[threadcounter].Start();
                backgroundWorker1.ReportProgress(iterator);
                assignedthreads++;

                if (iterator == this.urlList.Count)
                {
                    break;
                }
                else
                {
                    iterator++;
                }
            }

            for (int threadcounter = 0; threadcounter < assignedthreads; threadcounter++)
            {
                cntrlcntr = threadcounter;
                threadarray[threadcounter].Join();
            }
            if (iterator == this.urlList.Count)
            {
                break;
            }
            else
            {
                iterator--;
            }
        }
        done = true;
    }
}

What is the reason for this, and what is the solution? Apologies for the lengthy post. Thank you in advance.

Alberto
  • Try it with fewer threads first. Most likely there is a limit (4 simultaneous connections) somewhere. Use `Parallel.ForEach()` with a MaxDegreeOfParallelism option instead of Thread objects. – H H Dec 10 '13 at 08:16

1 Answer


The sky... it's full of Threads! Seriously, though - don't use this many threads. That's what asynchronous I/O is for. If you're using .NET 4.5, this is very easy to do using await/async, otherwise it's a bit of boilerplate code, but it's still far preferable to this.
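As a rough sketch of the async approach suggested above (the names `Downloader`, `DownloadAllAsync`, and the `fetch` delegate are illustrative, not from the question), concurrency can be capped with a `SemaphoreSlim` instead of dedicated threads:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class Downloader
{
    // Downloads every URL via the supplied fetch delegate (e.g. a wrapper
    // around HttpClient.GetStringAsync), allowing at most maxConcurrency
    // requests in flight at once.
    public static async Task<string[]> DownloadAllAsync(
        IEnumerable<string> urls,
        Func<string, Task<string>> fetch,
        int maxConcurrency = 7)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = urls.Select(async url =>
            {
                await gate.WaitAsync();      // wait for a free slot
                try { return await fetch(url); }
                finally { gate.Release(); }  // free the slot for the next URL
            }).ToList();                     // materialize so all tasks start
            return await Task.WhenAll(tasks);
        }
    }
}
```

A caller would pass something like `url => httpClient.GetStringAsync(url)` as the fetch delegate; taking the delegate as a parameter also makes the scheduling logic testable without touching the network.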

With that out of the way, the number of TCP connections is quite limited by default. Even if there were a use for having 1000 downloads at once (and there probably isn't, since you're sharing bandwidth), you simply can't create and drop TCP connections willy-nilly - there's a limit on open TCP connections (anywhere from 5 to 20, unless you're on a server). You can change this, but it's usually preferable to do things differently. See this entry. This might also be a problem if this application is not running alone (which it probably isn't, given that you wouldn't have such a problem on server Windows). For example, torrent clients often bump into the half-open connection limit (a connection that is still waiting for the end of the initial TCP handshake), which would be detrimental to your application, of course.
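On .NET, the per-host connection limit hinted at above can be raised through `ServicePointManager`; the value 10 below is an arbitrary illustration, and this should be set once at startup, before the first request is made:

```csharp
using System.Net;

static class ConnectionConfig
{
    public static void Raise(int limit)
    {
        // Client (non-ASP.NET) processes default to very few connections
        // per host; worker threads beyond that limit queue up and can hit
        // their timeout while waiting, matching the "timed out" symptom.
        ServicePointManager.DefaultConnectionLimit = limit;
    }
}
```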

Now, even if you stay under this limit, there's also a fixed number of outbound and inbound ports to use when communicating. This is a problem when you quickly open and close TCP connections, because TCP keeps the connection alive in the background for about 4 minutes (to make sure no stray packets arrive at the port, which could be reused in the meantime). This means that if you create enough connections in this time interval, you're going to "starve" your port pool, and every new TCP connection will be denied (so your browser will temporarily stop working, etc.).

Next, a 5 second timeout is pretty low. Really. Imagine that it would take a second to complete a handshake (that's a ping of ~300ms, which is still within the realm of reasonable internet response). Suddenly, you've got a new connection, which has to wait for the other handshakes to finish, and it might take a few seconds just for that. And that's still just the initiation of the connection. Then there's the DNS lookup, and the response of the HTTP server itself... 5 seconds is a low timeout.
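A small, hedged variant of the question's TimedWebClient with a configurable and more generous timeout (the class name and the 30-second default below are illustrative choices, not from the question):

```csharp
using System;
using System.Net;

class ConfigurableWebClient : WebClient
{
    private readonly int timeoutMs;

    public ConfigurableWebClient(int timeoutMs = 30000)
    {
        this.timeoutMs = timeoutMs;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        // Applies to the whole request: DNS lookup, handshake, and response.
        request.Timeout = timeoutMs;
        return request;
    }
}
```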

In short, it's not the multi-threading - it's the massive number of (useless) connections you're opening. Also, for URLs on a single website, you should look into Keep-Alive connections - they can reuse the already opened TCP connection, which significantly mitigates this problem.

Now, to get deeper into this. You're starting and destroying threads needlessly. Instead, it would be a better idea to have a URL queue and several thread consumers that take input from the queue. This way, you'll only have those 7 (or whatever the number) threads that poll from the queue as long as there's something in it, which saves a lot of system resources (and improves your performance). I'm thinking that the Thread.Join you're doing might also have something to do with your issues. Even though you're running the thing in a background worker, it just might be possible there's something strange happening in there.
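The queue-plus-consumers pattern described above could be sketched like this (the `processUrl` delegate stands in for the question's `Controller.Process`; all names here are illustrative):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

class UrlQueueWorker
{
    // Drains a shared URL queue with a fixed pool of worker threads
    // and returns the number of URLs processed.
    public static int Run(string[] urls, int workerCount, Action<string> processUrl)
    {
        var queue = new BlockingCollection<string>();
        foreach (var u in urls) queue.Add(u);
        queue.CompleteAdding(); // signal that no more URLs will arrive

        int processed = 0;
        var threads = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            threads[i] = new Thread(() =>
            {
                // GetConsumingEnumerable blocks until an item is available
                // and ends cleanly once the queue is completed and empty.
                foreach (var url in queue.GetConsumingEnumerable())
                {
                    processUrl(url);
                    Interlocked.Increment(ref processed);
                }
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        return processed;
    }
}
```

Unlike the original loop, the worker count here is independent of the URL count, and no thread is ever created for a URL that a free worker could have handled.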

Luaan
  • Thank you very much! If you can, please give me example code for this using async and await. I am trying that as well. I now understand why this doesn't work. I would like to do it with async and await, because it is clean and clear. Can you provide an example please? – Hareendra Chamara Philips Dec 10 '13 at 09:27
  • @HareendraChamara Have a look at this [answer](http://stackoverflow.com/a/7475427/3032289). On first glance, it appears to be the correct way to do this. Or, use a ready solution - https://code.google.com/p/abot/ – Luaan Dec 10 '13 at 09:37