I have a class that creates multiple WebClient classes with different proxies on multiple threads simultaneously.

Unfortunately, some instances of the WebClient class take quite a long time to finish. Usually, I end up with ~20 threads that take a few minutes to finish, while the hundreds of other threads I spawn finish quickly.

I tried extending the WebClient class and setting its Timeout property to 20 seconds (as posted here), but it didn't change anything.
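For reference, the extended class itself is not shown in this question. A minimal sketch of what such a class usually looks like, assuming it follows the common pattern of overriding `GetWebRequest` (the class body, the default of 20000 ms, and the `ReadWriteTimeout` line are assumptions, not the original code):

```csharp
using System;
using System.Net;

// Hypothetical reconstruction of the extended client; the real WebDownload is not shown.
public class WebDownload : WebClient
{
    // Timeout in milliseconds, applied to every request this client creates.
    public int Timeout { get; set; }

    public WebDownload() : this(20000) { }

    public WebDownload(int timeout)
    {
        this.Timeout = timeout;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        var request = base.GetWebRequest(address);
        if (request != null)
        {
            // Limits how long the request waits for the response to start.
            request.Timeout = this.Timeout;

            // Assumption: also limit each read of the response body.
            var http = request as HttpWebRequest;
            if (http != null)
            {
                http.ReadWriteTimeout = this.Timeout;
            }
        }
        return request;
    }
}
```

Note that `WebRequest.Timeout` only covers obtaining the response. Reading the body is governed by `HttpWebRequest.ReadWriteTimeout`, which defaults to five minutes, and neither value caps the total duration of the download. That may be one reason why a slow proxy can still hold a thread for minutes.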

I'm not showing the whole code, because there would be quite a lot of it (WebClient is wrapped in another class). Still, I know the bottleneck is WebClient.DownloadString(url), because whenever I pause the debugger during that last stage of execution, all of the worker threads are sitting on this specific line.

Here's how I use the extended WebClient:

public string GetHtml(string url)
{
    this.CheckValidity(url);

    var html = "";

    using (var client = new WebDownload())
    {
        client.Proxy = this.Proxy;
        client.Headers[HttpRequestHeader.UserAgent] = this.UserAgent;
        client.Timeout = this.Timeout;

        html = client.DownloadString(url);
    }

    return html;
}

EDIT

I have just run a few tests, and some of the threads take up to 7 minutes to finish, all of them stuck on the WebClient.DownloadString() statement.

Furthermore, I have tried setting ServicePointManager.DefaultConnectionLimit to int.MaxValue, unfortunately to no avail.

moskalak
  • Have you tried this without threading to get a feel for the expected response time? – brumScouse Apr 29 '14 at 12:25
  • Are they all fetching from the same host by any chance? – Jon Skeet Apr 29 '14 at 12:27
  • @brumScouse, Yes I have. I believe that the problem might be quality of supplied proxy. It would be great if I could simply set timeout for the whole operation. I suppose I could do this by using `DownloadStringAsync`, waiting for it to finish and throwing an exception after supplied timeout (have just thought of that :D). But is that a good idea? – moskalak Apr 29 '14 at 12:29
  • @JonSkeet, spot on! They are, but each of the requests uses a different proxy, so that shouldn't be a problem. (I guess?). – moskalak Apr 29 '14 at 12:31
  • @moskalak: I don't know how the connection pool works in those terms, to be honest. How many different proxies are involved, vs how many requests? – Jon Skeet Apr 29 '14 at 12:36
  • @JonSkeet: While testing I didn't exceed 200 simultaneous requests, but even when I set max to 40, it ends the same. As for the proxies, there's around 1000 of them, but some are of a bit dubious quality. – moskalak Apr 29 '14 at 12:39
  • @JonSkeet, Thank you all for your help. I think I've found the solution, have checked it and it seems to work pretty well. I would appreciate any comments, though. – moskalak Apr 29 '14 at 13:42

1 Answer

Here's what I ended up doing.

I realized that I simply needed to cancel WebClient.DownloadString() once it reached the specified timeout. Since I couldn't find anything in WebClient that would do this, I called WebClient.DownloadStringTaskAsync() instead. This way, I could use Task.WaitAll with a timeout to wait for WebClient to finish downloading the string, and then check whether the task had actually completed (as opposed to the wait timing out).

Here's the code:

public string GetHtml(string url)
{
    var html = "";

    using (var client = new WebClient())
    {
        // Assign all the important stuff
        client.Proxy = this.Proxy;
        client.Headers[HttpRequestHeader.UserAgent] = this.UserAgent;

        // Run DownloadString() as a task.
        var task = client.DownloadStringTaskAsync(url);

        // Wait for the task to finish, or timeout
        Task.WaitAll(new Task<string>[] { task }, this.Timeout);

        // If timeout was reached, cancel task and throw an exception.
        if (task.IsCompleted == false)
        {
            client.CancelAsync();
            throw new TimeoutException();
        }

        // Otherwise, happy. :)
        html = task.Result;
    }

    return html;
}
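The same idea can also be written without blocking the calling thread. The following is only a sketch of that variant, not the code I actually use: `DownloadHelper`, `WithTimeout`, and `GetHtmlAsync` are hypothetical names, and the proxy, user agent, and timeout are passed in as parameters instead of being read from the wrapper class.

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public static class DownloadHelper
{
    // Hypothetical helper: returns the task's result, or throws TimeoutException
    // if the task has not completed within timeoutMs milliseconds.
    public static async Task<T> WithTimeout<T>(Task<T> task, int timeoutMs, Action onTimeout = null)
    {
        // Whichever finishes first wins: the download or the delay.
        var finished = await Task.WhenAny(task, Task.Delay(timeoutMs));
        if (finished != task)
        {
            // The delay won, so give the caller a chance to cancel the operation.
            if (onTimeout != null) onTimeout();
            throw new TimeoutException();
        }

        // The task has already completed; awaiting it rethrows any download error.
        return await task;
    }

    // Hypothetical async counterpart of GetHtml().
    public static async Task<string> GetHtmlAsync(string url, IWebProxy proxy, string userAgent, int timeoutMs)
    {
        using (var client = new WebClient())
        {
            client.Proxy = proxy;
            client.Headers[HttpRequestHeader.UserAgent] = userAgent;

            // CancelAsync is invoked only if the timeout elapses first.
            return await WithTimeout(client.DownloadStringTaskAsync(url), timeoutMs, client.CancelAsync);
        }
    }
}
```

The design choice here is that a request waiting on a slow proxy no longer occupies a worker thread while it waits, which may matter when hundreds of requests are in flight.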
moskalak