
I am testing the validity of a large list of proxy servers concurrently. During this testing, many exceptions are being raised and caught. Although I am doing the testing in a background thread, my UI becomes unresponsive unless I use a SemaphoreSlim object to control the concurrency.

I know this is a self-imposed bottleneck, and since I want to scale to even larger lists of proxies, I was hoping there might be a better way to solve the problem.

private void ValidateProxiesButton_Click(object sender, EventArgs e)
{
    new Thread(async () =>
    {
        Thread.CurrentThread.IsBackground = true;
        await ValidateProxiesAsync(proxies, judges, tests, 10);
    }).Start();
}


public async Task ValidateProxiesAsync(IEnumerable<Proxy> proxies, IEnumerable<ProxyJudge> judges, IEnumerable<ProxyTest> tests = null, int maxConcurrency = 20)
{
    if (!proxies.Any())
    {
        throw new ArgumentException("Proxy list empty.");
    }

    foreach (var proxy in proxies)
    {
        proxy.Status = ProxyStatus.Queued;
    }

    //Get external IP to check if proxy is anonymous.
    var publicIp = await WebUtility.GetPublicIP();
    foreach (var judge in judges)
    {
        judge.Invalidation = publicIp;
    }

    await ValidateTestsAsync(judges.ToList<IProxyTest>());
    var validJudges = judges.ToList<IProxyTest>().GetValidTests();
    if (validJudges.Count == 0)
    {
        throw new ArgumentException("No valid judges found.");
    }

    if (tests != null)
    {
        await ValidateTestsAsync(tests.ToList<IProxyTest>());
    }

    var semaphore = new SemaphoreSlim(maxConcurrency);
    var tasks = new List<Task>();
    foreach (var proxy in proxies)
    {
        tasks.Add(Task.Run(async () =>
        {
            await semaphore.WaitAsync();
            try
            {
                proxy.Status = ProxyStatus.Testing;
                var isValid = await proxy.TestValidityAsync((IProxyTest)validJudges.GetRandomItem());
                proxy.Status = isValid ? ProxyStatus.Valid : ProxyStatus.Invalid;
            }
            finally
            {
                // Release the slot even if the test throws, so the remaining proxies are not starved.
                semaphore.Release();
            }
        }));
    }

    await Task.WhenAll(tasks);
}
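As a point of comparison, here is a minimal, self-contained console sketch of the same SemaphoreSlim throttling pattern. It is an illustration under assumptions, not the original code: CheckProxyAsync is a hypothetical stand-in for proxy.TestValidityAsync (a short Task.Delay that throws for every third id), and the in-flight counter exists only to show that no more than maxConcurrency checks run at once. The Release call sits in a finally block so a throwing check cannot leak a slot.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    private static readonly object Gate = new object();
    private static int _inFlight;
    private static int _maxObserved;

    // Hypothetical stand-in for proxy.TestValidityAsync: every third id "fails" by throwing.
    private static async Task<bool> CheckProxyAsync(int id)
    {
        await Task.Delay(10);
        if (id % 3 == 0) throw new InvalidOperationException($"proxy {id} failed");
        return true;
    }

    // Runs `count` checks with at most `maxConcurrency` in flight; returns the number of
    // valid results and the highest concurrency level that was actually observed.
    public static async Task<(int Valid, int MaxConcurrent)> RunAsync(int count, int maxConcurrency)
    {
        _inFlight = 0;
        _maxObserved = 0;
        using var semaphore = new SemaphoreSlim(maxConcurrency);

        var tasks = Enumerable.Range(0, count).Select(async id =>
        {
            await semaphore.WaitAsync();
            try
            {
                lock (Gate) { _inFlight++; _maxObserved = Math.Max(_maxObserved, _inFlight); }
                try { return await CheckProxyAsync(id); }
                catch (InvalidOperationException) { return false; } // a dead proxy is just "invalid"
                finally { lock (Gate) { _inFlight--; } }
            }
            finally
            {
                semaphore.Release(); // always give the slot back, even if something above throws
            }
        }).ToList();

        bool[] results = await Task.WhenAll(tasks);
        return (results.Count(ok => ok), _maxObserved);
    }

    public static async Task Main()
    {
        var (valid, max) = await RunAsync(count: 30, maxConcurrency: 5);
        Console.WriteLine($"valid={valid}, maxConcurrent={max}");
    }
}
```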

Inside the proxy.TestValidityAsync method:

public async Task<bool> TestValidityAsync(IProxyTest test, int timeoutSeconds = 30)
{
    try
    {
        var req = WebRequest.Create(test.URL);
        req.Proxy = new WebProxy(this.ToString());
        var respBody = await WebUtility.GetResponseStringAsync(req).TimeoutAfter(TimeSpan.FromSeconds(timeoutSeconds));
        return respBody.Contains(test.Validation);
    }
    catch (Exception)
    {
        return false;
    }
}
JohnWick
  • @MickyD I've tried it without the new thread as well, with the same result. Also, do network requests count as CPU-bound? "CPU bound means the program is bottlenecked by the CPU, or central processing unit, while I/O bound means the program is bottlenecked by I/O, or input/output, such as reading or writing to disk, network, etc." – JohnWick May 03 '18 at 00:28
  • @MickyD I'm not sure whether the network request processing counts as CPU-bound, though (I think not), so I'm not sure how it applies here. – JohnWick May 03 '18 at 00:30
  • @MickyD Also this isn't the Code Review Stack Exchange. I'm not asking for a code review, but rather a solution to the actual problem :) – JohnWick May 03 '18 at 00:43
  • @DavidStampher The actual problem is that you don't need any explicit threads in this code at all as far as I can see. Make `ValidateProxiesButton_Click` async, remove all of the places you instantiate a new `Thread`, and see how that works. – Daniel Mann May 03 '18 at 01:52
  • _"Also this isn't the Code Review Stack Exchange"_ - no, but be prepared for friendly _"oh by the way...."_ suggestions to help you in future. It is disappointing you have not realised this –  May 03 '18 at 02:02
  • I can't see how using `Task` or spinning up an explicit thread that does not update the UI could freeze the UI anyway. –  May 03 '18 at 02:03
  • @DanielMann The new thread seems to be a red herring for people commenting here; removing it does not improve performance in any noticeable way. I actually had that button click handler without the new thread before I posted this, and the same problem existed. – JohnWick May 03 '18 at 02:15
  • "I can't see how using Task or spinning up an explicit thread that does not update the UI could freeze the UI anyway." Yeah...same here...which is why I asked the question. There are a ton of exceptions being raised/caught during testing, but I'm not sure how those are affecting the UI thread. – JohnWick May 03 '18 at 02:16
  • The only other thing I can think of is that other parts of your code may be kicking off a lot of short UI-captured `await`s. When they complete, they interrupt the UI thread to execute the line after the await. Too many can be a problem. _"[As asynchronous GUI applications grow larger, you might find many small parts of async methods all using the GUI thread as their context. This can cause sluggishness as responsiveness suffers from “thousands of paper cuts.”](https://msdn.microsoft.com/en-us/magazine/jj991977.aspx?f=255&MSPPError=-2147217396)"_ –  May 03 '18 at 04:49
  • @MickyD OK, thanks, I will check the link out in a bit. I was thinking about pinging the proxies before actually testing them. That would greatly reduce the number of exceptions being thrown and caught (currently it's a ton of exceptions in a short period of time), and only testing the proxies that pass the ping test could be a way to solve this as well. – JohnWick May 03 '18 at 04:58
  • OK. Let us know how you go. Remember, if in the meantime you find the fault yourself you can post your own answer and even _accept_ it :) –  May 03 '18 at 05:03
  • I saw your other question (now deleted) where you start 100,000 tasks, each of which throws an exception. In the case of that question, your CPU was simply too busy handling the exceptions, and that reduces the responsiveness of the UI (because the UI needs CPU too). You can verify this in Windows Process Explorer by checking how high your process's CPU consumption is. – Evk May 03 '18 at 08:31
  • @Evk Yeah thanks. I suppose the only way to get around that is to run them in batches. Will look into it tomorrow. – JohnWick May 03 '18 at 09:28
  • @MickyD Found a solution and posted it as an answer. Ended up getting the NuGet TPL.Dataflow package and using the TransformBlock class, which I hadn't even heard of until this morning. But after testing it, my UI stays responsive even when sending many concurrent requests which often throw exceptions (due to invalid proxies being tested). – JohnWick May 03 '18 at 20:13
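Evk's suggestion of running the checks in batches could look roughly like the sketch below. It is an illustration, not code from the thread: CheckAsync is a hypothetical stand-in for a proxy test (odd ids throw a TimeoutException), and each batch is awaited with Task.WhenAll before the next one starts, so at most batchSize checks, and their exceptions, are in flight at any moment.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchDemo
{
    // Hypothetical stand-in for a proxy test: odd ids "fail" by throwing.
    private static async Task<bool> CheckAsync(int id)
    {
        await Task.Delay(5);
        if (id % 2 == 1) throw new TimeoutException($"proxy {id} timed out");
        return true;
    }

    // Converts the expected failure into a result so one dead proxy
    // does not fault the Task.WhenAll for the whole batch.
    private static async Task<bool> SafeCheckAsync(int id)
    {
        try { return await CheckAsync(id); }
        catch (TimeoutException) { return false; }
    }

    // Processes the ids batchSize at a time and returns one result per id, in input order.
    public static async Task<List<bool>> RunInBatchesAsync(IReadOnlyList<int> ids, int batchSize)
    {
        if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var results = new List<bool>(ids.Count);
        for (int start = 0; start < ids.Count; start += batchSize)
        {
            // Start every check in this batch, then wait for all of them before moving on.
            var batch = ids.Skip(start).Take(batchSize).Select(id => SafeCheckAsync(id));
            results.AddRange(await Task.WhenAll(batch));
        }
        return results;
    }

    public static async Task Main()
    {
        var results = await RunInBatchesAsync(Enumerable.Range(0, 25).ToList(), batchSize: 10);
        Console.WriteLine($"checked={results.Count}, valid={results.Count(ok => ok)}");
    }
}
```

Batching is simpler than a semaphore, but each batch waits for its slowest check, so some capacity sits idle near the end of every batch.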

1 Answer


I found a working solution: add the TPL Dataflow NuGet package to the project and use the TransformBlock class. With this, my UI stays very responsive even when I am processing tons of concurrent requests that often throw exceptions. The code below is a proof of concept; I will update it once I adapt it to my project.

Source: Throttling asynchronous tasks

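using System;
using System.Collections.Generic;
using System.Net;
using System.Threading.Tasks.Dataflow; // TPL Dataflow

private async void button1_Click(object sender, EventArgs e)
{
    // Up to 200 downloads run at once; the remaining URLs wait in the block's input queue.
    var downloader = new TransformBlock<string, WebResponse>(
            url => Download(url),
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 200 }
        );

    // Collects every output of the downloader (failed downloads arrive as null).
    var buffer = new BufferBlock<WebResponse>();
    downloader.LinkTo(buffer);

    var urls = new List<string>();
    for (int i = 0; i < 100000; i++)
    {
        urls.Add($"http://example{i}.com");
    }

    foreach (var url in urls)
        downloader.Post(url);
    //or await downloader.SendAsync(url);

    // No more input; wait until every URL has been processed and handed to the buffer.
    downloader.Complete();
    await downloader.Completion;

    if (buffer.TryReceiveAll(out IList<WebResponse> responses))
    {
        //process responses
    }
}

private WebResponse Download(string url)
{
    // GetResponse is synchronous, so each in-flight download occupies a thread-pool thread.
    WebResponse resp = null;
    try
    {
        var req = WebRequest.Create(url);
        resp = req.GetResponse();
    }
    catch (Exception)
    {
        // Unreachable or invalid URLs are expected here; swallow the error and return null.
    }
    return resp;
}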
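For reference, here is a self-contained console sketch of the TransformBlock idea with an asynchronous transform delegate. It assumes a hypothetical CheckAsync in place of a real proxy test (ids divisible by 4 throw) and a hypothetical CheckResult record; it needs C# 9 for the record and, depending on the target framework, the System.Threading.Tasks.Dataflow NuGet package. MaxDegreeOfParallelism bounds how many checks are awaited at once without holding a thread per check, each exception is turned into a result inside the delegate so the block is never faulted, and an ActionBlock linked with PropagateCompletion collects the results.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DataflowDemo
{
    // Hypothetical result type for one proxy check.
    public record CheckResult(int Id, bool IsValid, string Error);

    // Hypothetical stand-in for an async proxy test; ids divisible by 4 throw.
    private static async Task<bool> CheckAsync(int id)
    {
        await Task.Delay(5);
        if (id % 4 == 0) throw new InvalidOperationException($"proxy {id} refused connection");
        return true;
    }

    public static async Task<List<CheckResult>> RunAsync(IEnumerable<int> ids, int maxParallel)
    {
        // Async delegate: at most maxParallel checks are awaited at once, and the expected
        // exception becomes a result instead of faulting the block.
        var checker = new TransformBlock<int, CheckResult>(async id =>
        {
            try { return new CheckResult(id, await CheckAsync(id), null); }
            catch (InvalidOperationException ex) { return new CheckResult(id, false, ex.Message); }
        }, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = maxParallel });

        var results = new List<CheckResult>();
        // ActionBlock defaults to MaxDegreeOfParallelism = 1, so the list is never touched concurrently.
        var collector = new ActionBlock<CheckResult>(r => results.Add(r));
        checker.LinkTo(collector, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (var id in ids)
            await checker.SendAsync(id);

        checker.Complete();
        // Completes only after the checker has finished and every result has been consumed.
        await collector.Completion;
        return results;
    }

    public static async Task Main()
    {
        var results = await RunAsync(Enumerable.Range(0, 40), maxParallel: 8);
        Console.WriteLine($"checked={results.Count}, valid={results.Count(r => r.IsValid)}");
    }
}
```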
JohnWick