I am experimenting with the new Task library and have written a very simple HTML downloader using WebClient and Task.Run. However, I can never get above 5% network usage. I would like to understand why, and how I can improve my code to approach 100% network usage/throughput (probably not possible, but it should be a lot more than 5%).

I would also like to be able to limit the number of threads, but it seems that is not as easy as I thought (i.e. it needs a custom task scheduler). Is there a way to simply set the maximum thread count, something like something.SetMaxThread(2)?

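using System;
using System.Diagnostics;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

internal static class Program
{
    private static void Main()
    {
        // Queue one download per iteration; Main's thread ID is passed along for comparison.
        for (var i = 0; i < 1000000; i++)
        {
            Go(i, Thread.CurrentThread.ManagedThreadId);
        }

        Console.Read();
    }

    // Each call queues a ThreadPool work item that blocks until its download completes.
    private static readonly Action<int, int> Go = (counter, threadId) => Task.Run(() =>
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();

        var webClient = new WebClient();
        webClient.DownloadString(new Uri("http://stackoverflow.com"));

        stopwatch.Stop();

        // Prints: caller thread ID == worker thread ID | counter: elapsed time
        Console.Write("{0} == {1} | ", threadId.ToString("D3"), Thread.CurrentThread.ManagedThreadId.ToString("D3"));
        Console.WriteLine("{0}: {1}ms ", counter.ToString("D3"), stopwatch.ElapsedMilliseconds.ToString("D4"));
    });
}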

This is the async version, as suggested by @spender. My understanding is that await will "remember" where it is in the method, hand the download off to the OS, skip the rest of the method (the two Console writes) for now, and return to Main immediately so that the for loop can keep scheduling the remaining Go calls. Am I understanding it correctly? If so, nothing blocks the UI.

private static async void Go(int counter, int threadId)
{
    using (var webClient = new WebClient())
    {
        var stopWatch = new Stopwatch();
        stopWatch.Start();

        await webClient.DownloadStringTaskAsync(new Uri("http://ftp.iinet.net.au/test500MB.dat"));

        stopWatch.Stop();

        Console.Write("{0} == {1} | ", threadId.ToString("D3"), Thread.CurrentThread.ManagedThreadId.ToString("D3"));
        Console.WriteLine("{0}: {1}ms ", counter.ToString("D3"), stopWatch.ElapsedMilliseconds.ToString("D4"));
    }
}

What I noticed was that when I download large files, there's not much difference in download speed / network usage. Both the threading version and the async version peaked at about 12.5% network usage and about 12 MByte/sec download. I also tried running multiple instances (multiple .exe processes), and again there's no big difference between the two. When I download large files from 2 URLs concurrently (20 instances), I get similar network usage (12.5%) and download speed (10-12 MByte/sec). I guess I am reaching the peak?

  • You’re limited by the speed of your internet connection (which is different from the capability of your cable/wireless card) and the speed of the remote server. Hitting 100% is pretty unlikely. – Ry- Jun 20 '13 at 01:17
  • In addition to what minitech says: push more data. The Stack Overflow homepage is relatively small; latency and connection spin-up are skewing your numbers. – Michael Petrotta Jun 20 '13 at 01:20
  • Why are you passing the ThreadId to your method? There's only a single thread at work here meaning that your downloads are occurring in serial. Consider moving to async code and parallelizing your downloads. – spender Jun 20 '13 at 01:21
  • @minitech I am testing this at work. Currently I am getting 14ms ping, 90.26Mbps download and 78.89Mbps upload, and the LAN is wired. But I understand what you mean by "speed of remote server"; I think I will try hitting multiple random websites to test again. And I understand hitting 100% is not possible. – Jeff Jun 20 '13 at 01:23
  • @MichaelPetrotta perhaps I should test against my local website, which serves a fat HTML page. – Jeff Jun 20 '13 at 01:24
  • Why not also simply declare a method with the signature `private static Task Go(int counter, int threadId)`, instead of assigning a delegate to a static variable? – spender Jun 20 '13 at 01:25
  • @spender the thread id thing is purely for myself to see that the WebClient is running on a different thread than my GUI / main thread. – Jeff Jun 20 '13 at 01:25
  • But how would that happen? There's no threads other than the main Thread. This looks to me like a console app with no spawning of additional threads. – spender Jun 20 '13 at 01:27
  • Go big, Jeff. Forget HTML for the moment - download a great big binary. Start with 5MB, say. You should see a real number there. Then look into better measurements - not starting your stopwatch until the first byte comes back, for instance. – Michael Petrotta Jun 20 '13 at 01:27
  • @MichaelPetrotta changing my code to download Linux iso. Now I am hitting 12.5% usage. – Jeff Jun 20 '13 at 01:30
  • And how does that compare to downloading the ISO with a browser? – Michael Petrotta Jun 20 '13 at 01:39
  • @MichaelPetrotta tried downloading 20 500mb files from 2 servers peaking 12.5% about 10Mb / sec – Jeff Jun 20 '13 at 01:52
  • How are you measuring your download speed, and is it Mbps or MBps? Because 10-12MBps is the fastest realistic speed over 100BaseT. – Stephen Cleary Jun 20 '13 at 03:16
  • @StephenCleary I previously mixed up bytes and bits when comparing my results with an ADSL download test. Yes, I am reaching the limit of 100BaseT now! – Jeff Jun 20 '13 at 03:35

1 Answer

As it stands, your code is suboptimal because, although you are using Task.Run to create asynchronous code that runs in the ThreadPool, the code that is being run in the ThreadPool is still blocking on the line:

webClient.DownloadString(...

This amounts to an abuse of the ThreadPool: it is not designed to run blocking tasks, and it is slow to spin up additional threads to deal with peaks in workload. That in turn seriously degrades any API that relies on the ThreadPool (timers, async callbacks; they're everywhere), because their work goes to the back of a saturated queue while the ThreadPool reluctantly and slowly spins up hundreds of threads that will spend 99.9% of their time doing nothing.

Stop blocking the ThreadPool and switch to proper async methods that do not block.

So now you can literally break your router and seriously upset the SO site admins with the following simple mod:

    private static void Main()
    {
        for (var i = 0; i < 1000000; i++)
        {
            Go(i, Thread.CurrentThread.ManagedThreadId);
        }

        Console.Read();
    }

    private static async Task Go(int counter, int threadId)
    {
        var stopwatch = new Stopwatch();
        stopwatch.Start();

        using (var webClient = new WebClient())
        {
            await webClient.DownloadStringTaskAsync(
                new Uri("http://stackoverflow.com"));
        }
        //...
    }
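
The loop above starts a million downloads with nothing holding it back. The question also asked how to cap the amount of concurrent work; with async code there is no thread count to set, but one common approach is to gate the downloads with a SemaphoreSlim. The following is only a sketch under assumptions: the `Throttle` class, its `RunAsync` helper, and the limit of 2 are names and values made up for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Threading;
using System.Threading.Tasks;

public static class Throttle
{
    // Hypothetical helper: runs body for every item, but lets at most
    // maxConcurrency invocations be in flight at any one time.
    public static async Task<TOut[]> RunAsync<TIn, TOut>(
        IEnumerable<TIn> items, int maxConcurrency, Func<TIn, Task<TOut>> body)
    {
        using (var gate = new SemaphoreSlim(maxConcurrency))
        {
            var tasks = items.Select(async item =>
            {
                // Asynchronously wait for a free slot; no thread is blocked here.
                await gate.WaitAsync();
                try
                {
                    return await body(item);
                }
                finally
                {
                    // Free the slot even if the download failed.
                    gate.Release();
                }
            }).ToList();

            // Results come back in the same order as the input items.
            return await Task.WhenAll(tasks);
        }
    }

    private static void Main()
    {
        // Ten requests, at most two at a time (both numbers are arbitrary).
        var urls = Enumerable.Repeat(new Uri("http://stackoverflow.com"), 10);

        var lengths = RunAsync(urls, 2, async url =>
        {
            using (var webClient = new WebClient())
            {
                var html = await webClient.DownloadStringTaskAsync(url);
                return html.Length;
            }
        }).GetAwaiter().GetResult(); // acceptable in a console app: no sync context

        Console.WriteLine("Downloaded {0} pages", lengths.Length);
    }
}
```

Note that this still creates one pending task per item up front, so for a very large input it would make more sense to feed items in batches.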

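HttpWebRequest (and therefore WebClient) is also constrained by a number of limits, most notably the per-host connection limit exposed as ServicePointManager.DefaultConnectionLimit, which defaults to 2 for client applications on .NET Framework.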
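
One such limit that is easy to inspect is the per-host connection limit on ServicePointManager. The sketch below assumes .NET Framework, where WebClient and HttpWebRequest honour this setting; the `ConnectionLimitExample` class, its `SetLimit` helper, and the value 20 are illustrative choices, not recommendations. The setting needs to be changed before the first request to a host is made.

```csharp
using System;
using System.Net;

public static class ConnectionLimitExample
{
    // Hypothetical helper: raises the number of simultaneous connections
    // that HttpWebRequest/WebClient may open to a single host.
    public static void SetLimit(int limit)
    {
        ServicePointManager.DefaultConnectionLimit = limit;
    }

    private static void Main()
    {
        Console.WriteLine("Per-host connection limit before: {0}",
            ServicePointManager.DefaultConnectionLimit);

        SetLimit(20); // arbitrary value chosen for illustration

        Console.WriteLine("Per-host connection limit after: {0}",
            ServicePointManager.DefaultConnectionLimit);
    }
}
```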

  • I don't understand the difference between spinning up threads (i.e. the ThreadPool) and a "proper async method". Wouldn't a "proper async method" spin up a new thread somewhere in the background, without my explicitly coding it? Unless you are pointing at a difference between the .NET runtime and something at the Windows system level, which I know nothing about. – Jeff Jun 20 '13 at 01:41
  • No. The idea of true async is that you hand off the IO to the operating system and the operating system notifies you when the IO completes. Because your code never blocks waiting for IO, the handling of the IO result can be processed extremely quickly by a small number of ThreadPool threads. I've handled a few thousand concurrent IO operations with a ThreadPool thread count that rarely exceeds the number of cores on my machines. – spender Jun 20 '13 at 01:45
  • This is what I don't understand. Are you saying that "DownloadStringTaskAsync" hands off the "wait / downloading / delay / whatever the term is" to the OS level, which is very cheap / light / fast, so that in my .NET / C# code we can keep the number of .NET threads as low as possible? – Jeff Jun 20 '13 at 01:50
  • You got it. You let the OS wait for all the IO (it's good at doing this) whilst your program has to do almost nothing until the callback from the OS (which gets executed in the ThreadPool). If you keep the callback code (which would occur on the line after the `await` under async/await) minimal, then the time to process is minimal, and the thread is free to be reused for the next callback. It all happens so quickly that your program does next to no work, and you can get blistering performance. ...but you won't manage that kind of performance using a domestic grade router. ;) – spender Jun 20 '13 at 01:54
  • so we are basically talking about the way node.js does things here minus the threadpool thingy (i.e. concurrently handling call back from OS). I will rewrite and test again. Thanks. – Jeff Jun 20 '13 at 01:59
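
The point made in the comments above (many pending operations, very few threads) can be observed with a small experiment. This is a sketch under a loud assumption: Task.Delay stands in for network IO, so it shows that awaiting does not hold a thread, not how the OS completes real socket reads. The `AsyncWaitDemo` class, its `RunAsync` helper, and the numbers used are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncWaitDemo
{
    // Starts `count` simulated waits at once and returns how many distinct
    // threads ended up running the code after the await.
    public static async Task<int> RunAsync(int count, int delayMs)
    {
        var threadIds = new ConcurrentDictionary<int, bool>();

        var tasks = Enumerable.Range(0, count).Select(async i =>
        {
            // Stand-in for an IO call such as DownloadStringTaskAsync.
            await Task.Delay(delayMs);

            // Record which thread resumed this continuation.
            threadIds.TryAdd(Thread.CurrentThread.ManagedThreadId, true);
        }).ToList();

        await Task.WhenAll(tasks);
        return threadIds.Count;
    }

    private static void Main()
    {
        var stopwatch = Stopwatch.StartNew();

        // 5000 concurrent one-second waits (arbitrary numbers).
        var threadsUsed = RunAsync(5000, 1000).GetAwaiter().GetResult();

        stopwatch.Stop();
        Console.WriteLine("5000 waits finished in {0}ms using {1} distinct threads",
            stopwatch.ElapsedMilliseconds, threadsUsed);
    }
}
```

If the same waits were done with Thread.Sleep inside Task.Run, each one would occupy a ThreadPool thread for its whole duration, which is the situation the answer warns about.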