We're trying to compare the performance of reading a series of files using sync methods vs. async ones. We expected the two to take about the same time, but it turns out the async version is about 5.5x slower.

This might be due to the overhead of managing the threads, but we wanted to know your opinion. Maybe we're just measuring the timings wrong.

These are the methods being tested:

    static void ReadAllFile(string filename)
    {
        var content = File.ReadAllBytes(filename);
    }

    static async Task ReadAllFileAsync(string filename)
    {
        using (var file = File.OpenRead(filename))
        {
            using (var ms = new MemoryStream())
            {
                byte[] buff = new byte[file.Length];
                await file.ReadAsync(buff, 0, (int)file.Length);
            }
        }
    }

And this is the method that runs them and starts the stopwatch:

    static void Test(string name, Func<string, Task> gettask, int count)
    {
        Stopwatch sw = new Stopwatch();

        Task[] tasks = new Task[count];
        sw.Start();
        for (int i = 0; i < count; i++)
        {
            string filename = "file" + i + ".bin";
            tasks[i] = gettask(filename);
        }
        Task.WaitAll(tasks);
        sw.Stop();
        Console.WriteLine(name + " {0} ms", sw.ElapsedMilliseconds);

    }

Which is all run from here:

    static void Main(string[] args)
    {
        int count = 10000;

        for (int i = 0; i < count; i++)
        {
            Write("file" + i + ".bin");
        }

        Console.WriteLine("Testing read...!");            

        Test("Read Contents", (filename) => Task.Run(() => ReadAllFile(filename)), count);
        Test("Read Contents Async", (filename) => ReadAllFileAsync(filename), count);

        Console.ReadKey();
    }

And the helper write method:

    static void Write(string filename)
    {
        Data obj = new Data()
        {
            Header = "random string size here"
        };
        int size = 1024 * 20; // 1024 * 256;

        obj.Body = new byte[size];

        for (var i = 0; i < size; i++)
        {
            obj.Body[i] = (byte)(i % 256);
        }

        Stopwatch sw = new Stopwatch();
        sw.Start();

        MemoryStream ms = new MemoryStream();
        Serializer.Serialize(ms, obj);
        ms.Position = 0;

        using (var file = File.Create(filename))
        {
            ms.CopyToAsync(file).Wait();
        }

        sw.Stop();
        //Console.WriteLine("Writing file {0}", sw.ElapsedMilliseconds); 
    }

The results:

-Read Contents 574 ms
-Read Contents Async 3160 ms

We would really appreciate it if anyone could shed some light on this; we searched Stack Overflow and the web but couldn't find a proper explanation.

Stephen Kennedy
gcastelo
  • Your test might be flawed as you are spawning threads to do the reads simultaneously. A better test would be to test one thing, and then test the other. – Kami Aug 20 '13 at 09:37
  • On a different note, there's a nifty static method on Stopwatch called `StartNew`, it basically does `var s = new Stopwatch(); s.Start(); return s;` so you don't have to. :-) – Patrick Aug 20 '13 at 09:40
  • I think that test is flawed. Did you measure the difference between ReadAllBytes and Read? This could be a first thing, that ReadAllBytes is more efficient - perhaps it's an "atomic" operation? – TGlatzer Aug 20 '13 at 10:07
  • Are you testing this in release mode, without the debugger attached? Timing a debug build, or timing with the debugger attached will give unreliable results. Be sure to compile in Release mode and run with Ctrl+F5 (Run without debugging). – Jim Mischel Aug 20 '13 at 12:54
  • Why do you have this line: `using (var ms = new MemoryStream())`? It doesn't look like you use `ms` anywhere. –  Aug 20 '13 at 15:42

2 Answers

There are lots of things wrong with the testing code. Most notably, your "async" test does not use async I/O; with file streams, you have to explicitly open them as asynchronous or else you're just doing synchronous operations on a background thread. Also, your file sizes are very small and can be easily cached.

I modified the test code to write out much larger files, to have comparable sync vs async code, and to make the async code asynchronous:

    static void Main(string[] args)
    {
        Write("0.bin");
        Write("1.bin");
        Write("2.bin");

        ReadAllFile("2.bin"); // warmup

        var sw = new Stopwatch();
        sw.Start();
        ReadAllFile("0.bin");
        ReadAllFile("1.bin");
        ReadAllFile("2.bin");
        sw.Stop();

        Console.WriteLine("Sync: " + sw.Elapsed);

        ReadAllFileAsync("2.bin").Wait(); // warmup

        sw.Restart();
        ReadAllFileAsync("0.bin").Wait();
        ReadAllFileAsync("1.bin").Wait();
        ReadAllFileAsync("2.bin").Wait();
        sw.Stop();

        Console.WriteLine("Async: " + sw.Elapsed);

        Console.ReadKey();
    }

    static void ReadAllFile(string filename)
    {
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, false))
        {
            byte[] buff = new byte[file.Length];
            file.Read(buff, 0, (int)file.Length);
        }
    }

    static async Task ReadAllFileAsync(string filename)
    {
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read, FileShare.Read, 4096, true))
        {
            byte[] buff = new byte[file.Length];
            await file.ReadAsync(buff, 0, (int)file.Length);
        }
    }

    static void Write(string filename)
    {
        int size = 1024 * 1024 * 256;
        var data = new byte[size];
        var random = new Random();
        random.NextBytes(data);
        File.WriteAllBytes(filename, data);
    }

On my machine, this test (built in Release, run outside the debugger) yields these numbers:

Sync: 00:00:00.4461936
Async: 00:00:00.4429566
Stephen Cleary
  • Thanks for pointing this out. I got about the same results on my machine, which makes a lot more sense now. This is actually a test for a file cache and we're prototyping the best way to read a bunch of small files. – gcastelo Aug 21 '13 at 10:48
  • Hi Stephen, if the file is big, say several hundred MB, would this code be broken? `byte[] buff = new byte[file.Length]; await file.ReadAsync(buff, 0, (int)file.Length);` Because it tries to allocate a big chunk of memory? – Toan Nguyen Apr 16 '15 at 10:59
  • @ToanNguyen: I'm not sure what you mean; if you have the memory for it, then it wouldn't be "broken". – Stephen Cleary Apr 16 '15 at 11:38
  • I meant it has a high possibility of throwing `System.OutOfMemoryException` if the system cannot allocate the required chunk (`file.Length`) of memory? – Toan Nguyen Apr 16 '15 at 20:13
  • @ToanNguyen: Yes, if you allocate too much memory, then you will get an out of memory exception. – Stephen Cleary Apr 16 '15 at 21:09
  • This test is highly flawed: `FileStream.Read` and `FileStream.ReadAsync` don't necessarily read the number of bytes you requested, and with larger files like yours, they very likely won't. You need to iterate the reads in a loop until the total bytes read matches the size of the file. – Mike Marynowski Jan 29 '16 at 14:53
  • If I change the test-code above to not just read the file, but measure the difference between copyto & copytoasync, the synchronous code is 6x faster than the async code. – Frederik Gheysels Nov 13 '17 at 15:26
  • @FrederikGheysels: Sounds like you may be running into fake-async issues (e.g., `CopyToAsync` on a `MemoryStream`). – Stephen Cleary Nov 13 '17 at 18:40
  • @StephenCleary I just took your example and modified it to copy a filestream using CopyTo and CopytoAsync instead of your Read & ReadAsync – Frederik Gheysels Nov 13 '17 at 20:00
  • @FrederikGheysels ... and what stream are you copying to? – Stephen Cleary Nov 14 '17 at 05:08
  • @StephenCleary : I used FileStreams in that sample; copy from filestream a to filestream b. Since your earlier comment made me think of something, I've started a new SO question. https://stackoverflow.com/questions/47281063/should-i-use-async-i-o-on-memorystreams – Frederik Gheysels Nov 14 '17 at 08:39
  • Well, interesting thing: try reading with a StreamReader (not an optimal solution, I know), e.g. `string s = await sr.ReadToEndAsync();`, and async is slower. I made an MVC test and still have no idea why that is. https://stackoverflow.com/questions/52456841/why-async-performance-in-mvc-worse-then-other-approaches – Serg Shevchenko Sep 29 '18 at 12:36
  • With regard to the other comments about `read the number of bytes you requested`, should it be changed to follow this pattern? https://pastebin.com/68PrKYJg I don't know, just asking. – Seabizkit Jan 16 '20 at 08:27
  • Isn't this test flawed in that it should be measuring the amount of time an extra thread is freed to do something else? I mean, if it's the same thread doing the work, then it would make sense that it takes longer, as it's wrapping Tasks. But at no point is it shown that during those x ns it could have been processing y. – Seabizkit Jan 16 '20 at 09:31
  • @Seabizkit: The test is measuring the total time it takes to write a file synchronously vs asynchronously. If you want to measure the amount of time an extra thread is freed, then that test would look different. – Stephen Cleary Jan 16 '20 at 14:12
  • @StephenCleary I think you missed it. The benefit of doing something asynchronously is freeing the working thread to do something else. As your test does not test this, it is not a fair comparison of "performance"; obviously the asynchronous one will take more time, you don't even need to test this, as it has to do more thread syncs. It would have been nice to be able to show "available" compute time rather than just total time taken (extremely hard). What I'm getting at is that, based on this, I would never expect it to be faster, only slower... but it could have been doing more work. – Seabizkit Jan 16 '20 at 15:08
  • @StephenCleary but am interested if pastebin.com/68PrKYJg is more correct? dont know on this one – Seabizkit Jan 16 '20 at 15:18
  • @Seabizkit: For general streams, yes. I believe file streams have special behavior where they always read the requested number of bytes unless the end of stream is encountered, but don't quote me on that. – Stephen Cleary Jan 16 '20 at 19:59
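
The partial-read concern raised in the comments above can be handled with a loop that keeps reading until the buffer is full. The following is only a minimal sketch, not the answer author's code: the class name `FileReadHelpers` and the method name `ReadAllBytesAsync` are hypothetical, and it assumes the file fits in a single array and does not change size while being read.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

static class FileReadHelpers
{
    // Hypothetical helper: reads an entire file with async I/O,
    // looping because a single ReadAsync call may return fewer bytes than requested.
    public static async Task<byte[]> ReadAllBytesAsync(string filename)
    {
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, 4096, useAsync: true))
        {
            var buffer = new byte[file.Length];
            int total = 0;

            while (total < buffer.Length)
            {
                int read = await file.ReadAsync(buffer, total, buffer.Length - total);
                if (read == 0)
                {
                    // End of stream reached before the buffer was filled.
                    throw new EndOfStreamException("File ended before the buffer was filled.");
                }
                total += read;
            }

            return buffer;
        }
    }
}
```

A caller would use it as `byte[] data = await FileReadHelpers.ReadAllBytesAsync("0.bin");`. The same loop shape works for the synchronous `Read` method.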

All I/O operations are asynchronous under the hood. With a synchronous call, the thread just waits (it gets suspended) for the I/O operation to finish. That's why Jeffrey Richter always recommends doing I/O asynchronously, so that your thread is not wasted waiting around (from Jeffrey Richter).

Also, creating a thread is not cheap. Each thread gets 1 MB of address space reserved for its user-mode stack and another 12 KB for its kernel-mode stack. After that, the OS has to notify every DLL loaded in the process that a new thread has been spawned. The same happens when you destroy a thread. Also think about the complexities of context switching.
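
To illustrate the point about not tying up a thread per pending read, here is a rough sketch that starts many asynchronous file reads and awaits them together. It is an assumption-laden example rather than a recommended design: `ManyFilesReader`, `ReadOneAsync`, and `ReadManyAsync` are hypothetical names, and it opens all files at once, which may not suit very large file counts.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class ManyFilesReader
{
    // Hypothetical helper: reads one file using a stream opened for asynchronous I/O.
    static async Task<byte[]> ReadOneAsync(string filename)
    {
        using (var file = new FileStream(filename, FileMode.Open, FileAccess.Read,
                                         FileShare.Read, 4096, FileOptions.Asynchronous))
        {
            var buffer = new byte[file.Length];
            int total = 0;
            int read;

            // Keep reading until the buffer is full or the stream ends.
            while (total < buffer.Length &&
                   (read = await file.ReadAsync(buffer, total, buffer.Length - total)) > 0)
            {
                total += read;
            }

            return buffer;
        }
    }

    // Starts all reads without dedicating a thread to each one,
    // then completes when every read has finished.
    public static Task<byte[][]> ReadManyAsync(IEnumerable<string> filenames)
    {
        return Task.WhenAll(filenames.Select(name => ReadOneAsync(name)));
    }
}
```

A caller might write `byte[][] contents = await ManyFilesReader.ReadManyAsync(names);`, where `names` is any sequence of file paths. Whether this beats synchronous reads for small, cached files would still need to be measured.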

Found a great SO answer here

Community
Anand