I have 3 files, each 1 million rows long, and I'm reading them line by line. There is no processing, just reading, as I'm only trying things out.

If I do this synchronously it takes 1 second. If I switch to using Threads, one for each file, it is slightly quicker (that code is not shown below; I simply created and started a new Thread for each file).

When I change to async, it takes 40 times as long, at 40 seconds. Once I add any actual processing, I cannot see why I'd ever choose async over synchronous code, or over Threads if I wanted a responsive application.

Or am I doing something fundamentally wrong with this code, and not using async as it was intended?

Thanks.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class AsyncTestIOBound
{
    Stopwatch sw = new Stopwatch();
    internal void Tests()
    {
        DoSynchronous();
        DoASynchronous();
    }
    #region sync
    // Reads the three files one after another on the calling thread.
    private void DoSynchronous()
    {
        sw.Restart();
        var start = sw.ElapsedMilliseconds;
        Console.WriteLine($"Starting Sync Test");
        DoSync("Addresses", "SampleLargeFile1.txt");
        DoSync("routes   ", "SampleLargeFile2.txt");
        DoSync("Equipment", "SampleLargeFile3.txt");
        sw.Stop();
        Console.WriteLine($"Ended Sync Test. Took {(sw.ElapsedMilliseconds - start)} mseconds");
        Console.ReadKey();
    }

    private long DoSync(string v, string filename)
    {
        string line;
        long counter = 0;
        using (StreamReader file = new StreamReader(filename))
        {
            while ((line = file.ReadLine()) != null)
            {
                counter++;
            }
        }
        Console.WriteLine($"{v}: T{Thread.CurrentThread.ManagedThreadId}: Lines: {counter}");
        return counter;
    }
    #endregion

    #region async
    // Starts all three reads, then blocks until every task has finished.
    private void DoASynchronous()
    {
        sw.Restart();
        var start = sw.ElapsedMilliseconds;
        Console.WriteLine($"Starting Async Test");
        Task a = DoASync("Addresses", "SampleLargeFile1.txt");
        Task b = DoASync("routes   ", "SampleLargeFile2.txt");
        Task c = DoASync("Equipment", "SampleLargeFile3.txt");
        Task.WaitAll(a, b, c);
        sw.Stop();
        Console.WriteLine($"Ended Async Test. Took {(sw.ElapsedMilliseconds - start)} mseconds");
        Console.ReadKey();
    }

    private async Task<long> DoASync(string v, string filename)
    {
        string line;
        long counter = 0;
        using (StreamReader file = new StreamReader(filename))
        {
            // One await per line: a million awaits per file.
            while ((line = await file.ReadLineAsync()) != null)
            {
                counter++;
            }
        }
        Console.WriteLine($"{v}: T{Thread.CurrentThread.ManagedThreadId}: Lines: {counter}");
        return counter;
    }
    #endregion

}
```
Neil Walker
  • Notice that even though you are doing `await file.ReadLineAsync`, you are still accessing a single source of data, which can only serve one request at a time. So with the async approach you are just adding `await` overhead. – Fabio Feb 18 '19 at 18:46
  • This code is not accessing the file asynchronously. The file APIs are a bit odd; [you *must* pass `true` for `isAsync` or `FileOptions.Asynchronous`](https://blog.stephencleary.com/2010/08/reminder-about-asynchronous-filestreams.html). Otherwise (i.e., in this code), "asynchronous" code like `ReadLineAsync` is actually just doing synchronous work on a thread pool thread. That said, concurrently accessing a limited resource (HDD) is probably going to *hurt* performance, as the other answers point out; so even if this was truly asynchronous, it wouldn't be *faster*. – Stephen Cleary Feb 19 '19 at 15:38
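
To illustrate the point in the comment above, here is a minimal sketch of opening a file so that reads are genuinely asynchronous, by passing `useAsync: true` to the `FileStream` constructor. The class name `AsyncFileReading`, the method name `CountLinesAsync` and the 64 KB buffer size are assumptions made for this illustration and are not part of the question's code. As the comment notes, this is not expected to be faster than the synchronous version.

```csharp
using System.IO;
using System.Threading.Tasks;

static class AsyncFileReading
{
    // Hypothetical helper (not from the question): counts lines using a
    // FileStream that was opened for asynchronous I/O.
    public static async Task<long> CountLinesAsync(string filename)
    {
        long counter = 0;
        using (var stream = new FileStream(
            filename, FileMode.Open, FileAccess.Read, FileShare.Read,
            bufferSize: 64 * 1024, // assumed value; tune for your workload
            useAsync: true))       // request asynchronous I/O from the OS
        using (var reader = new StreamReader(stream))
        {
            // Still one await per line, so the per-line overhead remains.
            while (await reader.ReadLineAsync() != null)
            {
                counter++;
            }
        }
        return counter;
    }
}
```

This only changes how the file is opened; the per-line `await` cost discussed in the answers still applies.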

2 Answers

A few things. First, I would read all lines at once in the async method so that you await only once (instead of once per line).

```csharp
private async Task<long> DoASync(string v, string filename)
{
    string lines;
    using (StreamReader file = new StreamReader(filename))
    {
        // Read the whole file with a single await.
        lines = await file.ReadToEndAsync();
    }
    // Split('\n') yields one extra, empty entry when the file ends with a newline.
    long counter = lines.Split('\n').Length;
    Console.WriteLine($"{v}: T{Thread.CurrentThread.ManagedThreadId}: Lines: {counter}");
    return counter;
}
```

Next, you can also await each Task individually. This lets the CPU focus on one file at a time, instead of possibly switching between the three, which would cause more overhead.

```csharp
// Returns Task rather than void so the caller can await it and observe exceptions.
private async Task DoASynchronous()
{
    sw.Restart();
    var start = sw.ElapsedMilliseconds;
    Console.WriteLine($"Starting Async Test");
    // Each file is read to completion before the next one starts.
    await DoASync("Addresses", "SampleLargeFile1.txt");
    await DoASync("routes   ", "SampleLargeFile2.txt");
    await DoASync("Equipment", "SampleLargeFile3.txt");
    sw.Stop();
    Console.WriteLine($"Ended Async Test. Took {(sw.ElapsedMilliseconds - start)} mseconds");
    Console.ReadKey();
}
```

You are seeing slower performance because of the overhead that await adds, and you pay it once per line. The async machinery adds processing, allocations and synchronization. Also, we need to transition to kernel mode twice instead of once (first to initiate the I/O, then to dequeue the I/O completion notification).

For more info, see: Does async await increases Context switching

d.moncada
  • If you are reading the whole file asynchronously, you should gain performance by not awaiting each task separately. With `await Task.WhenAll` there are only three asynchronous operations. – Fabio Feb 18 '19 at 20:05
  • Thanks, I'll try that. No, I'm not specifically counting lines; this was more of a placeholder for doing real work (on a line-by-line basis). I just presumed that with really, really big files there might be memory issues reading it all in one go, plus I'd then have to split or read that in-memory string into lines. – Neil Walker Feb 19 '19 at 10:07
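
The `Task.WhenAll` suggestion in the comments above could look roughly like the sketch below: each file is read with a single `ReadToEndAsync` call, and all reads are awaited together. The names `WhenAllExample`, `CountLinesAsync` and `CountAllAsync` are hypothetical, and counting `'\n'` characters is only a stand-in for real per-line work. Whether this beats reading the files one after another depends on the storage device, so treat it as something to measure and not as a guaranteed win.

```csharp
using System.IO;
using System.Linq;
using System.Threading.Tasks;

static class WhenAllExample
{
    // Hypothetical helper: reads a whole file with one await and counts
    // newline characters (a final line without a trailing '\n' is not counted).
    static async Task<long> CountLinesAsync(string filename)
    {
        using (var reader = new StreamReader(filename))
        {
            string text = await reader.ReadToEndAsync();
            return text.Count(c => c == '\n');
        }
    }

    // Starts one read per file, then awaits all of them together.
    public static async Task<long[]> CountAllAsync(params string[] filenames)
    {
        Task<long>[] tasks = filenames.Select(f => CountLinesAsync(f)).ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

Note that this holds each file's full contents in memory while it is being counted, which is the concern raised in the comment above for very large files.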

Since you are using await several times in a giant loop (in your case, looping through each line of a "SampleLargeFile"), you are doing a lot of context switching, and the overhead can be really bad.

For each line, your code may be switching between the files. If your computer uses a hard drive, this can get even worse. Imagine the head of your HD going crazy.

When you use normal threads, you are not switching the context for each line.

To solve this, just read the file in a single pass. You can still use async/await (`ReadToEndAsync()`) and get good performance.
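
If loading the entire file at once is a concern, one possible middle ground (a different technique from `ReadToEndAsync()`, named plainly here) is to read large chunks and await once per chunk instead of once per line. The sketch below assumes a 64 KB character buffer; the names `ChunkedReading` and `CountNewlinesAsync` are hypothetical, and counting `'\n'` characters stands in for real per-line processing.

```csharp
using System.IO;
using System.Threading.Tasks;

static class ChunkedReading
{
    // Hypothetical helper: counts newline characters while awaiting once
    // per chunk, so the number of awaits depends on file size, not line count.
    public static async Task<long> CountNewlinesAsync(string filename)
    {
        long count = 0;
        var buffer = new char[64 * 1024]; // assumed chunk size
        using (var reader = new StreamReader(filename))
        {
            int read;
            // ReadAsync returns the number of chars read, or 0 at end of file.
            while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == '\n') count++;
                }
            }
        }
        return count;
    }
}
```

Memory use stays bounded by the buffer size, at the cost of handling lines that span two chunks yourself if you need the line contents and not just a count.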

EDIT

So, you are trying to count lines on the text file using async, right?

Try this (no need to load the entire file in memory):

```csharp
private async Task<int> CountLines(string path)
{
    int count = 0;
    await Task.Run(() =>
    {
        using (FileStream fs = File.Open(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        using (BufferedStream bs = new BufferedStream(fs))
        using (StreamReader sr = new StreamReader(bs))
        {
            while (sr.ReadLine() != null)
            {
                count++;
            }
        }
    });
    return count;
}
```
Guilherme
  • Yes. Wait for each task before starting the next one. – David Browne - Microsoft Feb 18 '19 at 18:45