83

I have the following code:

    private void button1_Click(object sender, RoutedEventArgs e)
    {
        button1.IsEnabled = false;

        var s = File.ReadAllLines("Words.txt").ToList(); // my WPF app hangs here
        // do something with s

        button1.IsEnabled = true;
    }

Words.txt has a ton of words, which I read into the s variable. I am trying to use the async and await keywords in C# 5 with the Async CTP library so the WPF app doesn't hang. So far I have the following code:

    private async void button1_Click(object sender, RoutedEventArgs e)
    {
        button1.IsEnabled = false;

        Task<string[]> ws = Task.Factory.FromAsync<string[]>(
            // What do i have here? there are so many overloads
            ); // is this the right way to do?

        var s = await File.ReadAllLines("Words.txt").ToList();  // what more do i do here apart from having the await keyword?
        // do something with s

        button1.IsEnabled = true;
    }

The goal is to read the file asynchronously rather than synchronously, so that the WPF app doesn't freeze.

Any help is appreciated, Thanks!

khellang
  • 1
    What about starting by removing the unnecessary call to ToList() which will make a copy of the string array? – Jb Evain Oct 31 '12 at 21:57
  • 3
    @JbEvain - To be pedantic, `ToList()` doesn't just copy the array, it creates a `List`. Without further information you can't assume it's unnecessary, since perhaps "`// do something with s`" calls `List` methods. – Mike Oct 31 '12 at 22:10

5 Answers

154

UPDATE: Async versions of File.ReadAll[Lines|Bytes|Text], File.AppendAll[Lines|Text] and File.WriteAll[Lines|Bytes|Text] have now been merged into .NET Core and shipped with .NET Core 2.0. They are also included in .NET Standard 2.1.
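
On .NET Core 2.0 or later, the handler from the question could then be reduced to something like the sketch below. `WordLoader` and `LoadWordsAsync` are hypothetical names introduced only for illustration; `File.ReadAllLinesAsync` is the framework method mentioned above.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Threading.Tasks;

// Hypothetical helper, not part of the question's code.
public static class WordLoader
{
    // Reads every line of the file asynchronously and returns the lines
    // as a List<string>, mirroring the ToList() call in the question.
    public static async Task<List<string>> LoadWordsAsync(string path)
    {
        string[] lines = await File.ReadAllLinesAsync(path);
        return new List<string>(lines);
    }
}
```

In the click handler this would be used as `var s = await WordLoader.LoadWordsAsync("Words.txt");`, with the `IsEnabled` toggling left unchanged.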

Using Task.Run, which is essentially a wrapper for Task.Factory.StartNew, to build asynchronous wrappers around synchronous methods is a code smell.

If you don't want to waste a CPU thread by using a blocking function, you should await a truly asynchronous IO method, StreamReader.ReadToEndAsync, like this:

using (var reader = File.OpenText("Words.txt"))
{
    var fileText = await reader.ReadToEndAsync();
    // Do something with fileText...
}

This will get the whole file as a string instead of a List<string>. If you need lines instead, you could easily split the string afterwards, like this:

using (var reader = File.OpenText("Words.txt"))
{
    var fileText = await reader.ReadToEndAsync();
    return fileText.Split(new[] { Environment.NewLine }, StringSplitOptions.None);
}
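
Note that splitting on Environment.NewLine only matches the current platform's line ending, so a file with bare `\n` endings read on Windows would come back as a single element. Here is a minimal sketch of a splitter that accepts `\r\n`, `\n` and `\r` alike by relying on StringReader.ReadLine. The `LineSplitter` and `SplitLines` names are hypothetical.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper for turning the text from ReadToEndAsync into lines.
public static class LineSplitter
{
    // Splits text into lines. StringReader.ReadLine treats "\r\n", "\n"
    // and "\r" as line terminators and does not emit an extra empty line
    // for a single trailing terminator.
    public static string[] SplitLines(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line);
            }
        }
        return lines.ToArray();
    }
}
```

With this in place, the `return` line above would become `return LineSplitter.SplitLines(fileText);`.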

EDIT: Here are some methods that achieve the same result as File.ReadAllLines, but in a truly asynchronous manner. The code is based on the implementation of File.ReadAllLines itself:

using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class FileEx
{
    /// <summary>
    /// This is the same default buffer size as
    /// <see cref="StreamReader"/> and <see cref="FileStream"/>.
    /// </summary>
    private const int DefaultBufferSize = 4096;

    /// <summary>
    /// Indicates that
    /// 1. The file is to be used for asynchronous reading.
    /// 2. The file is to be accessed sequentially from beginning to end.
    /// </summary>
    private const FileOptions DefaultOptions = FileOptions.Asynchronous | FileOptions.SequentialScan;

    public static Task<string[]> ReadAllLinesAsync(string path)
    {
        return ReadAllLinesAsync(path, Encoding.UTF8);
    }

    public static async Task<string[]> ReadAllLinesAsync(string path, Encoding encoding)
    {
        var lines = new List<string>();

        // Open the FileStream with the same FileMode, FileAccess
        // and FileShare as a call to File.OpenText would've done.
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, DefaultBufferSize, DefaultOptions))
        using (var reader = new StreamReader(stream, encoding))
        {
            string line;
            while ((line = await reader.ReadLineAsync()) != null)
            {
                lines.Add(line);
            }
        }

        return lines.ToArray();
    }
}
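
The FileOptions.Asynchronous flag in DefaultOptions is what requests an asynchronous file handle. One way to check what a given set of options produces is the FileStream.IsAsync property. The sketch below is only an illustration; `AsyncHandleCheck` and `OpensAsync` are hypothetical names.

```csharp
using System.IO;

// Hypothetical diagnostic helper, not part of FileEx.
public static class AsyncHandleCheck
{
    // Opens the file for reading with the given options and reports
    // whether the resulting FileStream was opened for asynchronous I/O.
    public static bool OpensAsync(string path, FileOptions options)
    {
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read,
                                           FileShare.Read, 4096, options))
        {
            return stream.IsAsync;
        }
    }
}
```

Comparing `OpensAsync(path, FileOptions.None)` with `OpensAsync(path, FileOptions.Asynchronous | FileOptions.SequentialScan)` should show the difference, although the exact behaviour may depend on the runtime and platform.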
khellang
  • 11
    Importantly, this uses Windows I/O completion ports to await the read without ANY CPU threads, while the Task.Factory.StartNew/Task.Run approach in another answer wastes a CPU thread. This answer's approach is more efficient. – Chris Moschini Jun 07 '14 at 23:52
  • 1
    FYI; I've proposed async versions of these APIs over at https://github.com/dotnet/corefx/issues/11220. Let's see how it goes :) – khellang Aug 29 '16 at 10:06
  • I'd return List, the caller can decide whether it really needs array and avoid the extra allocation caused by `lines.ToArray`. – Steves Sep 25 '16 at 10:25
  • Second answer is very inefficient. Parsing large logs with millions of lines, I've seen up to 16x longer times than with a simple File.ReadAllLines. – ghord Feb 05 '17 at 10:48
  • Second answer? Where's your data? – khellang Feb 05 '17 at 11:15
  • I really like this answer, so I put together a [test](https://pastebin.com/KQrGWiSr) for what @ghord said. The [results](https://pastebin.com/zHTumdZ5) show the async version is consistently slower by 2x-20x on my machine. Running in VS under the debugger it's roughly 10x-20x slower; running it standalone it's roughly 2x-5x. I assume this is due to GC pressure from the extra Task object, but that's a guess. I still love this answer though! Async isn't supposed to make an individual operation faster, it's supposed to prevent blocking everything else running at the same time. –  Jul 07 '17 at 13:04
  • 2
    Warning: File.OpenText does not open the file in asynchronous mode. This causes ReadXxxxAsync functions to follow a different code path that does not use IO Completion, and is _much_ less efficient. I don't know why Microsoft elected to make this API so difficult to use correctly. – Mark Nov 08 '17 at 09:43
  • @Mark That's why my last code sample (and the final implementation in .NET Core) doesn't use `File.OpenText`, but rather creates a `FileStream` and `StreamReader` with `FileOptions.Asynchronous` explicitly set. – khellang Nov 08 '17 at 09:55
  • This is really good, but I'm curious as to why the `List` is converted to an array at the end? Could it not just return `IEnumerable` or even the original list? Is there a reason? – Dan Diplo Jun 25 '18 at 09:19
  • The reason is to match the return type of the synchronous counterpart, `File.ReadAllLines` – khellang Jun 25 '18 at 10:03
  • 1
    Has anyone succeeded in backporting these to .NET Framework 4.7.2? – Sören Kuklau Feb 20 '20 at 10:40
  • @SörenKuklau There's no active work being done on .NET Framework. If you need the code, you need to copy it from here or the .NET Core source :) – khellang Feb 24 '20 at 10:54
0

Here are the helper methods I created for a .NET Standard 2.0 class library that was used in both .NET Core 3.1 and .NET Framework 4.7.2 projects.

These implementations mirror the names of the .NET Core 3.1 / .NET Standard 2.1 File class methods (note that ReadAllLinesAsync here returns IEnumerable<string> rather than string[]), so you only need to put them in any public class (FileHelper, for example).

They should also be efficient and similar to the source code of the .NET implementation.

    private const int DefaultBufferSize = 4096;
    // The file is opened for asynchronous reading and accessed sequentially from beginning to end.
    private const FileOptions DefaultOptions = FileOptions.Asynchronous | FileOptions.SequentialScan;

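    public static async Task WriteAllTextAsync(string filePath, string text)
    {
        // Encoding.Unicode (UTF-16) matches the readers below.
        byte[] encodedText = Encoding.Unicode.GetBytes(text);

        // FileMode.Create overwrites an existing file, as File.WriteAllText does
        // (FileMode.Append would add to the end of the file instead).
        using FileStream sourceStream = new FileStream(filePath, FileMode.Create, FileAccess.Write, FileShare.None,
            DefaultBufferSize, useAsync: true);
        await sourceStream.WriteAsync(encodedText, 0, encodedText.Length);
    }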

    public static async Task<IEnumerable<string>> ReadAllLinesAsync(string filePath)
    {
        var lines = new List<string>();

        using var sourceStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read,
            DefaultBufferSize, DefaultOptions);
        using var reader = new StreamReader(sourceStream, Encoding.Unicode);
        string line;
        while ((line = await reader.ReadLineAsync()) != null) lines.Add(line);

        return lines;
    }

    public static async Task<string> ReadAllTextAsync(string filePath)
    {
        using var sourceStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read,
            DefaultBufferSize, DefaultOptions);
        using var reader = new StreamReader(sourceStream, Encoding.Unicode);
        return await reader.ReadToEndAsync();
    }

Edit

Apparently the StreamReader "async" methods block the current thread for a considerable amount of time before returning an incomplete Task.

(Even the .NET Core 3.1 File.ReadAllLinesAsync and File.ReadAllTextAsync don't currently seem to be fully async. As you can check in the source code, they are based on the StreamReader "async" methods.)

So I'm sharing an implementation that currently seems to be the most efficient way.

It's better than running the sync methods in Task.Run(() => File.ReadAllLines(...)), since it's very bad practice to wrap your sync code with Task.Run and expect a fully async flow. It actually breaks the internal queuing mechanism of the real asynchronous .NET structure.

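public static async Task<string> ReadAllTextAsync(string filePath)
    {
        using (var sourceStream = new FileStream(filePath, FileMode.Open, FileAccess.Read, FileShare.Read,
            DefaultBufferSize, DefaultOptions))
        {
            var sb = new StringBuilder();
            var buffer = new byte[0x1000];
            // A stateful Decoder keeps characters that straddle two reads intact,
            // which decoding each chunk with Encoding.GetString would not.
            var decoder = Encoding.Unicode.GetDecoder();
            var chars = new char[Encoding.Unicode.GetMaxCharCount(buffer.Length)];
            int numRead;

            while ((numRead = await sourceStream.ReadAsync(buffer, 0, buffer.Length)) != 0)
            {
                int numChars = decoder.GetChars(buffer, 0, numRead, chars, 0);
                sb.Append(chars, 0, numChars);
            }
            return sb.ToString();
        }
    }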

Testing Time

Here is my test and its output that displays clearly that the actual run is async:

        var stopwatch = Stopwatch.StartNew();
        var fileTask = FileHelper.ReadAllTextAsync("48MB_file.txt");
        var duration1 = stopwatch.ElapsedMilliseconds;
        var isCompleted = fileTask.IsCompleted;

        stopwatch.Restart();
        await fileTask;
        var duration2 = stopwatch.ElapsedMilliseconds;

        Console.WriteLine($"Creation took: {duration1:#,0} ms, Task.IsCompleted: {isCompleted}");
        Console.WriteLine($"Calling await took:  {duration2:#,0} ms, Task.IsCompleted: {fileTask.IsCompleted}");

Creation took: 43 ms, Task.IsCompleted: False
Calling await took: 508 ms, Task.IsCompleted: True

You can find more in the comments, and in this question: File.ReadAllLinesAsync() blocks the UI thread

Ester Kaufman
  • Have you verified that the `WriteAllTextAsync`, `ReadAllLinesAsync` and `ReadAllTextAsync` methods are actually asynchronous, and they don't block the caller just like the synchronous APIs `File.WriteAllText`, `File.ReadAllLines`, `File.ReadAllText`? You can verify it by storing the created `Task` in a variable, and inspecting its `IsCompleted` property before awaiting it. If the `IsCompleted` is `true`, then for all intents and purposes the call was 100% synchronous. – Theodor Zoulias Sep 13 '21 at 11:52
  • @TheodorZoulias I'm not sure this is a correct test. Async methods can complete synchronously if they choose, either because they already know the answer (cached results, buffered data...) or because they don't have a suitable asynchronous downstream operation to perform at all. I need to think of a better way to check whether the main thread is blocked. – Ester Kaufman Sep 14 '21 at 09:04
  • Anyway, I'll post another implementation based on MSDN examples, which the IsCompleted test did confirm, as you suggested, so thanks! – Ester Kaufman Sep 14 '21 at 09:05
  • You might want to check out this question: [Why File.ReadAllLinesAsync() blocks the UI thread?](https://stackoverflow.com/questions/63217657/why-file-readalllinesasync-blocks-the-ui-thread) Things with async filesystem APIs are murkier than you think. Microsoft is going to release an [improved implementation](https://devblogs.microsoft.com/dotnet/announcing-net-6-preview-4/#significantly-improved-filestream-performance-on-windows) of the `FileStream` with the next .NET release, which I've not checked yet. – Theodor Zoulias Sep 14 '21 at 09:39
  • Actually, I just figured it out myself. While testing the source code in .NET Core 3.1, I saw that it does wait for the file reading even when I didn't await it, and also when using ConfigureAwait(false) on the reader.ReadAsync method. I also saw your comment here: https://www.py4u.net/discuss/707047, but I don't think Task.Run is a better option, for a couple of reasons. Please see the EDIT in my answer. – Ester Kaufman Sep 14 '21 at 10:05
  • Have you tested that the last `ReadAllTextAsync` (in the [2nd revision](https://stackoverflow.com/revisions/69161595/2) of this answer) is actually asynchronous, and also that it's not many times slower than the `Task.Run(File.ReadAllText)`? In other words, can people use the last `ReadAllTextAsync` in a GUI application, and be sure that the UI will remain responsive? – Theodor Zoulias Sep 14 '21 at 11:09
  • Sure, you can test it yourself. And although it is a little slower (not many times, maybe 2x), it's very bad practice to wrap your sync code with `Task.Run` and expect it to be fully async. It breaks the internal queuing mechanism of the real asynchronous .NET structure. – Ester Kaufman Sep 14 '21 at 11:21
  • My observations: Both `ReadAllTextAsync` versions in the [4th revision](https://stackoverflow.com/revisions/69161595/4) of your answer read ANSI files incorrectly (both assume that the encoding is unicode). Both have comparable performance with the native `File.ReadAllTextAsync`, when running on a Console app. Both are at least 2 times slower than the `Task.Run(() => File.ReadAllText(path))`. The first version blocks the current thread continuously for most of the total duration, while the second version blocks *some* threads intermittently, and accumulatively for most of the total duration. – Theodor Zoulias Sep 14 '21 at 14:26
  • You can experiment with [this](https://dotnetfiddle.net/Wd6pbl) version of your `ReadAllTextAsync` method, enhanced with precise measurements, to see the ugly truth. Be aware that in a WinForms/WPF application, all internal `await`s of `sourceStream.ReadAsync` tasks that are incomplete at the `await` point, will cause a thread-switch between the `ThreadPool` and the UI thread. And all `sourceStream.ReadAsync` invocations will run on the UI thread. I didn't test it, but my expectation is that for large files the `ReadAllTextAsync` will be **very** slow, and it might cause observable UI lag. – Theodor Zoulias Sep 14 '21 at 14:50
-1
private async Task<string> readFile(string sourceFilePath)
    {
        // Read-only access is enough here; FileOptions.Asynchronous requests true async I/O.
        using (var fileStream = new FileStream(sourceFilePath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite,
            4096, FileOptions.Asynchronous))
        {
            using (var streamReader = new StreamReader(fileStream))
            {
                // The using blocks dispose the reader and the stream,
                // so no explicit Close() calls are needed.
                return await streamReader.ReadToEndAsync().ConfigureAwait(false);
            }
        }
    }
Mak Ahmed
-3

Try this:

private async void button1_Click(object sender, RoutedEventArgs e)
{
    button1.IsEnabled = false;
    try
    {
        var s = await Task.Run(() => File.ReadAllLines("Words.txt").ToList());
        // do something with s
    }
    finally
    {
        button1.IsEnabled = true;
    }
}

Edit:

You don't need the try-finally for this to work. It's really only the one line that you need to change. To explain how it works: this spawns another thread (actually, it gets one from the thread pool) and gets that thread to read the file. When the file has finished reading, the remainder of the button1_Click method is called (from the GUI thread) with the result. Note that this is probably not the most efficient solution, but it is probably the simplest change to your code that doesn't block the GUI.
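
To make the "gets one from the thread pool" part concrete, here is a minimal sketch that reports where the delegate passed to Task.Run executes. `TaskRunDemo` and `RunsOnThreadPoolAsync` are hypothetical names used only for illustration.

```csharp
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo class, not part of the handler above.
public static class TaskRunDemo
{
    // Runs a delegate via Task.Run and reports whether it executed
    // on a thread-pool thread.
    public static Task<bool> RunsOnThreadPoolAsync()
    {
        return Task.Run(() => Thread.CurrentThread.IsThreadPoolThread);
    }
}
```

In a WPF handler, the code after the `await` resumes on the GUI thread because `await` captures the current SynchronizationContext by default, which is why `button1.IsEnabled = true` can safely run there.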

Mike
  • Worked like a charm!!! Thanks Mike, i was able to apply `Task.Factory.StartNew(() => 'Some Task')` to other Tasks too, Thanks again :) –  Oct 31 '12 at 22:13
  • 12
    While this certainly is the easiest solution and it will be most likely good enough for a simple GUI application, it doesn't use `async` to its full potential, because it still blocks a thread. – svick Nov 01 '12 at 07:45
  • @svick Thanks I've edited my answer to say `Task.Run()` instead of `Task.Factory.StartNew`. I agree completely about the thread blocking. Whether that's an issue or not depends on the situation (as you say). Certainly for reading a few files by the GUI I think the overhead is negligible. – Mike Nov 01 '12 at 15:28
  • 1
    This unblocks the UI thread, but blocks another thread pool thread for the duration of the read. – Mark Nov 08 '17 at 09:51
-4

I also encountered the problem described in your question. I solved it more simply than the previous answers (this uses the UWP storage APIs):

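StorageFolder folder = ApplicationData.Current.LocalFolder; // Put your location here.
StorageFile file = await folder.GetFileAsync("Words.txt");
IList<string> lines = await FileIO.ReadLinesAsync(file);
// Allocate the target array before copying the lines into it.
string[] values = new string[lines.Count];
lines.CopyTo(values, 0);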
Anton Bakulev
  • 2
    Where do the classes `ApplicationData` and `FileIO` come from? They don't seem to be a part of the .Net Framework. `ApplicationData` appears to be from the [UWP Framework](https://learn.microsoft.com/en-us/uwp/api/windows.storage.applicationdata). That means you can't use `ApplicationData` in a "normal" .net application. `FileIO` exists in the [VisualBasic assembly](https://msdn.microsoft.com/en-us/library/microsoft.visualbasic.fileio.filesystem(v=vs.110).aspx), but doesn't have async methods as far as I can see, so where are you getting it from? –  Jul 07 '17 at 11:19
  • @AndyJ, yes my solution is for UWP Applications. – Anton Bakulev Jul 10 '17 at 14:32