9

I have a list of floats to write to a file. The code below does the job, but it is synchronous.

List<float> samples = GetSamples();

using (FileStream stream = File.OpenWrite("somefile.bin"))
using (BinaryWriter binaryWriter = new BinaryWriter(stream, Encoding.Default, true))
{
    foreach (var sample in samples)
    {
        binaryWriter.Write(sample);
    }
}

I want to do the operation asynchronously, but BinaryWriter does not support async operations, which is understandable since it only writes a few bytes at a time. Still, the operation is dominated by file I/O, so I think it can and should be asynchronous.

I tried writing to a MemoryStream with the BinaryWriter and, when that finished, copying the MemoryStream to the FileStream with CopyToAsync. However, this degraded performance (total time) by up to 100% with big files.

How can I convert the whole operation to asynchronous?

Yusuf Tarık Günaydın
  • 3,016
  • 2
  • 27
  • 41

3 Answers

6

Normal write operations usually end up being completed asynchronously anyway. The OS accepts writes immediately into the write cache, and flushes it to disk at some later time. Your application isn't blocked by the actual disk writes.

Of course, if you are writing to a removable drive, the write cache is typically disabled and your program will be blocked.


I recommend that you dramatically reduce the number of operations by transferring a large block at a time. To wit:

  1. Allocate a new T[BlockSize] of your desired block size.
  2. Allocate a new byte[BlockSize * sizeof (T)]
  3. Use List<T>.CopyTo(index, buffer, 0, buffer.Length) to copy a batch out of the list.
  4. Use Buffer.BlockCopy to get the data into the byte[].
  5. Write the byte[] to your stream in a single operation.
  6. Repeat steps 3-5 until you reach the end of the list. Be careful with the final batch, which may be a partial block (see the sketch below).
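
For concreteness, here is a minimal sketch of those steps with T = float. The block size, the useAsync flag, and the asynchronous write are illustrative choices rather than part of the answer, and the loop is assumed to run inside an async method:

// Sketch only: batch the floats into one large byte[] per write.
const int BlockSize = 8192;                       // arbitrary batch size (floats per write)

List<float> samples = GetSamples();
var block = new float[BlockSize];
var bytes = new byte[BlockSize * sizeof(float)];

using (var stream = new FileStream("somefile.bin", FileMode.Create, FileAccess.Write,
                                   FileShare.None, bufferSize: 4096, useAsync: true))
{
    for (int index = 0; index < samples.Count; index += BlockSize)
    {
        int count = Math.Min(BlockSize, samples.Count - index);      // final batch may be partial

        samples.CopyTo(index, block, 0, count);                      // step 3: copy a batch out of the list
        Buffer.BlockCopy(block, 0, bytes, 0, count * sizeof(float)); // step 4: floats -> raw bytes
        await stream.WriteAsync(bytes, 0, count * sizeof(float));    // step 5: one write per block
    }
}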
Ben Voigt
  • 277,958
  • 43
  • 419
  • 720
  • Yes, this explains why the OP's buffered approach didn't work - it destroys all the benefit of having I/O buffering by pushing everything at once after the CPU work is done. There's still a lot you can get from asynchronous I/O even when you're limited by the CPU, though - for example, maintaining thread affinity. – Luaan Feb 15 '16 at 17:16
  • Then, should I call this function with `Task.Run` to call it asynchronously? – Yusuf Tarık Günaydın Feb 15 '16 at 19:40
  • Hi Ben, quick question, will this caching behavior also apply to a BinaryWriter that is attached to a Network stream? – BlueStrat Mar 13 '19 at 18:46
  • 1
    @BlueStrat: You're even more likely to see caching / hidden asynchronicity on a slow I/O device like a network share. – Ben Voigt Mar 14 '19 at 02:42
1

Your memory stream approach makes sense; just make sure to write in batches rather than letting the memory stream grow to the full size of the file and then writing it all at once.

Something like this should work fine:

var data = new float[10 * 1024];
var helperBuffer = new byte[4096];

using (var fs = File.Create(@"D:\Temp.bin"))
using (var ms = new MemoryStream(4096))
using (var bw = new BinaryWriter(ms))
{
  var iteration = 0;

  foreach (var sample in data)
  {
    bw.Write(sample);

    iteration++;

    if (iteration == 1024)
    {
      iteration = 0;

      // Copy the 4 KiB batch out of the memory stream and write it asynchronously.
      ms.Position = 0;
      ms.Read(helperBuffer, 0, 1024 * 4);
      await fs.WriteAsync(helperBuffer, 0, 1024 * 4).ConfigureAwait(false);

      // Rewind so the next batch overwrites this block instead of growing the stream.
      ms.Position = 0;
    }
  }
}

This is just sample code - make sure to handle errors properly etc.

Luaan
  • 62,244
  • 7
  • 97
  • 116
  • 1
    You also need to handle the case when the loop exits and there is data that has not yet been written to the file. – Yacoub Massad Feb 15 '16 at 17:08
  • Doing `await` inside the loop defeats the whole purpose of async I/O -- to overlap other useful operations. – Ben Voigt Feb 15 '16 at 17:10
  • I don't think that will be any faster this way (the serialization and writing process, that is). You serialize some objects, then you write them out, but you don't continue to serialize other objects while this happens. The `await` just sits around doing nothing if no other work needs to be done, it won't "prefetch" the next iterations... – Haukinger Feb 15 '16 at 17:10
  • @YacoubMassad Yeah, that's one of the biggies :) I don't want this code to be something you copy-paste and have just work - it's something that needs thinking no matter what you do. For example, it's precisely tweaked to only work with multiples of `1024 * 4` worth of data, it doesn't care about aligning when that isn't the case anymore (which includes both your comment and the case when it's used for something more complex than a float array). – Luaan Feb 15 '16 at 17:10
  • @Haukinger Sure, that's another great optimization. Make sure to only await the task on the end of the *next* iteration. I'm sure you'll find plenty others. Don't forget that the file stream is buffered, so `WriteAsync` will tend to return immediately if the buffer isn't full yet, so there *is* overlapping. And if the buffering can't keep up, you're not going to improve your throughput anyway (though tweaking the buffer size is something you might want to do). – Luaan Feb 15 '16 at 17:11
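
To illustrate the overlap discussed in the last two comments, here is a rough double-buffering sketch, not part of the answer itself: `fs` is the FileStream from the code above, and `FillBuffer` is a hypothetical helper that serializes the next batch of floats into the given byte[] and returns the number of bytes it produced. The next buffer is filled while the previous WriteAsync is still in flight, and the previous task is only awaited right before the next write is issued.

// Rough sketch of overlapping serialization with the previous asynchronous write.
// FillBuffer is a hypothetical helper: it fills the byte[] with the next batch of
// floats and returns the number of bytes written (0 when nothing is left).
var buffers = new[] { new byte[4096], new byte[4096] };
var current = 0;
Task pendingWrite = Task.CompletedTask;

int filled;
while ((filled = FillBuffer(buffers[current])) > 0)
{
  await pendingWrite.ConfigureAwait(false);           // the previous block is fully handed off
  pendingWrite = fs.WriteAsync(buffers[current], 0, filled);

  current = 1 - current;                              // fill the other buffer next iteration
}

await pendingWrite.ConfigureAwait(false);             // wait for the last in-flight write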
1

Sometimes, these helper classes are anything but helpful.

Try this:

List<float> samples = GetSamples();

using (FileStream stream = File.OpenWrite("somefile.bin"))
{
    foreach (var sample in samples)
    {
        await stream.WriteAsync(BitConverter.GetBytes(sample), 0, 4);
    }
}
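
A side note, not part of the original answer: File.OpenWrite does not request asynchronous I/O, so WriteAsync on that stream may simply perform the write on a thread-pool thread. If genuinely overlapped I/O is wanted, the FileStream can be opened explicitly with useAsync, roughly like this:

// Sketch: same open semantics as File.OpenWrite, but with asynchronous I/O requested.
using (var stream = new FileStream("somefile.bin", FileMode.OpenOrCreate, FileAccess.Write,
                                   FileShare.None, bufferSize: 4096, useAsync: true))
{
    foreach (var sample in samples)
    {
        await stream.WriteAsync(BitConverter.GetBytes(sample), 0, sizeof(float));
    }
}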
Paulo Morgado
  • 14,111
  • 3
  • 31
  • 59