
I have a question and I can't find a reason for it. I'm creating a custom archive file. I'm using MemoryStream to store data and finally I use a FileStream to write the data to disk.

My hard disk is an SSD, but the speed was too slow. When I tried to write only 95 MB to a file, it took 12 seconds to write!

I tried FileStream.Write and File.WriteAllBytes, but the result is the same.

In the end I got the idea to do it with stream copying, and it was 100x faster!

I need to know why this is happening and what's wrong with the write functions.

Here's my code:

// First of all I create an example 150 MB byte array
Random randomgen = new Random();
byte[] new_byte_array = new byte[150000000];
randomgen.NextBytes(new_byte_array);

// I turn the byte array into a MemoryStream
MemoryStream file1 = new MemoryStream(new_byte_array);
// HERE I DO SOME THINGS WITH THE MEMORYSTREAM

// (each method below is run and timed separately)

// Method 1 : File.WriteAllBytes | 13,944 ms
byte[] output = file1.ToArray();
File.WriteAllBytes("output.test", output);

// Method 2 : FileStream.Write | 8,471 ms
byte[] output = file1.ToArray();
FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite);
outfile.Write(output, 0, output.Length);

// Method 3 : FileStream + CopyTo | 147 ms !!!! :|
file1.Position = 0; // reset the stream to the start before copying
FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite);
file1.CopyTo(outfile);

Also, file1.ToArray() takes only 90 ms to convert the MemoryStream to a byte array.
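For reference, a rough sketch of how these timings can be measured (using Stopwatch around each method; the exact harness may differ):

// Timing sketch for method 3; "file1" is the MemoryStream from the code above
var sw = System.Diagnostics.Stopwatch.StartNew();

file1.Position = 0; // CopyTo copies from the current position
using (FileStream outfile = new FileStream("outputfile", FileMode.Create, FileAccess.ReadWrite))
{
    file1.CopyTo(outfile);
}

sw.Stop();
Console.WriteLine("CopyTo: " + sw.ElapsedMilliseconds + " ms");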

Why is this happening, and what is the logic behind it?

  • Is this repeatable? What happens if you wrap the whole thing in a loop of _n_ iterations? Is there a trend? You should probably add some `using ()`s in there –  Feb 11 '19 at 07:31
  • Are you sure it's the `Write`s that are taking the time vs. the `ToArray` call? You seem to be throwing a lot of large objects around, and maybe that `ToArray` is the straw that breaks the camel's back and forces a large GC to occur? Also, are you ensuring that you reset the `MemoryStream`'s position back to the start before using method 3? (Many people forget that many methods, like `CopyTo`, work from the stream's *current* position forwards) – Damien_The_Unbeliever Feb 11 '19 at 07:32
  • I think you should refer to this link: https://stackoverflow.com/questions/14587494/writing-to-file-using-streamwriter-much-slower-than-file-copy-over-slow-network – kunals Feb 11 '19 at 07:38
  • @Damien_The_Unbeliever Yes, I'm pretty sure; I tested it over and over again. `ToArray` takes only 70-90 ms. Yes, I reset the position to 0 before doing method 3, and the result of method 3 and the other methods is exactly the same file with the same hash. –  Feb 11 '19 at 07:38
  • Great answer here: https://stackoverflow.com/questions/3033771/file-i-o-with-streams-best-memory-buffer-size – Cosmin Sontu Feb 11 '19 at 10:15

1 Answer


Update

Dmytro Mukalov is right. The performance you gain by extending the FileStream internal buffer is taken away when you do the actual Flush. I dug a bit deeper, did some benchmarking, and it seems that the difference between Stream.CopyTo and FileStream.Write is that Stream.CopyTo uses the I/O buffer more cleverly and boosts performance by copying chunk by chunk. In the end, CopyTo uses Write under the hood. The optimum buffer size has been discussed here.

Optimum buffer size is related to a number of things: file system block size, CPU cache size, and cache latency. Most file systems are configured to use block sizes of 4096 or 8192. In theory, if you configure your buffer size so you are reading a few bytes more than the disk block, the operations with the file system can be extremely inefficient (i.e. if you configured your buffer to read 4100 bytes at a time, each read would require 2 block reads by the file system). If the blocks are already in cache, then you wind up paying the price of RAM -> L3/L2 cache latency. If you are unlucky and the blocks are not in cache yet, you pay the price of the disk->RAM latency as well.
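To illustrate, the chunk-by-chunk copy that Stream.CopyTo performs is roughly equivalent to this simplified sketch (81920 bytes is the documented default buffer size of CopyTo; the real framework implementation differs in details):

// Simplified sketch of what Stream.CopyTo does internally:
// read into a small reusable buffer and write it out chunk by chunk,
// so each Write call passes only a modest amount of data to the file system.
static void ChunkedCopy(Stream source, Stream destination, int bufferSize = 81920)
{
    byte[] buffer = new byte[bufferSize];
    int bytesRead;
    while ((bytesRead = source.Read(buffer, 0, buffer.Length)) > 0)
    {
        destination.Write(buffer, 0, bytesRead);
    }
}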

So, to answer your question: in your case you are using an unoptimized buffer size when using Write and an optimized one when using CopyTo; or, better said, the Stream itself optimizes that for you.

Generally, you could also force an unoptimized CopyTo by extending the FileStream internal buffer; in that case, the results should be comparably slow to the unoptimized Write.

FileStream outfile = new FileStream("outputfile",
    FileMode.Create, 
    FileAccess.ReadWrite,
    FileShare.Read,
    150000000); //internal buffer will lead to inefficient disk write
file1.CopyTo(outfile);
outfile.Flush(); //don't forget to flush data to disk

Original

I analyzed the Write methods of FileStream and MemoryStream. The point there is that MemoryStream always uses an internal buffer to copy data, and that is extremely fast. FileStream itself has a switch: if the requested count >= bufferSize, which is true in your case because you are using the default FileStream buffer (the default buffer size is 4096), it doesn't use the buffer at all but calls the native Win32Native.WriteFile directly.

The trick is to force FileStream to use the buffer by overriding the default buffer size. Try this:

// Method 2 : FileStream | 8,471 ms
byte[] output = file1.ToArray();
FileStream outfile = new FileStream("outputfile",
    FileMode.Create,
    FileAccess.ReadWrite, 
    FileShare.Read,
    output.Length + 1); // important, the size of the buffer
outfile.Write(output, 0, output.Length);

N.B. I am not saying this is the optimal buffer size; it is just an explanation of what is going on. To examine the best buffer size to use with FileStream, refer to the buffer-size discussion linked above.
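If you want to examine it yourself, a minimal benchmark sketch could look like this (the file name and the set of buffer sizes are arbitrary choices for illustration):

// Rough benchmark sketch: write the same data with different
// FileStream internal buffer sizes and compare the elapsed times.
byte[] data = new byte[150000000];
new Random().NextBytes(data);

foreach (int bufferSize in new[] { 4096, 81920, 1048576, data.Length + 1 })
{
    var sw = System.Diagnostics.Stopwatch.StartNew();
    using (var fs = new FileStream("buffer_test.bin", FileMode.Create,
        FileAccess.Write, FileShare.Read, bufferSize))
    {
        fs.Write(data, 0, data.Length);
        fs.Flush(true); // flush the internal buffer and ask the OS to write to disk
    }
    sw.Stop();
    Console.WriteLine(bufferSize + ": " + sw.ElapsedMilliseconds + " ms");
}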

Johnny
  • The answer is rather misleading because the problem isn't related to the `FileStream` internal buffer. In your example the `Write` call will execute fast enough only because it merely copies the data to the internal buffer, but that doesn't eliminate the performance penalty when the data is pushed to the file by the `Flush` call. The actual problem is how much data is passed to the `Write` call (`CopyTo` does it in small portions), because that influences the I/O caching behavior. – Dmytro Mukalov Feb 11 '19 at 17:32
  • Thanks for the information. So the main reason for the speed is the buffer size, right? –  Feb 12 '19 at 08:57
  • @MikeTheCoder Yes, because `CopyTo` in the end invokes `Write`; it is in a nutshell the same call... – Johnny Feb 12 '19 at 09:33
  • @Johnny thanks for the update. @MikeTheCoder, actually it is the size of the data passed to the `Write` calls - according to my experiments it starts getting worse with data chunks of 50 MB and more. I think the reason is a "context" problem - i.e. the I/O cache manager doesn't know the context (whether it is a single `Write` call or a series of calls) - it just tries to optimize the operation from the performance and data-safety standpoints. With a large portion of data there is a high chance that the data will create pressure on the cache (the dirty data cannot be off-loaded), so it just "decides" not to cache big chunks (see the chunked-write sketch below). – Dmytro Mukalov Feb 12 '19 at 11:05
  • @DmytroMukalov Thank you for the great information. @Johnny I also accepted your answer. –  Feb 12 '19 at 12:11
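Following up on Dmytro Mukalov's point about the size of the data passed to Write, here is a rough sketch of writing the same byte array in smaller portions, which mirrors what CopyTo does internally (the 1 MB chunk size is only an illustration, not a value recommended in this thread):

// Sketch: write a large byte[] in smaller chunks so every Write call
// hands the I/O cache a modest amount of data instead of one huge block.
byte[] output = file1.ToArray(); // the question's MemoryStream
int chunkSize = 1 * 1024 * 1024; // 1 MB, arbitrary illustration

using (var outfile = new FileStream("outputfile", FileMode.Create, FileAccess.Write))
{
    for (int offset = 0; offset < output.Length; offset += chunkSize)
    {
        int count = Math.Min(chunkSize, output.Length - offset);
        outfile.Write(output, offset, count);
    }
}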