20

How can I read an arbitrary file and process it "piece by piece" (meaning byte by byte or some other chunk size that would give the best read performance) without loading the entire file into memory? An example of processing would be to generate an MD5 hash of the file although the answer could apply to any operation.

I'd like to write this myself, but if existing code is available that would be great too.

(C#)

Howiecamp

5 Answers

31

Here's an example of how to read a file in chunks of 1KB without loading the entire contents into memory:

const int chunkSize = 1024; // read the file by chunks of 1KB
using (var file = File.OpenRead("foo.dat"))
{
    int bytesRead;
    var buffer = new byte[chunkSize];
    while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
    {
        // TODO: Process the first bytesRead bytes of the buffer.
        // The buffer is always 1KB, but the actual number of bytes
        // read is stored in bytesRead — the last chunk may be
        // smaller than the buffer.
    }
}
Darin Dimitrov
  • Please clarify why this code doesn't completely read the file into memory. Also please explain your TODO section. – Matthew Jul 28 '11 at 21:34
  • 4
    This loads 1KB (or chunkSize bytes) into memory. Edit: He also meant that not the whole `buffer` is written! Only bytes from index 0 to index `bytesRead - 1`. – Vercas Jul 28 '11 at 21:35
  • @Darin - Ignore my question in the first comment. I see that as a result of the file.Read that only the chunk # of bytes are read. – Howiecamp Jul 29 '11 at 18:35
  • 2
    @Darin I tried this code and it does *not read the last chunk correctly*. It keeps garbage values in the buffer if the last chunk is smaller than `chunkSize` – Mujeeb Sep 17 '18 at 11:20
  • @Mujeeb You need to read `bytesRead` length, not entire length of `buffer`, e.g. `fsFileStream.Write(buffer, 0, bytesRead)` – AntikM Oct 16 '19 at 06:28
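Tying the loop above back to the MD5 use case from the question: `MD5` (like every `HashAlgorithm`) supports incremental hashing via `TransformBlock`/`TransformFinalBlock`, so each chunk can be fed to the hash as it is read. A minimal sketch, reusing the `foo.dat` file name and 1KB chunk size from the answer above:

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

class ChunkedMd5
{
    static void Main()
    {
        const int chunkSize = 1024; // same 1KB chunk size as above

        using (var md5 = MD5.Create())
        using (var file = File.OpenRead("foo.dat"))
        {
            var buffer = new byte[chunkSize];
            int bytesRead;
            while ((bytesRead = file.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Feed only the bytes actually read into the hash,
                // not the whole buffer.
                md5.TransformBlock(buffer, 0, bytesRead, null, 0);
            }
            // Finalize with an empty block; md5.Hash is valid afterwards.
            md5.TransformFinalBlock(Array.Empty<byte>(), 0, 0);
            Console.WriteLine(BitConverter.ToString(md5.Hash).Replace("-", ""));
        }
    }
}
```

Only one chunk is ever held in memory, and the result matches `md5.ComputeHash` over the whole file.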
11

System.IO.FileStream does not load the file into memory.
The stream is seekable, and the MD5 hashing algorithm doesn't have to load the stream (file) into memory either.

Please replace file_path with the path to your file.

byte[] hash = null;

using (var stream = new FileStream(file_path, FileMode.Open))
{
    using (var md5 = new System.Security.Cryptography.MD5CryptoServiceProvider())
    {
        hash = md5.ComputeHash(stream);
    }
}

Here, your MD5 Hash will be stored in the hash variable.

Lajos Mészáros
Vercas
4
    int fullfilesize = 0;            // full size of the file, set on first click
    int DefaultReadValue = 10485760; // read 10 MB at a time
    int toRead = 10485760;
    int position = 0;

    private void Button_Click(object sender, RoutedEventArgs e)
    {
        using (var fs = new FileStream(@"filepath", FileMode.Open, FileAccess.Read))
        {
            using (MemoryStream requestStream = new MemoryStream())
            {
                if (fullfilesize == 0)
                    fullfilesize = (int)fs.Length;

                fs.Position = position;

                if (fs.Position >= fullfilesize)
                {
                    MessageBox.Show("all done");
                    return;
                }
                System.Diagnostics.Debug.WriteLine("file position " + fs.Position);

                if (fullfilesize - position < toRead)
                {
                    toRead = fullfilesize - position;
                    MessageBox.Show("last time");
                }
                System.Diagnostics.Debug.WriteLine("toRead " + toRead);

                int bytesRead;
                byte[] buffer = new byte[toRead];
                int offset = 0;
                position += toRead;
                // Read may return fewer bytes than requested, so loop
                // until the whole chunk has been filled.
                while (toRead > 0 && (bytesRead = fs.Read(buffer, offset, toRead)) > 0)
                {
                    toRead -= bytesRead;
                    offset += bytesRead;
                }

                toRead = DefaultReadValue;
            }
        }
    }

Adapting Darin's answer, this method reads one 10 MB chunk per button click until the end of the file.

Sanath Shetty
  • Although the MemoryStream in your example is not required, you are the only one who posted an example where you set the FileStream Position. This has solved my issue where I needed to split and transfer large files in 10 meg chunks. Upvoted! – DragonZero Nov 15 '15 at 13:51
2
const int MAX_BUFFER = 1024;
byte[] Buffer = new byte[MAX_BUFFER];
int BytesRead;
using (System.IO.FileStream fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
    while ((BytesRead = fileStream.Read(Buffer, 0, MAX_BUFFER)) != 0)
    {
        // Process this chunk starting from offset 0
        // and continuing for BytesRead bytes!
    }
M. Jahedbozorgan
1
const int numberOfBytesToReadPerChunk = 1000; // 1KB
using (BinaryReader fileData = new BinaryReader(File.OpenRead(aFullFilePath)))
    while (fileData.BaseStream.Position < fileData.BaseStream.Length)
        DoSomethingWithAChunkOfBytes(fileData.ReadBytes(numberOfBytesToReadPerChunk));

As I understand the functions used here (specifically BinaryReader.ReadBytes), there is no need to track how many bytes you've read. You just need to know the length and current position for the while loop -- which the stream tells you.
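That behavior of `BinaryReader.ReadBytes` is worth spelling out: when fewer bytes remain than requested, it returns a correspondingly shorter array rather than padding with stale data, so the final chunk needs no special handling. A quick sketch of this, using a hypothetical 2500-byte file named `demo.bin`:

```csharp
using System;
using System.IO;

class ReadBytesDemo
{
    static void Main()
    {
        // Create a hypothetical 2500-byte file and read it in 1000-byte chunks.
        File.WriteAllBytes("demo.bin", new byte[2500]);
        using (var reader = new BinaryReader(File.OpenRead("demo.bin")))
        {
            while (reader.BaseStream.Position < reader.BaseStream.Length)
            {
                byte[] chunk = reader.ReadBytes(1000);
                // Prints 1000, 1000, then 500 — the last array is shorter.
                Console.WriteLine(chunk.Length);
            }
        }
    }
}
```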

Richard Barker