
I'm a little confused about how I should read a large file (> 8 GB) in chunks when each chunk has its own size.

If I know the chunk size, the code looks like this:

using (FileStream fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read, ProgramOptions.BufferSizeForChunkProcessing))
{
    using (BufferedStream bs = new BufferedStream(fs, ProgramOptions.BufferSizeForChunkProcessing))
    {
        byte[] buffer = new byte[ProgramOptions.BufferSizeForChunkProcessing];
        int byteRead;
        // Read fixed-size chunks until Read returns 0 (end of file); the last chunk may be shorter.
        while ((byteRead = bs.Read(buffer, 0, ProgramOptions.BufferSizeForChunkProcessing)) > 0)
        {
            // Copy only the bytes read in this pass into their own array.
            byte[] originalBytes = new byte[byteRead];
            Array.Copy(buffer, originalBytes, byteRead);
            // ... process originalBytes here ...
        }
    }
}

But now imagine that I've read a large file in chunks, applied some encoding to each chunk (which changed the chunk's size), and written all the processed chunks to a new file. Now I need to do the opposite operation, but I don't know the exact size of each chunk. My idea is that after each chunk has been processed, I write the new chunk size before the chunk bytes, like this:

Number of block bytes
Block bytes
Number of block bytes
Block bytes
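A minimal sketch of the writing side of that layout, assuming each processed chunk is already available as a `byte[]` (the names `ChunkWriter` and `WriteChunks` are hypothetical). `BinaryWriter.Write(int)` emits the length as 4 little-endian bytes, and `BinaryWriter.Write(byte[])` writes the array as-is, with no extra prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper illustrating the [length][bytes][length][bytes]... layout.
static class ChunkWriter
{
    public static void WriteChunks(string path, IEnumerable<byte[]> chunks)
    {
        using (var fs = new FileStream(path, FileMode.Create, FileAccess.Write))
        using (var writer = new BinaryWriter(fs))
        {
            foreach (var chunk in chunks)
            {
                writer.Write(chunk.Length); // header: number of block bytes (4 bytes, little-endian)
                writer.Write(chunk);        // block bytes, written without any length prefix
            }
        }
    }
}
```

Because every header has the same fixed size, a reader can always read 4 bytes, decode the length, and then read exactly that many bytes.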

So in that case, the first thing I need to do is read the chunk's header to learn the exact chunk size. I only read and write byte arrays to the file. My question is: what should the chunk's header look like? Should the header contain some kind of boundary?

isxaker
  • Possible duplicate of [Read text file block by block](http://stackoverflow.com/questions/17612853/read-text-file-block-by-block) – Simon Price Feb 15 '16 at 11:13
  • flagged as a duplicate - take a look at this where you should be able to get your answer http://stackoverflow.com/questions/17612853/read-text-file-block-by-block – Simon Price Feb 15 '16 at 11:13
  • But I must use `BufferedStream`. I mustn't use `StreamReader` – isxaker Feb 15 '16 at 11:16
  • You realise that `FileStream` already buffers the file, so using `BufferedStream` with it is pointless? – Matthew Watson Feb 15 '16 at 11:22
  • `BufferedStream` (like `FileStream`) is a `Stream`, and `StreamReader` reads from a `Stream`, so if you _must_ use one it does not mean you _can't_ use the other. – C.Evenhuis Feb 15 '16 at 11:28
  • When you say that you must use a buffered stream, who told you that you must do it this way? Is this homework? – Simon Price Feb 15 '16 at 11:33
  • @MatthewWatson Your point is that `FileStream` already buffers in my case, and `BufferedStream` is just a buffer over an existing stream, for example `MemoryStream`. – isxaker Feb 15 '16 at 12:39
  • @SimonPrice It's certainly not homework. I decided to work with a stream of bytes; that's why I don't use `StreamReader`. – isxaker Feb 15 '16 at 12:43
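Following the point raised in the comments, the question's fixed-size loop can be sketched with only `FileStream`'s own internal buffer (its `bufferSize` constructor argument) and no `BufferedStream`. The names `FixedChunkReader` and `CountBytes` are hypothetical, and the per-chunk processing is reduced to counting bytes.

```csharp
using System;
using System.IO;

static class FixedChunkReader
{
    // Reads the file in pieces of at most chunkSize bytes; FileStream does the buffering itself.
    public static long CountBytes(string path, int chunkSize)
    {
        long total = 0;
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                       FileShare.Read, bufferSize: chunkSize))
        {
            var buffer = new byte[chunkSize];
            int bytesRead;
            while ((bytesRead = fs.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only buffer[0 .. bytesRead) is valid in this pass; this sketch just counts it.
                total += bytesRead;
            }
        }
        return total;
    }
}
```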

2 Answers


If the file is rigidly structured so that each block of data is preceded by a 32-bit length value, then it is easy to read. The "header" for each block is just the 32-bit length value.
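On the question of a boundary: with a length prefix the reader always knows where the next header starts, so a boundary marker is not needed to split a well-formed file (a marker would only help to detect corruption or resynchronize after damage). What the header does need is a fixed size and a fixed byte order. A minimal sketch, assuming a 4-byte little-endian signed length; `ChunkHeader`, `Encode` and `Decode` are hypothetical names, and `BinaryPrimitives` requires .NET Core 2.1 or later.

```csharp
using System;
using System.Buffers.Binary;
using System.IO;

// Hypothetical header: exactly 4 bytes holding the block length, little-endian.
static class ChunkHeader
{
    public const int Size = sizeof(int);

    public static byte[] Encode(int blockLength)
    {
        if (blockLength < 0)
            throw new ArgumentOutOfRangeException(nameof(blockLength));
        var header = new byte[Size];
        BinaryPrimitives.WriteInt32LittleEndian(header, blockLength);
        return header;
    }

    public static int Decode(byte[] header)
    {
        if (header.Length != Size)
            throw new ArgumentException("Header must be exactly 4 bytes.", nameof(header));
        int blockLength = BinaryPrimitives.ReadInt32LittleEndian(header);
        if (blockLength < 0)
            throw new InvalidDataException("Negative block length; the file is corrupt.");
        return blockLength;
    }
}
```

Fixing the byte order explicitly keeps the file readable on any machine; `BitConverter.ToInt32`, used below, follows the machine's byte order, which is little-endian on x86/x64.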

If you want to read such a file, the easiest way is probably to encapsulate the reading into a method that returns IEnumerable<byte[]> like so:

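// Requires: using System; using System.Collections.Generic; using System.IO;
public static IEnumerable<byte[]> ReadChunks(string path)
{
    var lengthBytes = new byte[sizeof(int)];

    using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        while (true)   // Keep reading blocks until end of file.
        {
            int n = ReadFully(fs, lengthBytes, sizeof(int));  // Read block size.

            if (n == 0)      // End of file.
                yield break;

            if (n != sizeof(int))
                throw new InvalidOperationException("Invalid header");

            int blockLength = BitConverter.ToInt32(lengthBytes, 0);
            var buffer = new byte[blockLength];
            n = ReadFully(fs, buffer, blockLength);

            if (n != blockLength)
                throw new InvalidOperationException("Missing data");

            yield return buffer;
        }
    }
}

// Helper: Stream.Read can return fewer bytes than requested, so keep
// calling it until 'count' bytes have arrived or the stream ends.
static int ReadFully(Stream stream, byte[] buffer, int count)
{
    int total = 0;
    while (total < count)
    {
        int read = stream.Read(buffer, total, count - total);
        if (read == 0)
            break;   // End of stream.
        total += read;
    }
    return total;
}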

Then you can use it simply:

foreach (var block in ReadChunks("MyFileName"))
{
    // Process block.
}

Note that you don't need to provide your own buffering.
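As an alternative sketch (not part of the answer above), `BinaryReader` can handle the header: `ReadInt32` reads a 4-byte little-endian integer, matching what `BinaryWriter.Write(int)` produces, and `ReadBytes` keeps reading until it has the requested count or reaches end of file. It assumes the input is a seekable `FileStream`, so `Position` and `Length` can detect a clean end of file; `LengthPrefixedReader` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LengthPrefixedReader
{
    public static IEnumerable<byte[]> ReadChunks(string path)
    {
        using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var reader = new BinaryReader(fs))
        {
            // A clean end of file is exactly at a header boundary.
            while (fs.Position < fs.Length)
            {
                int blockLength = reader.ReadInt32(); // throws EndOfStreamException on a truncated header
                if (blockLength < 0)
                    throw new InvalidDataException("Negative block length");

                byte[] block = reader.ReadBytes(blockLength); // shorter than requested only at end of file
                if (block.Length != blockLength)
                    throw new InvalidDataException("Missing data");

                yield return block;
            }
        }
    }
}
```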

Matthew Watson

Try this:

// Requires: using System; using System.Collections.Generic; using System.IO;
public static IEnumerable<byte[]> ReadChunks(string fileName)
{
    const int MAX_BUFFER = 1048576; // 1 MB

    using (var fs = new FileStream(fileName, FileMode.Open, FileAccess.Read, FileShare.Read))
    {
        var filechunk = new byte[MAX_BUFFER];
        int numBytes;

        // Read up to 1 MB at a time until Read reports end of file.
        while ((numBytes = fs.Read(filechunk, 0, MAX_BUFFER)) > 0)
        {
            // Copy only the bytes actually read into a fresh array, so the caller
            // never receives a buffer that a later iteration overwrites.
            var chunk = new byte[numBytes];
            Buffer.BlockCopy(filechunk, 0, chunk, 0, numBytes);
            yield return chunk;
        }
    }
}
Sajeepan Y
  • Welcome to stack overflow! Please consider adding a short explanation about your answer and how it solves the problem stated in the question. Have a good day! – D J Sep 14 '22 at 08:02