What does the FileStream.Read return value mean? How do I read data in chunks and process it?

Question

I'm quite new to C#, so please bear with me. Using FileStream, I read a fixed-size block of data into a small array, process it, then read the next block, and so on until the end of the file.

I thought about using something like this:

            byte[] data = new byte[30];
            int numBytesToRead = (int)fStream.Length;
            int offset = 0;

            //reading
            while (numBytesToRead > 0)
            {
                fStream.Read(data, offset, 30);
                offset += 30;
                numBytesToRead -= 30;

                //do something with the data
            }

But I checked the documentation and its examples, and it states that the return value of the Read method above is:

"Type: System.Int32 The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached."

What does it mean that the bytes are "not currently available"? Can this really happen when reading small amounts of data, or only with large amounts? If only with large amounts, roughly how large, because I'll also be reading in bigger chunks in some other places? If this can happen at any time, how should I change my code so that it still executes efficiently?

Thank you for your time and answers.

score 5 · Accepted Answer · answered Feb 22 '11 at 09:22

The Read method returns the number of bytes actually read, which may be less than the number of bytes requested. Normally when you read a file, you will get all the bytes that you ask for (unless you reach the end of the file); however, you can't count on it always being that way.

It's possible that the system distinguishes between data that is immediately available and data that needs time to be retrieved, so that it returns the data currently available right away, starts reading more data in the background, and expects you to request the rest in another call. AFAIK it doesn't do this currently, but it's a reasonable future scenario.

You should capture the result of the Read method and use it to determine how much data you got. Don't read into the buffer at position offset: the buffer is only as large as one chunk, so that fails for any file larger than the buffer. Read into the start of the buffer instead. Alternatively, you can declare an array large enough to hold the entire stream, in which case you would read the data into position offset.

You should also handle the situation where the Read method returns zero, which means that there is no more data to read. This normally doesn't happen until you reach the end of the file, but if it did happen earlier, it would throw your code into an infinite loop.

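byte[] data = new byte[30];
long numBytesToRead = fStream.Length; // long: the file may be larger than int.MaxValue
long offset = 0; // number of bytes read from the file so far

//reading
while (numBytesToRead > 0) {
  // always read into the start of the chunk buffer
  int len = fStream.Read(data, 0, data.Length);
  if (len == 0) {
    // error: unexpected end of file; stop instead of looping forever
    throw new EndOfStreamException("Unexpected end of file.");
  }
  offset += len;
  numBytesToRead -= len;
  //do something with the data (len bytes)
}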
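
If each chunk must be completely filled before it is processed, the read needs an inner loop that keeps calling Read until the buffer is full or the stream ends. The following is only a minimal sketch of that idea, not part of the original answer; the names `ChunkReader`, `FillBuffer` and `ProcessChunks` are hypothetical.

```csharp
using System;
using System.IO;

static class ChunkReader
{
    // Hypothetical helper: calls Read repeatedly until the buffer is full
    // or the stream ends. Returns the number of bytes placed in the buffer.
    public static int FillBuffer(Stream stream, byte[] buffer)
    {
        int offset = 0;
        while (offset < buffer.Length)
        {
            // Ask only for the part of the buffer that is still empty.
            int len = stream.Read(buffer, offset, buffer.Length - offset);
            if (len == 0)
            {
                break; // end of stream: the last chunk may be short
            }
            offset += len;
        }
        return offset;
    }

    // Hypothetical driver: passes each chunk and its valid byte count to a callback.
    public static void ProcessChunks(Stream stream, int chunkSize, Action<byte[], int> process)
    {
        byte[] data = new byte[chunkSize];
        int filled;
        while ((filled = FillBuffer(stream, data)) > 0)
        {
            process(data, filled);
        }
    }
}
```

With this shape, only the final chunk can be shorter than the buffer, no matter how the stream splits its reads. On .NET 7 and later, `Stream.ReadExactly` and `Stream.ReadAtLeast` cover the same need without a hand-written loop.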

answered Feb 22 '11 at 09:22

Guffa

Thank you for your answer. So you are saying that when reading from disk I should also check whether len == data.Length? Could it happen that I don't get all the requested data? How would I change the code then? If I don't get all the bytes, should I change the offset to len and use data.Length - len instead of data.Length? – Ben Feb 22 '11 at 09:41
@Ben: It's not really interesting whether len == data.Length; len contains the number of bytes that you got, regardless of how large the buffer is. – Guffa Feb 22 '11 at 09:58
Yes but I want to read the desired amount of bytes to an array (which has same size as one chunk of data) and then do something with data and then again read the same amount of data to same array. Thus I thought that I should compare the desired amount of bytes (data.length) with the actual amount (len). But I don't really know how to then get just the missing bytes from this chunk and process it and then continue reading normally. – Ben Feb 22 '11 at 10:16
@Ben: Then you need a loop in the loop. For the inner loop you keep an offset in the buffer, and call `.Read(data, offset, data.Length - offset)` until you fill the buffer. – Guffa Feb 22 '11 at 12:17
Thanks... But as you said, for now this is not necessary. Do professional products and other developers check whether the buffer is full this way? Or is this just overhead for now that nobody bothers with? – Ben Feb 22 '11 at 13:05
@Ben: I hope that most professional products use the methods as they are defined in the documentation, and don't rely on them always behaving the same way with nothing ever going wrong. The stability of an application depends on how it stands up to less than ideal situations. I've seen applications run fine for years, and suddenly break down when some insignificant change revealed a serious flaw that was there all along. – Guffa Feb 22 '11 at 19:44
This example seems incomplete. I don't understand what I'm supposed to do with offset. The value is assigned, but never used. – Ristogod Feb 10 '22 at 16:53
@Ristogod: The part that is incomplete is where it says "do something with the data". The `offset` variable contains the position of that data relative to the start of the file. – Guffa Feb 27 '22 at 02:07
@Guffa not sure that helps me understand. – Ristogod Mar 09 '22 at 21:09
@Ristogod: For example, before the loop declare an array `byte[] result = new byte[numBytesToRead];`. At the comment in the loop you put the data in the array: `Array.Copy(data, 0, result, offset, len)`, and move the line `offset += len;` after that. – Guffa Mar 11 '22 at 08:31

score 3 · Answer 2 · answered Feb 22 '11 at 08:56

This happens when you try to read more than is available in the file, which can occur in the following two scenarios:

You try to read more bytes than the total length of the file
You are too close to the end of the file to be able to read the number of bytes you request

Additionally, Stream has descendants for network-bound connections as well, and in those cases it is not always easy to know how many bytes will be available and when.

The way to process a binary file in chunks is like this:

byte[] buffer = new byte[BUFFER_SIZE];
int inBuffer;
while ((inBuffer = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    // here you have "inBuffer" number of bytes in the buffer
}

Thanks, I understand what you said, but I just want to be sure: when reading from disk, do I have to check the value returned by Read inside the while loop? I mean, can it be less than buffer.Length if I always read in "special" chunks (chunk_size * numberOfReads = file_size)? – Ben Feb 22 '11 at 09:09

score 1 · Answer 3 · answered Feb 22 '11 at 08:57

FileStream derives from Stream, and Stream is a very generic class; the description of Read comes from that generic class. A stream can also be a network stream, for example, and there data might not be currently available because it has not been sent yet. For a FileStream you can assume that you will get three types of return values:

return value == count of bytes to be read (last parameter of Read): You are in the middle of the file
return value < count && return value > 0: You might be at the end of the file or the rest of the stream is just not currently available.
return value == 0: You already read all content. Nothing more to read.
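
The three cases can be observed with a small sketch like the one below. It is only an illustration under stated assumptions: the names `ReadReturnValues` and `Collect` are hypothetical, and a `MemoryStream` stands in for the file so that the results are deterministic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class ReadReturnValues
{
    // Hypothetical helper: records the return value of every Read call,
    // including the final 0 that signals the end of the stream.
    public static List<int> Collect(Stream stream, int count)
    {
        var results = new List<int>();
        byte[] buffer = new byte[count];
        int n;
        do
        {
            n = stream.Read(buffer, 0, count);
            results.Add(n);
        } while (n > 0);
        return results;
    }
}
```

Calling `ReadReturnValues.Collect(new MemoryStream(new byte[70]), 30)` yields 30, 30, 10, 0: two full reads, one short read at the end, then zero. Other stream types are free to return a short read before the end as well, so the code should rely only on the returned count.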

edited Jul 07 '14 at 12:17

answered Feb 22 '11 at 08:57

Daniel Hilgarth

Your answer is wrong. MSDN documentation says "An implementation is free to return fewer bytes than requested even if the end of the stream has not been reached". See: http://msdn.microsoft.com/en-us/library/system.io.filestream.read.aspx So you **always** have to check the return value of Read() operations. – data Jul 07 '14 at 11:47
@data: As far as I can see, you just quoted the general description of `Stream.Read`. `FileStream.Read` has this description: "Return value: The total number of bytes read into the buffer. This might be less than the number of bytes requested if that number of bytes are not currently available, or zero if the end of the stream is reached." To me, this reads the same as my answer – Daniel Hilgarth Jul 07 '14 at 11:56
I quoted the "Remarks" section of the FileStream.Read method (v4.5). "if that number of bytes are not currently available" can be for any reason, not just "end of stream". For example, if the hardware is busy, or if the underlying driver returned less than was asked for. – data Jul 07 '14 at 12:01
@data: Changed my answer. Do you agree now? – Daniel Hilgarth Jul 07 '14 at 12:17
Hilgarth, I have checked the .NET source code and it is using `ReadFile` from kernel32.dll, which can return less than requested. There is a lot of broken code that relies upon `Read()` calls returning everything that was asked for. – data Jul 07 '14 at 12:23
@data: That doesn't answer my question. Do you agree with my updated answer? – Daniel Hilgarth Jul 07 '14 at 12:37

score 1 · Answer 4 · answered Feb 22 '11 at 09:05

Bytes currently not available only applies to non-FileStream Streams such as the one found in HttpWebRequest.

FileStream.Read could return 1 byte, in theory. You should still be able to process packets this small.

But it will never return 0 unless there is a problem like SMB connection lost, file deleted, anti virus, or it hits the end of the file.

There are better ways to read files. If you're dealing with a text file, consider using System.IO.StreamReader instead, as it handles different text encodings, line breaks, and more.

Also be aware that the maximum array size is about 2 GB, so don't do new byte[fileStream.Length] for large files.
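
For text files, a line-by-line loop over a reader might look like the sketch below. The names `TextFileExample` and `CountLines` are hypothetical, and the method takes a `TextReader` so that it works with a `StreamReader` over a file as well as with an in-memory `StringReader`.

```csharp
using System;
using System.IO;

static class TextFileExample
{
    // Hypothetical helper: reads the text one line at a time and counts the lines.
    // ReadLine returns null once the end of the input is reached.
    public static int CountLines(TextReader reader)
    {
        int count = 0;
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            // process "line" here; this sketch only counts it
            count++;
        }
        return count;
    }
}
```

For a file, this could be called as `using (var reader = new StreamReader("input.txt")) { int n = TextFileExample.CountLines(reader); }`, where `input.txt` is a placeholder path. Only one line is held in memory at a time, so the size of the file does not matter.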
