
I am trying to read from an InputStream. I wrote the code below:

byte[] bytes = new byte[1024 * 32];
while (bufferedInStream.read(bytes) != -1) {
    bufferedOutStream.write(bytes);
}

What I don't understand is how many bytes I should read per iteration. The stream contains a file saved on disk.

I read here, but I did not really understand the post.

Krishna Chaitanya
  • As I understand it, the post you referenced says if you are reading from the disk, you can use anything from 8 KB to 64 KB. If your file's size is smaller than 64 KB, you can read it in one iteration, or at most two. – GokcenG Sep 29 '14 at 08:58
  • The reason you use a buffer is that it is faster than reading one byte at a time. Which buffer size is efficient depends on what you are copying to/from, i.e. whether it is a socket, file, or USB. Sizes between 512 bytes and 64 KB tend to be efficient; e.g. sizes of more than 1 MB can be slower than a smaller buffer. – Peter Lawrey Sep 29 '14 at 13:28

5 Answers


Say you had a flow of water from a pipe into a bath. You then used a bucket to get water from the bath and carry it, say, to your garden to water the lawn. The bath is the buffer. While you are walking across the lawn the buffer is filling up, so when you return there is a bucketful for you to take again.

If the bath is tiny then it could overflow while you are walking with the bucket, and so you will lose water. If you have a massive bath then it is unlikely to overflow, so a larger buffer is more convenient. But of course a larger bath costs more money and takes up more space.

A buffer in your program takes up memory space. And you don't want to take up all your available memory for your buffer just because it is convenient.

Generally your read function lets you specify how many bytes to read. So even if you have a small buffer you could do this (pseudocode):

const int bufsize = 50;
byte buf[bufsize];
unsigned read;
while ((read = is.read(buf, bufsize)) > 0) {
   // do something with data - up to read bytes
}

In the above code, bufsize is the MAXIMUM amount of data to read into the buffer.

If your read function does not allow you to specify a maximum number of bytes to read then you need to supply a buffer large enough to receive the largest possible read amount.

So the optimal buffer size is application specific. Only the application developer will know the characteristics of the data. Eg how fast is the flow of water into the bath. What bath size can you afford (embedded apps), how quickly can you carry bucket from bath across garden and back again.
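In Java, the pseudocode above maps to something like the sketch below. The class and method names are illustrative, and in-memory streams stand in for a real file so the example is self-contained; `read` returns the number of bytes actually read, or -1 at end of stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class CopyLoop {
    // Copy everything from in to out using a fixed-size buffer.
    static void copy(InputStream in, ByteArrayOutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);   // write only the n bytes that were read
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[100_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(new ByteArrayInputStream(data), out, 50);  // a tiny buffer still copies everything
        System.out.println(out.size());
    }
}
```

Note that even with a 50-byte buffer the whole 100,000-byte stream is copied correctly; the buffer size only affects how many iterations (and underlying read calls) it takes.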

Angus Comber
  • When reading the end of the file or from a socket, the size read will not match the buffer most of the time, i.e. you can't ignore it. – Peter Lawrey Sep 29 '14 at 13:26
  • @PeterLawrey true, I wasn't too rigorous in the pseudocode; I added a read-in count, which could be instructive. – Angus Comber Sep 29 '14 at 14:06
  • @AngusComber I don't quite get the analogy where you would lose some water because of overflow. I mean, in this situation the underlying I/O should block itself so as not to lose anything, regardless of how much you read in one go, right? – stdout Dec 31 '18 at 11:41
  • @zgulser if the source is generating data faster than you can process it the source buffers will overflow. – Angus Comber Jan 03 '19 at 08:06
  • @AngusComber I got that part. the only thing I'm saying is - if that happens (since the buffer would be full) I thought the io operation would have been blocked. – stdout Jan 03 '19 at 08:39
  • @zgulser Imagine that you could not read the output of the source fast enough. Imagine the source is the roof of your house and it is raining hard. The pipe from the gutter to your drain is blocked or is a really thin pipe. The gutter gradually fills (a buffer) but it is of finite size, and the pipe is not taking away water fast enough. So the gutter overflows. Imagine an embedded system which sends data to a server. If the server doesn't retrieve the data then the buffer in the embedded system fills until it overflows in the same manner. – Angus Comber Jan 03 '19 at 12:52

It depends on available memory, the size of the file, and other factors. You'd better make some measurements.

PS: Your code is wrong. bufferedInStream.read(bytes) may not fill the whole buffer, only part of it. The method returns the actual number of bytes read as its result.

byte[] bytes = new byte[1024 * 32];
int size;
while ((size = bufferedInStream.read(bytes)) != -1) {
    bufferedOutStream.write(bytes, 0, size);
}
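A minimal sketch of such a measurement, using in-memory streams as a stand-in for a real file (real disk timings will differ, and a serious benchmark would warm up the JVM and average many runs):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;

public class BufferSizeBench {
    // Time one full copy of data using the given buffer size.
    static long timeCopy(byte[] data, int bufSize) throws IOException {
        long start = System.nanoTime();
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[8 * 1024 * 1024];  // 8 MB stands in for a file on disk
        for (int bufSize : new int[] {512, 8 * 1024, 64 * 1024}) {
            long ns = timeCopy(data, bufSize);
            System.out.println(bufSize + " bytes: " + ns / 1_000_000 + " ms");
        }
    }
}
```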
talex

Here is my suggestion (assuming we are dealing with just the input stream and not with how we write to the output stream):

  1. If your use case does not have any requirement for high read performance, go ahead with FileInputStream. For example:
FileInputStream fileInputStream = new FileInputStream("filePath");
byte[] bytes = new byte[1024];
int size;
while ((size = fileInputStream.read(bytes)) != -1) {
   outputStream.write(bytes, 0, size);
}
  2. For better read performance, use BufferedInputStream, stick to its default buffer size, and read a single byte at a time. For example:
byte[] bytes = new byte[1];
BufferedInputStream bufferedInputStream =
                       new BufferedInputStream(new FileInputStream("filePath"));
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}
  3. For more performance, try tuning the buffer size of BufferedInputStream and read one byte at a time. For example:
byte[] bytes = new byte[1];
BufferedInputStream bufferedInputStream =
                       new BufferedInputStream(new FileInputStream("filePath"), 16048);
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}
  4. If you require even more, use a buffer on top of BufferedInputStream. For example:
byte[] bytes = new byte[1024];
BufferedInputStream bufferedInputStream =
                       new BufferedInputStream(new FileInputStream("filePath"), 16048);
int size;
while ((size = bufferedInputStream.read(bytes)) != -1) {
    outputStream.write(bytes, 0, size);
}

You basically have a byte container of the length you specified (1024*32).

Then the input stream will fill as much of the container as possible, probably all of it, iteration after iteration, until it reaches the end of the file, where it will fill only the remaining bytes and return -1 on the next iteration (the one in which it can't read anything).

So you are basically copying from input to output in chunks of 1024*32 bytes.

Hope it helps you understand the code

By the way, on the last iteration, if the input stream has fewer than 1024*32 bytes left, the output will receive not only the last part of the file but also a repetition of the previous iteration's contents for the bytes not filled in the last iteration.
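That stale-bytes effect can be seen in a tiny sketch: a 5-byte in-memory stream read with a 4-byte buffer, where the second read fills only the first slot and leaves the rest from the previous iteration (names are illustrative):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class StaleBufferDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[] {1, 2, 3, 4, 5});
        byte[] buf = new byte[4];
        int n;
        while ((n = in.read(buf)) != -1) {
            // Print how many bytes were read and the full buffer contents.
            StringBuilder sb = new StringBuilder();
            for (byte b : buf) sb.append(b).append(' ');
            System.out.println("read " + n + " -> buffer: " + sb.toString().trim());
        }
    }
}
```

The second line shows the buffer as `5 2 3 4`: only the first byte is fresh, so writing the whole buffer (instead of only the first `n` bytes) would emit the stale `2 3 4`.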

Leo Nomdedeu

The idea is not to read the entire file contents at one time using the buffered input stream. You use the buffered input stream to read as many bytes as the byte[] array's size. You consume the bytes read and then move on to reading more bytes from the file. Hence you don't need to know the file size in order to read it.

This post will be more helpful, as it explains why you should wrap a FileInputStream with a BufferedInputStream:

Why is using BufferedInputStream to read a file byte by byte faster than using FileInputStream?

sethu