
I was running the code below to try to understand how BufferedInputStream works in Java. I set the buffer size to 1 and expected the buffer to read the file 465 times, because that is how many characters the file contains. However, it reads the file only once. What I found is that, to change the number of times the file is read, you have to change the size of the byte array "does" to 1; in that case it reads the file 465 times. I do not understand why the buffer reads the file once even though I set the buffer size to 1. How come the array "does" dictates how many times the buffer reads the file?

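```java
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class Main {
    public static void main(String[] args) throws IOException {
        File f = new File("runs");

        if (!f.exists()) {
            f.createNewFile();
        }

        // try-with-resources closes both streams when done
        try (FileInputStream input = new FileInputStream(f);
             BufferedInputStream b = new BufferedInputStream(input, 1)) { // internal buffer: 1 byte

            byte[] does = new byte[1000]; // the caller's array passed to read()

            int i = b.read(does); // number of bytes actually read, or -1 at EOF

            int x = 0; // counts the read(does) calls that returned data
            String tmp;
            while (i != -1) {
                // Only the first i bytes of 'does' are valid on this pass
                tmp = new String(does, 0, i, StandardCharsets.UTF_8);
                if (!tmp.equalsIgnoreCase("\n")) {
                    System.out.print(tmp);
                } else {
                    System.out.println(tmp);
                }

                x++;
                i = b.read(does);
            }
            System.out.println(x);
        }
    }
}
```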
kaka
  • Read the Javadoc for `InputStream.read(byte[])`. It reads from the stream until the buffer passed to `read` is full, or EOF is reached. The buffer in a `BufferedInputStream` is a completely different buffer, used to improve performance by reducing the number of reads performed on the file. – tgdavies Jul 27 '23 at 13:45
  • @tgdavies Let me add a small but important detail: it can read until the buffer is full, and usually does, but that is not guaranteed, even before EOF is reached. The return value has to be checked to see how many bytes were actually read. That's also why libraries like Apache Commons IO provide helper functions like [IOUtils.readFully](https://commons.apache.org/proper/commons-io/apidocs/index.html?org/apache/commons/io/package-summary.html) – Paul Pazderski Jul 27 '23 at 15:21
  • @tgdavies is correct. How come the array "does" dictates how many times the buffer reads the file? This is because b.read(does) is equivalent to b.read(does, 0, does.length). This means that if does.length = 1000, it will try to read 1000 bytes if they are available. – user9035826 Jul 27 '23 at 15:21
  • Here is a question and answer on _BufferedInputStream_ use cases, which may shed some light on why it's set up that way: _https://stackoverflow.com/questions/3122422/usage-of-bufferedinputstream_. – Reilas Jul 27 '23 at 18:36
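A minimal sketch of the point in the comments above: `read(byte[])` is just `read(b, 0, b.length)`, and its return value says how many bytes actually arrived, which may be fewer than requested before EOF. The helper `readFully` below is a hypothetical name written for illustration (not the Apache Commons IO method linked above), and a `ByteArrayInputStream` of 465 bytes stands in for the file. It loops until the array is full or EOF is reached; since Java 9 the standard `InputStream.readNBytes(byte[], int, int)` does the same job.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFullyDemo {
    // Hypothetical helper: keeps calling read() until 'buf' is full or EOF.
    // Returns the number of bytes actually stored in 'buf'.
    static int readFully(InputStream in, byte[] buf) throws IOException {
        int total = 0;
        while (total < buf.length) {
            int n = in.read(buf, total, buf.length - total); // may return fewer bytes than asked
            if (n == -1) {
                break; // EOF reached before the array was filled
            }
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[465]; // stands in for the 465-byte file
        byte[] does = new byte[1000];
        int got = readFully(new ByteArrayInputStream(data), does);
        // Only the first 'got' bytes of 'does' are meaningful
        System.out.println(got);
    }
}
```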

2 Answers


Differences between the read() methods of BufferedInputStream and InputStream

Let's begin with InputStream.read(), which reads a single byte of data from the input stream and returns it as an int (-1 at the end of the stream). It blocks until input data is available, the end of the stream is detected, or an exception is thrown. BufferedInputStream, on the other hand, adds buffering to the input stream passed to it.

The difference is that BufferedInputStream reads data from the underlying input stream in chunks and stores it in an internal buffer, so when you call its read() method it returns the next byte from that buffer. The gain is lower per-call overhead: BufferedInputStream groups many small requests for data into fewer calls on the underlying input stream.
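To make the "fewer calls" point concrete, here is a small sketch. It assumes a hypothetical `CountingInputStream` wrapper (written here for illustration, not a JDK class) that counts how often the underlying stream is asked for data, and a `ByteArrayInputStream` of 465 bytes standing in for the file. It reads one byte at a time with `read()`, once directly and once through a `BufferedInputStream` with a 1024-byte buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class CallCountDemo {
    // Hypothetical wrapper that counts read calls reaching the underlying stream
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    // Reads the stream one byte at a time and returns how many bytes were read
    static int drainByteByByte(InputStream in) throws IOException {
        int bytes = 0;
        while (in.read() != -1) {
            bytes++;
        }
        return bytes;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[465]; // stands in for the 465-byte file

        CountingInputStream direct = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteByByte(direct);

        CountingInputStream underlying = new CountingInputStream(new ByteArrayInputStream(data));
        drainByteByByte(new BufferedInputStream(underlying, 1024));

        // Unbuffered: one call per byte, plus the final call that returns -1
        System.out.println("unbuffered calls: " + direct.calls);
        // Buffered: the underlying stream is only asked for whole chunks
        System.out.println("buffered calls:   " + underlying.calls);
    }
}
```

Both runs deliver the same 465 bytes; only the number of trips to the underlying stream differs.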

Why does BufferedInputStream read the entire file even when a buffer size is specified?

It actually will not. BufferedInputStream does not necessarily read the entire file into the buffer even when a buffer size is specified; it reads data from the file into the buffer in chunks whose size is at most the size of the buffer. The number of times the file is read depends on the size of the file and the size of the buffer used. In your snippet you set the buffer size to 1, and that is the reason you get one byte at a time. It should be different; in your case, something like:

```java
BufferedInputStream b = new BufferedInputStream(input, 1024);
```
Lunatic
  • " you specified size of the buffer to 1 and thats the reason you get one byte at a time" But it doesn't get it one byte at a time. It reads the entire file, which consists of 464 bytes, all at once. – kaka Jul 27 '23 at 19:37

Note that BufferedInputStream does not always read into its own internal buffer.

If you view the source code of the constructor, you will see that when you call new BufferedInputStream(input, 1) it creates an internal buffer, buf = new byte[1]:

```java
BufferedInputStream b = new BufferedInputStream(input, 1);
// Internally causes b.buf = new byte[1];
```

Nothing has been read from the underlying input file at the point where you run your next lines:

```java
byte[] does = new byte[1000];
int i = b.read(does);
```

If you view the source of read(byte[] ba), you will find it calls BufferedInputStream.read(byte[] ba, int off, int len). Normally that read copies bytes from the internal buf into ba when any are available. But on your first read buf is empty, and because the requested length is at least buf.length (and no mark is set), it skips buf and reads from the underlying stream directly into ba. So either buf is filled and then copied into ba, or, as in your case (1000 >= 1), does is filled directly and the internal buf is not filled at all.

Thus the first access to the underlying file could be up to 1000 bytes (if available from input) even though you have specified the internal buffer size as just 1 byte.
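The behaviour described above can be reproduced without a file. This sketch assumes a `ByteArrayInputStream` holding 465 bytes as a stand-in for the file "runs"; the helper `countReads` is a hypothetical name that counts how many calls to `read(does)` return data, keeping the internal buffer size at 1 as in the question and varying only the size of "does".

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class ReadCountDemo {
    // Hypothetical helper: counts the read(does) calls that return data
    static int countReads(byte[] fileContents, int arraySize) throws IOException {
        try (BufferedInputStream b =
                 new BufferedInputStream(new ByteArrayInputStream(fileContents), 1)) { // internal buffer: 1 byte
            byte[] does = new byte[arraySize];
            int x = 0;
            while (b.read(does) != -1) {
                x++;
            }
            return x;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] fileContents = new byte[465]; // stands in for the 465-byte file
        // does.length (1000) >= buf.length (1): bytes go straight into 'does', one call gets them all
        System.out.println(countReads(fileContents, 1000));
        // does.length (1) >= buf.length (1) too, but each call can only take 1 byte
        System.out.println(countReads(fileContents, 1));
    }
}
```

With this in-memory stream the counts come out as 1 and 465, matching what the question observed with the file.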

By the way, you shouldn't decode a UTF-8 stream as above, because a read may stop partway through a multi-byte UTF-8 character and corrupt the decoded text; use a Reader instead.
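A minimal sketch of the Reader approach, assuming the data is UTF-8 text. `InputStreamReader` holds on to an incomplete multi-byte sequence until the rest of its bytes arrive, whereas decoding each `byte[]` chunk on its own with `new String(...)` can split a character. The sample text `"h\u00e9llo"` and the helper names are made up for illustration; `\u00e9` (é) takes two bytes in UTF-8, and a 1-byte array is used to force the split.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decodes each 1-byte chunk on its own: multi-byte characters get split
    static String decodeChunkByChunk(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        byte[] does = new byte[1];
        int i;
        while ((i = in.read(does)) != -1) {
            sb.append(new String(does, 0, i, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Lets a Reader do the decoding: characters are reassembled correctly
    static String decodeWithReader(InputStream in) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            int ch;
            while ((ch = r.read()) != -1) {
                sb.append((char) ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // the é is 2 bytes
        // The é turns into replacement characters (U+FFFD)
        System.out.println(decodeChunkByChunk(new ByteArrayInputStream(utf8)));
        // The é survives intact
        System.out.println(decodeWithReader(new ByteArrayInputStream(utf8)));
    }
}
```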

DuncG
  • So it compares the internal buf with "does", and because "does" is 1000 bytes it reads the whole file? What is the purpose of the internal buf if the array passed to read dictates everything? Also, in this case I see no advantage of BufferedInputStream over FileInputStream, because both can use the array in read(byte[] ba). – kaka Jul 27 '23 at 19:32
  • Correct. There is no point having BufferedInputStream if you always read directly to a byte[] that is larger than the default buffer size, in which case you could access `input` directly. – DuncG Jul 27 '23 at 20:05
  • Note that reading a byte at a time, such as `int ch = b.read()`, or reading into arrays smaller than the internal buffer will use the internal buffer to get the benefit of buffering. – DuncG Jul 28 '23 at 08:29
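The last two comments can be checked directly. This sketch uses a hypothetical `SpyInputStream` (written for illustration, not a JDK class) that records the array handed to the underlying stream. If that array is the caller's own array, `BufferedInputStream` bypassed its internal buffer. The 1024-byte buffer, the 4096 bytes of input, and the array sizes 16 and 2048 are arbitrary choices.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class BypassDemo {
    // Hypothetical wrapper: remembers the last array passed to the underlying stream
    static class SpyInputStream extends FilterInputStream {
        byte[] lastArray;

        SpyInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            lastArray = b;
            return super.read(b, off, len);
        }
    }

    // Returns true if a single read(callerArray) went straight to the underlying stream
    static boolean bypassesInternalBuffer(int callerArraySize) throws IOException {
        SpyInputStream spy = new SpyInputStream(new ByteArrayInputStream(new byte[4096]));
        try (BufferedInputStream b = new BufferedInputStream(spy, 1024)) { // internal buffer: 1024 bytes
            byte[] callerArray = new byte[callerArraySize];
            b.read(callerArray);
            // Same object means the internal buffer was skipped
            return spy.lastArray == callerArray;
        }
    }

    public static void main(String[] args) throws IOException {
        // Smaller than the internal buffer: data is staged in buf, then copied
        System.out.println("byte[16]:   " + bypassesInternalBuffer(16));
        // At least as large as the internal buffer: read directly into the caller's array
        System.out.println("byte[2048]: " + bypassesInternalBuffer(2048));
    }
}
```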