
I was trying to read a file into an array by using FileInputStream, and an ~800KB file took about 3 seconds to read into memory. I then tried the same code except with the FileInputStream wrapped into a BufferedInputStream and it took about 76 milliseconds. Why is reading a file byte by byte done so much faster with a BufferedInputStream even though I'm still reading it byte by byte? Here's the code (the rest of the code is entirely irrelevant). Note that this is the "fast" code. You can just remove the BufferedInputStream if you want the "slow" code:

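InputStream is = null;

try {
    // Wrap the FileInputStream so that read() is served from an in-memory buffer.
    is = new BufferedInputStream(new FileInputStream(file));

    int[] fileArr = new int[(int) file.length()];

    // Read one byte per call until read() signals end of file with -1.
    for (int i = 0, temp = 0; (temp = is.read()) != -1; i++) {
        fileArr[i] = temp;
    }
} finally {
    // Release the file handle whether or not the read succeeded.
    if (is != null) {
        is.close();
    }
}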

BufferedInputStream is well over 30 times faster (about 40 times, going by the timings above). So, why is this, and is it possible to make this code more efficient (without using any external libraries)?

ZimZim

3 Answers


In FileInputStream, the method read() reads a single byte. From the source code:

/**
 * Reads a byte of data from this input stream. This method blocks
 * if no input is yet available.
 *
 * @return     the next byte of data, or <code>-1</code> if the end of the
 *             file is reached.
 * @exception  IOException  if an I/O error occurs.
 */
public native int read() throws IOException;

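Each call is a native call to the OS that fetches a single byte from the file. Even when the OS can serve that byte from its own cache rather than from the disk, crossing into native code and the OS for every single byte is a heavy operation.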

With a BufferedInputStream, read() delegates to an overloaded read(byte[], int, int) call on the underlying stream that fetches up to 8192 bytes (the default buffer size) and buffers them until they are needed. It still returns only the single byte (but keeps the others in reserve). This way the BufferedInputStream makes far fewer native calls to the OS to read from the file.

For example, suppose your file is 32768 bytes long. To get all the bytes into memory with a FileInputStream, you need 32768 native calls to the OS. With a BufferedInputStream, you need only 4 (32768 / 8192), regardless of the number of read() calls you make (still 32768).
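
The arithmetic above can be checked with a small sketch. It makes two assumptions that are not in the answer: an in-memory ByteArrayInputStream stands in for the file, and CountingInputStream is a hypothetical helper (not a JDK class) that counts how many read calls reach the underlying stream. The buffered count can come out one higher than 4, because one extra call is needed to discover the end of the stream.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ReadCallCounter {

    /** Hypothetical helper: counts every read call that reaches the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Returns how many read calls reach the underlying stream when `size` bytes are read one by one. */
    static int underlyingCalls(int size, boolean buffered) {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[size]));
        InputStream in = buffered ? new BufferedInputStream(counter) : counter;
        try {
            // Same access pattern as the question: one read() per byte until -1.
            while (in.read() != -1) {
                // Discard the byte; only the call count matters here.
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counter.calls;
    }

    public static void main(String[] args) {
        System.out.println("unbuffered: " + underlyingCalls(32768, false));
        System.out.println("buffered:   " + underlyingCalls(32768, true));
    }
}
```

With a real FileInputStream in place of the in-memory stream, each counted call would correspond to a native read.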

As to how to make it faster, you might want to consider the NIO FileChannel class (java.nio.channels, available since Java 1.4), but I have no evidence to support this.
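
For reference, a minimal sketch of what a FileChannel-based read could look like. It makes no claim about speed. The class name ChannelRead and the helper readAll are hypothetical, the sketch assumes the file fits into a single array, and FileChannel.open with a Path requires Java 7 or later.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ChannelRead {

    /** Reads the whole file into a byte array through a FileChannel. */
    static byte[] readAll(Path path) {
        try (FileChannel channel = FileChannel.open(path, StandardOpenOption.READ)) {
            // Assumption: the file is small enough for one array (under 2 GB).
            ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
            // A single read() may fill only part of the buffer, so loop until full or EOF.
            while (buffer.hasRemaining() && channel.read(buffer) != -1) {
                // Keep reading.
            }
            byte[] result = new byte[buffer.position()];
            buffer.flip();
            buffer.get(result);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("channel-read", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3, 4, 5});
            System.out.println(readAll(tmp).length + " bytes read");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```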


Note: if you call FileInputStream's read(byte[], int, int) method directly with a byte array larger than 8192 bytes, you don't need to wrap it in a BufferedInputStream.
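
A minimal sketch of that approach, under a few assumptions: ChunkedRead and readAll are hypothetical names, the file fits into one array, and the bytes are stored in a byte[] rather than the int[] used in the question.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class ChunkedRead {

    /** Reads the whole file with bulk read calls; no BufferedInputStream involved. */
    static byte[] readAll(File file) {
        // Assumption: the file fits in one array and does not grow while being read.
        byte[] data = new byte[(int) file.length()];
        try (FileInputStream in = new FileInputStream(file)) {
            int offset = 0;
            // Each call may return fewer bytes than requested, so loop until full or EOF.
            while (offset < data.length) {
                int n = in.read(data, offset, data.length - offset);
                if (n == -1) {
                    break;
                }
                offset += n;
            }
            // Trim the array if the file turned out shorter than file.length() reported.
            return offset == data.length ? data : Arrays.copyOf(data, offset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("chunked-read", ".bin");
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[] {10, 20, 30});
        }
        System.out.println(readAll(tmp).length + " bytes read");
        tmp.delete();
    }
}
```

Because the caller supplies its own large array, each native call already moves many bytes at once, which is the job the BufferedInputStream would otherwise do.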

Sotirios Delimanolis
  • Aah I see, I should have checked the API first before asking. So it's simply an 8K internal buffer. That makes sense. Thanks. As for the "more efficient" part, it's not necessary, but I thought my code might have been overly redundant in some way. I guess it's not. – ZimZim Sep 03 '13 at 21:59
  • @user1007059 You're welcome. Note that if you used `FileInputStream`'s `read(byte[], int, int)` method directly instead, with a `byte[>8192]` you wouldn't need a `BufferedInputStream` wrapping it. – Sotirios Delimanolis Sep 04 '13 at 04:01
  • @SotiriosDelimanolis When should I use `read()` byte by byte, and when `read(byte[])` with an array of bytes? I think reading into an array is always better. Can you give an example of where to use `read()` byte by byte, `read(byte[])`, or `BufferedInputStream`? – Asif Mushtaq Apr 01 '16 at 13:47
  • @UnKnown Don't have a great example. Maybe the first byte contains some flag about the content of the file or some other metadata. I don't think anyone would ever read an entire file using `read()`. – Sotirios Delimanolis Apr 27 '16 at 19:04
  • FileChannel read and write are faster than any other approach. https://github.com/RedGreenCode/UVa/blob/master/Performance/WriteTest5.java – Harish Sep 26 '16 at 02:22
  • @SotiriosDelimanolis When you say `It still returns only the single byte (but keeps the others in reserve).`, I believe you mean that BufferedInputStream still reads a single byte at a time from the buffer in memory? If so, is there a way to also read in chunks from memory, as we can with `read(byte[], int, int)`, which could make it even faster? – emilly May 29 '17 at 02:43
  • @emilly I'm not sure I understand your second question. `BufferedInputStream` will, behind the scenes, maintain 8192 bytes read from the wrapped `InputStream`. When you call `read`, it will take the first unread of those bytes and return it. – Sotirios Delimanolis May 29 '17 at 05:42
  • @SotiriosDelimanolis As you said in your second comment, `FileInputStream's read(byte[], int, int) method directly instead, with a byte[>8192] you wouldn't need a BufferedInputStream wrapping it`. My interpretation is that BufferedInputStream basically reads in chunks instead of one byte at a time, which makes it faster. Right? – emilly Jun 03 '17 at 11:08
  • @emily `BufferedInputStream` is faster when your code requests to read fewer bytes (not necessarily just one byte) than the buffer size each time. `BufferedInputStream` acts optimistically and reads more than what you need, so that, when you come back, it already has the next batch. – Sotirios Delimanolis Jun 03 '17 at 16:58
  • @SotiriosDelimanolis I have one question and I hope you can answer it. If we use BufferedInputStream, when is the first chunk of bytes stored in the internal buffer: 1) when creating the BufferedInputStream and passing a FileInputStream as an argument, or 2) when we call the read() method, so that it buffers a chunk of bytes and returns them one by one? – Alaa Alsayed Nov 22 '22 at 19:30
  • @AlaaAlsayed I haven't looked at that code in a while, but it wouldn't be 1) IMO. It would likely delay it until the user actually called one of its read methods. As long as there are bytes buffered, return those. Otherwise, go fetch some. – Sotirios Delimanolis Nov 22 '22 at 20:23
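
The behaviour discussed in the last two comments can be observed with a short sketch. It assumes an in-memory stream in place of a file, and CountingInputStream is again a hypothetical helper that counts the read calls reaching the wrapped stream. The method observe reports that count right after construction, after the first read(), and after 100 reads.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class LazyFillCheck {

    /** Hypothetical helper: counts every read call that reaches the wrapped stream. */
    static class CountingInputStream extends FilterInputStream {
        int calls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            calls++;
            return super.read();
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            calls++;
            return super.read(b, off, len);
        }
    }

    /** Returns the underlying call count after construction, after 1 read, and after 100 reads. */
    static int[] observe() {
        CountingInputStream counter =
                new CountingInputStream(new ByteArrayInputStream(new byte[20000]));
        try (BufferedInputStream in = new BufferedInputStream(counter)) {
            int afterConstruction = counter.calls;
            in.read();
            int afterFirstRead = counter.calls;
            for (int i = 0; i < 99; i++) {
                in.read();
            }
            int afterHundredReads = counter.calls;
            return new int[] {afterConstruction, afterFirstRead, afterHundredReads};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] counts = observe();
        System.out.println("after construction: " + counts[0]);
        System.out.println("after first read(): " + counts[1]);
        System.out.println("after 100 reads:    " + counts[2]);
    }
}
```

Nothing is fetched at construction time. The first read() triggers one bulk fetch, and the following reads are served from the buffer.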

A BufferedInputStream wrapped around a FileInputStream will request data from the FileInputStream in big chunks (8192 bytes by default in the standard JDK). Thus if you read 1000 bytes one at a time, the FileInputStream only has to go to the disk once. This will be much faster!

usha
  • It might be [platform dependent](http://stackoverflow.com/questions/16973843/bufferedreader-default-buffer-size), but it's [**8192** on current Android](https://github.com/google/j2objc/blob/master/jre_emul/android/libcore/luni/src/main/java/java/io/BufferedInputStream.java#L44). – pevik Feb 23 '16 at 15:06
  • Same, 8K, on almost all platforms. – Hovercraft Full Of Eels Jan 05 '18 at 03:06

It is because of the cost of disk access. Let's assume you have a file whose size is 8 KB. Without BufferedInputStream, reading it byte by byte requires 8 * 1024 = 8192 separate read requests to the OS.

At this point, BufferedInputStream comes onto the scene and acts as a middleman between your code and the FileInputStream.

In one shot, it pulls a chunk of bytes (8 KB by default) into memory, and your code then reads bytes from this middleman. This decreases the time of the operation.

import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Reads the file one byte at a time through a BufferedInputStream.
private void exercise1WithBufferedStream() {
    long start = System.currentTimeMillis();
    try (BufferedInputStream bufferedInputStream =
             new BufferedInputStream(new FileInputStream("anyFile.txt"))) {
        // read() returns -1 once the end of the file is reached.
        while (bufferedInputStream.read() != -1) {
            // Discard the byte; only the elapsed time matters here.
        }
    } catch (IOException e) {
        System.out.println("Could not read the stream...");
        e.printStackTrace();
    }
    System.out.println("time passed with buffered:" + (System.currentTimeMillis() - start));
}

// Reads the same file one byte at a time directly from the FileInputStream.
private void exercise1() {
    long start = System.currentTimeMillis();
    try (FileInputStream myFile = new FileInputStream("anyFile.txt")) {
        // Every read() here is a separate native call.
        while (myFile.read() != -1) {
            // Discard the byte; only the elapsed time matters here.
        }
    } catch (IOException e) {
        System.out.println("Could not read the stream...");
        e.printStackTrace();
    }
    System.out.println("time passed without buffered:" + (System.currentTimeMillis() - start));
}
huseyin
  • The example is good. However, measuring execution time this way is not a valid benchmark. Use a tool such as JMH to measure it properly. – Kirill Ch May 11 '21 at 15:39