
In nearly every Java example of reading from a file that I see, a file reader is used to read line by line. I would expect this to be terribly inefficient because it requires a system call per line.

What I'd been doing instead is to use an input stream and grab the bytes directly. In my experiments, this is significantly faster. My test was a 1MB file.

    //Stream method
    try {
        Long startTime = new Date().getTime();

        InputStream is = new FileInputStream("test");
        byte[] b = new byte[is.available()];
        is.read(b);
        String text = new String(b);
        //System.out.println(text);

        Long endTime = new Date().getTime();
        System.out.println("Text length: " + text.length() + ", Total time: " + (endTime - startTime));

    }
    catch (Exception e) {
        e.printStackTrace();
    }

    //Reader method
    try {
        Long startTime = new Date().getTime();

        BufferedReader br = new BufferedReader(new FileReader("test"));
        String line = null;
        StringBuilder sb = new StringBuilder();
        while ((line = br.readLine()) != null) {
            sb.append(line);
            sb.append("\n");
        }
        String text = sb.toString();

        Long endTime = new Date().getTime();
        System.out.println("Text length: " + text.length() + ", Total time: " + (endTime - startTime));

    }
    catch (Exception e) {
        e.printStackTrace();
    }

This gives a result of:

    Text length: 1054631, Total time: 9
    Text length: 1034099, Total time: 22

So, why do people use readers instead of streams?

If I have a method that takes a text file and returns a String that contains all of the text, is it necessarily better to do it using a stream?

Jeremy
  • Your code is not correct. It is not guaranteed that it will read the whole file, see the documentation of the read and available methods. – Milo Apr 22 '12 at 16:49
  • Have you tried the `readAllLines(...)` method of [java.nio.file.Files](http://docs.oracle.com/javase/7/docs/api/java/nio/file/Files.html)? – nIcE cOw Apr 22 '12 at 16:52
  • +1 for learned something new – panny Feb 07 '13 at 20:29

3 Answers

You are comparing apples to bananas. Reading one line at a time is going to be less efficient, even with a `BufferedReader`, than grabbing data as fast as possible. Note that use of `available()` is discouraged, as it is not accurate in all situations. I found this out myself when I started using cipher streams.

ControlAltDel
  • That's very interesting. Is available dangerous when reading from a plain text file that exists on the local file system? – Jeremy Apr 22 '12 at 16:59
  • @Jeremy It is never correct to use [`available`](http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()) to allocate a buffer for the entirety of a stream. – Jeffrey Apr 22 '12 at 17:07
  • @Jeffrey If you have it, I'd love to see any resources you have on that. Before now I had been using available quite happily without running into any issues. I believe you, but I wonder if there really is a situation where available is appropriate. – Jeremy Apr 22 '12 at 17:08
  • @Jeremy Read the documentation for [`available`](http://docs.oracle.com/javase/7/docs/api/java/io/InputStream.html#available()). I more or less quoted the second paragraph of the documentation in my last statement. – Jeffrey Apr 22 '12 at 17:10
  • @Jeffrey I read it. It says that "For an arbitrary input stream, don't dare use available to determine the amount of buffer needed." That doesn't imply that there isn't a use case where available is necessarily accurate. – Jeremy Apr 22 '12 at 17:11
  • @Jeremy The problem with `available` is that it can only return the number of bytes available *without blocking*. If you are 100% sure that your `InputStream`'s buffer contains your entire file and that your `InputStream` will return the correct number from `available`, then by all means use it. But if your file is larger than the `InputStream`'s buffer or your `InputStream` does not return the correct number, using it will fail. – Jeffrey Apr 22 '12 at 17:14

`FileReader` is generally used in conjunction with a `BufferedReader` because it frequently makes sense to read a file line by line, especially if the file has a well-defined record structure where each record corresponds to a line.

Also, `FileReader` can simplify some of the work of dealing with character encodings and conversions, as stated in the Javadoc:

Convenience class for reading character files. The constructors of this class assume that the default character encoding and the default byte-buffer size are appropriate ... FileReader is meant for reading streams of characters.
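
One consequence of "assume that the default character encoding ... [is] appropriate" is that the same bytes can decode differently on a machine with a different default. A minimal sketch, assuming the file is UTF-8, of naming the charset explicitly through an `InputStreamReader`. The class name `CharsetAwareRead` and the helper `readText` are hypothetical, made up for this illustration:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class CharsetAwareRead {

    // Hypothetical helper: reads the file line by line, decoding with the given charset
    // instead of relying on the platform default.
    static String readText(Path file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (InputStream in = Files.newInputStream(file);
             BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a temporary file stands in for the "test" file from the question.
        Path tmp = Files.createTempFile("charset-demo", ".txt");
        try {
            Files.write(tmp, "caf\u00e9\n".getBytes(StandardCharsets.UTF_8));
            String text = readText(tmp, StandardCharsets.UTF_8);
            // The accented character takes two bytes in UTF-8 but is a single char.
            System.out.println(Files.size(tmp) + " bytes, " + text.length() + " chars");
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

The byte count and the character count differ here, which is one reason a byte length and a `String` length are not directly comparable.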

Óscar López

Try increasing the `BufferedReader` buffer size. For example:

    BufferedReader br = new BufferedReader(new FileReader("test"), 2000000);

Choosing a suitable buffer size should make the reader faster.

Also, in your Reader sample you spend time filling the `StringBuilder`. You only have to read a file line by line if you need to process individual lines. If you just need the whole text in one string, read bigger chunks with `public int read(char[] cbuf)` and write them to a `StringWriter` initialized with a suitable size.
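
The chunked approach could look roughly like the sketch below. The class `ChunkedRead`, the helper `readAll`, the 8192-char chunk size, and the `expectedChars` sizing hint are all assumptions chosen for illustration, and a `StringReader` stands in for the `FileReader` so the example is self-contained:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

final class ChunkedRead {

    // Hypothetical helper: drains the reader in fixed-size chunks.
    // expectedChars is only a sizing hint for the StringWriter's internal buffer.
    static String readAll(Reader reader, int expectedChars) throws IOException {
        StringWriter out = new StringWriter(Math.max(expectedChars, 16));
        char[] chunk = new char[8192];
        int n;
        // read(char[]) returns the number of chars read, or -1 at end of stream.
        while ((n = reader.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in real use this would be a FileReader or InputStreamReader.
        try (Reader reader = new StringReader("first line\nsecond line\n")) {
            String text = readAll(reader, 32);
            System.out.println("Text length: " + text.length());
        }
    }
}
```

Unlike a `readLine()` loop, this keeps the original line terminators exactly as they appear in the input, because nothing is stripped and re-appended.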

The choice between `InputStream` and `Reader` does not depend on performance. Generally you use a `Reader` when you read text data, because it makes handling the charset easier.

Another point: this code

    byte[] b = new byte[is.available()];
    is.read(b);
    String text = new String(b);

is not correct. The documentation for `available()` says:

Note that while some implementations of InputStream will return the total number of bytes in the stream, many will not. It is never correct to use the return value of this method to allocate a buffer intended to hold all data in this stream.

So take care: you need to fix it.
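
One possible fix, sketched under the assumption that the whole content fits in memory, is to loop until `read` returns -1 rather than sizing the buffer from `available()`. A single `read(b)` call is also not guaranteed to fill the array, so the loop addresses both problems. The names `ReadFully` and `readAllBytes` are hypothetical, and a `ByteArrayInputStream` stands in for the `FileInputStream`:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

final class ReadFully {

    // Hypothetical helper: reads until end of stream instead of trusting available().
    static byte[] readAllBytes(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        // read() may return fewer bytes than requested; only -1 signals end of stream.
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: in real use this would be new FileInputStream("test").
        byte[] data = "hello, stream".getBytes(StandardCharsets.UTF_8);
        try (InputStream in = new ByteArrayInputStream(data)) {
            // Decode with an explicit charset rather than the platform default.
            String text = new String(readAllBytes(in), StandardCharsets.UTF_8);
            System.out.println(text);
        }
    }
}
```

For a plain file, `Files.readAllBytes(Path)` from `java.nio.file` (Java 7 and later) does the same job in one call.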

dash1e