44

I need to read a file one character at a time and I'm using the read() method from BufferedReader. *

I found that read() is about 10x slower than readLine(). Is this expected? Or am I doing something wrong?

Here's a benchmark with Java 7. The input test file has about 5 million lines and 254 million characters (~242 MB) **:

The read() method takes about 7000 ms to read all the characters:

@Test
public void testRead() throws IOException{

    BufferedReader fa= new BufferedReader(new FileReader(new File("chr1.fa")));

    long t0= System.currentTimeMillis();
    int c;
    while( (c = fa.read()) != -1 ){
        //
    }
    long t1= System.currentTimeMillis();
    System.err.println(t1-t0); // ~ 7000 ms

}

The readLine() method takes only ~700 ms:

@Test
public void testReadLine() throws IOException{

    BufferedReader fa= new BufferedReader(new FileReader(new File("chr1.fa")));

    String line;
    long t0= System.currentTimeMillis();
    while( (line = fa.readLine()) != null ){
        //
    }
    long t1= System.currentTimeMillis();
    System.err.println(t1-t0); // ~ 700 ms
}

* Practical purpose: I need to know the length of each line, including the newline characters (\n or \r\n) AND the line length after stripping them. I also need to know whether a line starts with the > character. For a given file this is done only once, at the start of the program. Since EOL chars are not returned by BufferedReader.readLine() I'm resorting to the read() method. If there are better ways of doing this, please say.
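For reference, here is a minimal sketch of that bookkeeping with read(). It is illustrative only: it assumes \r only ever appears as part of \r\n, and it ignores a possible last line without a trailing newline:

@Test
public void testLineLengths() throws IOException{

    BufferedReader fa= new BufferedReader(new FileReader(new File("chr1.fa")));

    int c;
    int stripped= 0;          // line length without EOL chars
    int eol= 0;               // EOL chars on this line: 1 for \n, 2 for \r\n
    boolean startsWithGt= false;
    boolean lineStart= true;
    while( (c = fa.read()) != -1 ){
        if(lineStart){
            startsWithGt= (c == '>');
            lineStart= false;
        }
        if(c == '\r'){
            eol++;
        } else if(c == '\n'){
            eol++;
            // One complete line: lengths are `stripped` and `stripped + eol`
            stripped= 0;
            eol= 0;
            lineStart= true;
        } else {
            stripped++;
        }
    }
    fa.close();
}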

** The gzipped file is here http://hgdownload.cse.ucsc.edu/goldenpath/hg19/chromosomes/chr1.fa.gz. For those who may be wondering, I'm writing a class to index fasta files.

dariober
  • Please read up on how to write accurate Java benchmarks. – Louis Wasserman Dec 25 '16 at 20:14
  • @Louis Wasserman Admittedly I didn't care too much about being accurate in my benchmarks. JUnit and `currentTimeMillis()` are not ideal but I figured that an 8-10x time difference on a fairly big file is large enough to ask the question. – dariober Dec 25 '16 at 20:27
  • @dariober You might be better off using `public int read(char[] cbuf, int off, int len) throws IOException` instead of directly using the `read` function of BufferedReader. Ultimately your goal is to find the ends of lines in a file. Though I have not tested it myself, taking control of the buffer yourself will probably give you a better result. – Tyagi Akhilesh Dec 25 '16 at 20:47
  • After a quick check: The test is *probably* (!) not only flawed, I assume it is **totally** flawed. Try running the `readLine` test **before** the `read` test, and see whether the timings are different. This might just be related to HDD caches or the JIT (For me, the time difference on an old, slow HDD is 1:7 during the first run, but about 1:2 in subsequent runs. So in fact, try running `testRead();testReadLine();testRead();testReadLine();testRead();testReadLine();` and tell us about the results...) – Marco13 Dec 25 '16 at 20:48
  • @Marco13 *Try running the readLine test before the read test* : I'm doing this in Eclipse and I tried a few times to close Eclipse and re-open it (is this enough to clear cache and everything?). I also use `nanoTime()` instead of `currentTimeMillis()`. I find results stay pretty much the same even when I run readLine first. (Say 1:6 for readLine() vs read()). I'm using a mac laptop with SSD. – dariober Dec 25 '16 at 21:11
  • Try running the `readLine()` test *after* the `read()` test, or on a different file of the same size. What you are observing is undoubtedly the effect of cache priming. – user207421 Dec 25 '16 at 21:36

6 Answers

37

The important thing when analyzing performance is to have a valid benchmark before you start. So let's start with a simple JMH benchmark that shows what our expected performance after warmup would be.

One thing we have to consider is that modern operating systems like to cache file data that is accessed regularly, so we need some way to clear the caches between tests. On Windows there's a small utility that does just this - on Linux you should be able to do it by writing to some pseudo file somewhere.
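On Linux, a rough (and untested) sketch of the same idea, in the style of the clearFileCaches method below, would be to write to the drop_caches pseudo file, which requires root:

private void clearFileCachesLinux() throws IOException, InterruptedException {
    // Untested sketch: flush dirty pages, then drop the page cache via
    // the /proc/sys/vm/drop_caches pseudo file (needs root privileges).
    ProcessBuilder pb = new ProcessBuilder(
            "sh", "-c", "sync; echo 3 > /proc/sys/vm/drop_caches");
    pb.inheritIO();
    pb.start().waitFor();
}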

The code then looks as follows:

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

@BenchmarkMode(Mode.AverageTime)
@Fork(1)
public class IoPerformanceBenchmark {
    private static final String FILE_PATH = "test.fa";

    @Benchmark
    public int readTest() throws IOException, InterruptedException {
        clearFileCaches();
        int result = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(FILE_PATH))) {
            int value;
            while ((value = reader.read()) != -1) {
                result += value;
            }
        }
        return result;
    }

    @Benchmark
    public int readLineTest() throws IOException, InterruptedException {
        clearFileCaches();
        int result = 0;
        try (BufferedReader reader = new BufferedReader(new FileReader(FILE_PATH))) {
            String line;
            while ((line = reader.readLine()) != null) {
                result += line.chars().sum();
            }
        }
        return result;
    }

    private void clearFileCaches() throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("EmptyStandbyList.exe", "standbylist");
        pb.inheritIO();
        pb.start().waitFor();
    }
}

and if we run it with

chcp 65001 # set codepage to utf-8
mvn clean install; java "-Dfile.encoding=UTF-8" -server -jar .\target\benchmarks.jar

we get the following results (about 2 seconds are needed to clear the caches for me and I'm running this on a HDD so that's why it's a good deal slower than for you):

Benchmark                            Mode  Cnt  Score   Error  Units
IoPerformanceBenchmark.readLineTest  avgt   20  3.749 ± 0.039   s/op
IoPerformanceBenchmark.readTest      avgt   20  3.745 ± 0.023   s/op

Surprise! As expected there's no performance difference here at all after the JVM has settled into a stable mode. But there is one outlier in the readTest method:

# Warmup Iteration   1: 6.186 s/op
# Warmup Iteration   2: 3.744 s/op

which is exactly the problem you're seeing. The most likely reason I can think of is that OSR isn't doing a good job here, or that the JIT kicks in too late to make a difference on the first iteration.

Depending on your use case this might be a big problem or negligible (if you're reading a thousand files it won't matter, if you're only reading one this is a problem).

Solving such a problem is not easy and there is no general solution, although there are ways to handle it. One easy test to see whether we're on the right track is to run the code with the -Xcomp option (e.g. `java -Xcomp -jar target/benchmarks.jar`), which forces HotSpot to compile every method on its first invocation. And indeed, doing so causes the large delay at the first invocation to disappear:

# Warmup Iteration   1: 3.965 s/op
# Warmup Iteration   2: 3.753 s/op

Possible solution

Now that we have a good idea what the actual problem is (my guess is still all those locks neither being coalesced nor using the efficient biased-locking implementation), the solution is rather straightforward: reduce the number of function calls. (Yes, we could have arrived at this solution without everything above, but it's always nice to have a good grip on the problem, and there might have been a solution that didn't involve changing much code.)

The following code runs consistently faster than either of the other two - you can play with the array size, but it's surprisingly unimportant (presumably because, contrary to the other methods, read(char[]) does not have to acquire a lock, so the cost per call is lower to begin with).

private static final int BUFFER_SIZE = 256;
private char[] arr = new char[BUFFER_SIZE];

@Benchmark
public int readArrayTest() throws IOException, InterruptedException {
    clearFileCaches();
    int result = 0;
    try (BufferedReader reader = new BufferedReader(new FileReader(FILE_PATH))) {
        int charsRead;
        while ((charsRead = reader.read(arr)) != -1) {
            for (int i = 0; i < charsRead; i++) {
                result += arr[i];
            }
        }
    }
    return result;
} 

This is most likely good enough performance-wise, but if you wanted to improve performance even further, using a file mapping might help (I wouldn't count on too large an improvement in a case such as this, but if you know that your text is always ASCII, you could make some further optimizations).
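A minimal sketch of what such a file mapping could look like - untested, and assuming ASCII input plus a file smaller than 2 GB, since a single MappedByteBuffer cannot span more (java.nio imports omitted):

// Untested sketch: map the file and scan the raw bytes, skipping char decoding.
@Benchmark
public int readMappedTest() throws IOException {
    try (FileChannel ch = FileChannel.open(Paths.get(FILE_PATH), StandardOpenOption.READ)) {
        MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        int result = 0;
        while (map.hasRemaining()) {
            result += map.get(); // one byte per character for ASCII
        }
        return result;
    }
}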

Voo
  • Good news! I was able to reproduce your results but I think the noise introduced makes them largely invalid - you're measuring the cache clearing and you've added non-equivalent processing that overwhelms the thing being actually measured. I have two general criticisms - one (the more 'opinion-based' one) is that this is not really a microbenchmark, so the methodology itself is non-representative. The other is that, even if we accept the methodology, it's not that hard to come up with performance differences between 50% and 300% - i.e. these specific measurements are non-representative. – pvg Dec 26 '16 at 01:43
  • I'll try to write up my results tomorrow and post them. – pvg Dec 26 '16 at 01:44
  • @pvg In general, I think I know what you mean, and agree to some extent - as already mentioned in another comment: Someone is starting an application and loading a file. With `readLine`, this takes 3 seconds. With `read`, it takes 12 seconds. At this point, one could say that "nobody cares" about a carefully tweaked and crafted JMH benchmark that shows that, under some conditions, the difference is lower: **No offense** @Voo, I hope it's clear what I want to say. However, the question was about "WHY". And in order to read ASCII data **fast**, a `FileChannel` may be the better choice anyhow... – Marco13 Dec 26 '16 at 02:32
  • I do want to throw one thing in there - using your precise identical setup, including your added processing but with a fast SSD instead of HD the results are this - http://i.imgur.com/xqJa984.png That is, reading char by char is actually faster. This is a clear indication that what you've actually measured is dominated by the performance of your particular i/o setup, further confounded by the processing you added. The equal times you got are not evidence for or against any particular explanation. They're simply chance. – pvg Dec 26 '16 at 02:34
  • @Marco13 I think actually Voo is dead right here and we unfortunately (with my help) got carried away with other things (factor of 51! 1/3 CPU cycle!). What matters is what we can measure and how we know whether we're measuring something accurate and meaningful. What model are we attempting to verify or disprove by said measurements, etc. So Voo put us on the right track. – pvg Dec 26 '16 at 02:43
  • @pvg Sure. E.g. I've been hesitant to run a benchmark, because I'm currently at an old Win32 PC with HDD, and knew that wouldn't bring insights. So in a broader sense, of course (as Voo said): "*have a valid benchmark before you start.*". But I think that the benchmark *for the asker* in this case *may* be embarrassingly simple: An application once takes 3 seconds to start, and once it takes 12 seconds, and that's an undeniable fact. (Again, I'm **not** neglecting any efforts and insights here - I'm rather wondering *which* conclusions should be drawn from them...) – Marco13 Dec 26 '16 at 03:00
  • @Marco13 yep, that's my argument about the benchmark as well. Beyond the known platitude that the JVM is a complex and highly dynamic optimizing runtime my take is that we can conclude - a) that its main unit of optimization is the method b) it's clever enough to optimize a highly inefficient method to the near-equivalent of an efficient one so both the 'factor of a zillion' and '1/3cpu cycle' bear a kernel of truth c) for b to happen, the method needs to run more than once. Given a trivial choice, instead of thinking about all of this, you should just use the obviously more efficient variant. – pvg Dec 26 '16 at 03:12
  • @Marco13 so the simple answer is that while _eventually_ the jvm can make the read version fairly fast, it's wildly inefficient for all the obvious-seeming reasons. On first run, the performance hit is gigantic, in _everyone's_ numbers. So just don't use it. – pvg Dec 26 '16 at 03:18
  • While this is interesting it doesn't help me much (my fault: I asked for WHY when I should have asked for HOW). I edited my question to add more detail of my actual goal (see *Practical purpose*). Any feedback much appreciated. – dariober Dec 26 '16 at 08:45
  • @Marco13 But the first step in figuring out *how* to improve performance is to understand *why* it is happening. Sadly HotSpot does not have the fine-grained compiler controls that would allow us to force it to compile one method at first invocation, so we are at an impasse. `-Xcomp` might be a valid solution for a small one-off program; otherwise the only real solution is to change the code. The most likely reason for the large time difference is the lock in interpreted mode, so reducing the number of locks will greatly improve performance. – Voo Dec 26 '16 at 09:59
  • @pvg The results are pretty stable and repeatable for me - with read being consistently faster on my SSD as well (with and without clearing the file cache). But yes it's certainly a rather hard situation to come up with a good setup. But in practice the solution should be clear: Using `read(char[])` will not complicate the code by much and should give the best performance by far. – Voo Dec 26 '16 at 10:14
  • @Voo When I remove your processing, `readline` consistently wins even when repeated, and as data size increases it wins by more. It, of course, utterly trashes `read` in the one-off case which, to my mind, actually represents the realistic use. Do you think what you've written in your answer - "The important thing when analyzing performance is to have a valid benchmark" - matches the presented conclusion ("read and readline perform the same")? To me, neither the benchmark nor the data (we seem to agree at least on the variance of the data) is valid. – pvg Dec 26 '16 at 11:14
  • Your statement in the last paragraph may be important. I think that the performance can be improved *significantly* by using a `FileChannel` and/or `MappedByteBuffer`, also because the input file is known to be an ASCII file. It might thus be possible to skip all the expensive decoding methods (that first have to be digested by the JIT before they are becoming efficient), and just check whether a `ByteBuffer` contains the `>` or `\n` that the asker seems to be actually looking for. (But sure, this may be an inappropriate specialization compared to the original question...) – Marco13 Dec 26 '16 at 13:08
  • @pvg Interesting how utterly inefficient streams are in Java for short data. Just goes to show that it is hard to come up with a realistic benchmark. Makes readLine about as efficient as reading data into a larger buffer. Doesn't change the main thrust of the topic (readChars() profiting tremendously from JIT compilation) but still I'll see that I get around to updating it - traveling atm so not much time for that though, if you want you're welcome to change the code and update numbers. – Voo Dec 27 '16 at 12:52
3

So this is the practical answer to my own question: don't use BufferedReader.read(), use FileChannel instead. (Obviously I'm not answering the WHY I put in the title.) Here's the quick and dirty benchmark, hopefully others will find it useful:

@Test
public void testFileChannel() throws IOException{

    FileChannel fileChannel = FileChannel.open(Paths.get("chr1.fa"));
    long n= 0;
    int noOfBytesRead = 0;

    long t0= System.nanoTime();

    while(noOfBytesRead != -1){
        // Quick and dirty: allocates a fresh 10 kB buffer on every read;
        // allocating once outside the loop and reusing it would avoid the garbage
        ByteBuffer buffer = ByteBuffer.allocate(10000);
        noOfBytesRead = fileChannel.read(buffer);
        buffer.flip();
        while ( buffer.hasRemaining() ) {
            char x= (char)buffer.get(); // byte-to-char cast: only valid for ASCII input
            n++;
        }
    }
    long t1= System.nanoTime();
    System.err.println((float)(t1-t0) / 1e6); // ~ 250 ms
    System.err.println("nchars: " + n); // 254235640 chars read
}

With ~250 ms to read the whole file char by char, this strategy is considerably faster than BufferedReader.readLine() (~700 ms), let alone read(). Adding if statements in the loop to check for x == '\n' and x == '>' makes little difference (see the sketch below). Also, using a StringBuilder to reconstruct lines doesn't affect the timing much. So this is plenty good for me (at least for now).
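For illustration, those checks go in the inner loop roughly like this (a sketch only, not the exact code I timed):

while ( buffer.hasRemaining() ) {
    char x= (char)buffer.get();
    if(x == '\n'){
        // end of line: record length with and without EOL here
    }
    if(x == '>'){
        // possible start of a fasta header line
    }
    n++;
}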

Thanks to @Marco13 for mentioning FileChannel.

dariober
1

Java JIT optimizes away empty loop bodies, so your loops actually look like this:

while((c = fa.read()) != -1);

and

while((line = fa.readLine()) != null);

I suggest you read up on benchmarking here and the optimization of the loops here.


As to why the time taken differs:

  • Reason one (this only applies if the bodies of the loops contain code): In the first example, you're doing one operation per character; in the second, one per line. This adds up the more lines/characters you have.

    while((c = fa.read()) != -1){
        //One operation per character.
    }
    
    while((line = fa.readLine()) != null){
        //One operation per line.
    }
    
  • Reason two: In the class BufferedReader, the method readLine() doesn't use read() behind the scenes - it uses its own code. readLine() performs fewer operations per character to read a line than it would take to read the same line with the read() method - this is why readLine() is faster at reading an entire file (see the conceptual sketch after this list).

  • Reason three: It takes more iterations to read each character than it does to read each line (unless each character is on a new line); read() is called more times than readLine(). For the file in the question, that's roughly 254 million read() calls versus 5 million readLine() calls - about a 50x difference in call count.
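To illustrate reason two, here is a conceptual sketch (my own code, NOT the actual JDK source) of how a line can be pulled out of an already-filled buffer in one pass, paying the per-call overhead once per line instead of once per character:

// Conceptual sketch only - not BufferedReader's real internals.
// Extract the next line from a pre-filled char buffer; pos[0] is the cursor.
static String scanLine(char[] buf, int count, int[] pos) {
    int start = pos[0];
    while (pos[0] < count && buf[pos[0]] != '\n') {
        pos[0]++;
    }
    String line = new String(buf, start, pos[0] - start);
    pos[0]++; // step past the '\n' for the next call
    return line;
}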

Luke Melaia
  • If Java optimized away these loops, there would be no timing difference. – pvg Dec 25 '16 at 20:51
  • @pvg Please see the edit. `read` and `readLine` read the file differently. And they're still being called in the loops. – Luke Melaia Dec 25 '16 at 20:55
  • I don't think the empty loop matters. I put `if(line.contains(">")){ System.out.println(line); }` inside the loop of the readLine() test and `if(c == '>'){ System.out.println(c); };` inside the read(). Results stay the same. – dariober Dec 25 '16 at 21:00
  • The empty loop (actually, a loop in general) is only part of the problem. – Luke Melaia Dec 25 '16 at 21:02
  • I read the edit. This statement - 'Another thing, your bench marks are completely off. Java optimize away empty loops. ' is basically wrong. These loops have side effects and are not optimized away. The benchmark is rough but reasonable. – pvg Dec 25 '16 at 21:08
  • @pvg Java optimizes away **empty** loops. Read about it here (http://stackoverflow.com/questions/8817300/a-loop-with-an-empty-body-in-java). – Luke Melaia Dec 25 '16 at 21:10
  • @pvg "These loops have side effects" - Yes, the loops expression will always be checked (see reason two), but the loops body is optimized away. – Luke Melaia Dec 25 '16 at 21:12
  • I think you need to be more careful with terminology. To say a loop is "optimized away" normally means the entire loop - the termination condition check included. The code in OP's question has an empty body but the loop has a side effect that cannot be optimized away without changing the semantics of the code. It is meaningless to say that the "loops body is optimized away" since the loops body is empty to begin with; there is nothing to optimise away. – davmac Dec 25 '16 at 21:23
  • The loops you write are strictly equivalent to the ones the OP wrote; they're both loops over the empty statement. Nothing is "optimized away"; you've just written the same thing in different syntax. – jpmc26 Dec 26 '16 at 06:50
1

Thanks @Voo for the correction. What I mentioned below is correct from the FileReader#read() vs BufferedReader#readLine() point of view, BUT not from the BufferedReader#read() vs BufferedReader#readLine() point of view, so I have struck out the answer.

Using the read() method on BufferedReader is not a good idea; it wouldn't cause you any harm, but it defeats the purpose of the class.

The whole purpose of BufferedReader is to reduce I/O by buffering the content. You can read about it in the Java tutorials. You may also notice that the read() method in BufferedReader is actually inherited from Reader, while readLine() is BufferedReader's own method.

If you want to use the read() method, then I would say you'd be better off using FileReader, which is meant for that purpose. You can read about it in the Java tutorials.

So, I think the answer to your question is very simple (without going into benchmarking and all those explanations) -

  • Each read() is handled by the underlying OS and triggers disk access, network activity, or some other operation that is relatively expensive.
  • When you use readLine() you save all of this overhead, so readLine() will always be faster than read() - perhaps not substantially for small data, but faster.
hagrawal7777
  • As already mentioned in the comments: The goal behind the `Buffered` (!) reader is that it *buffers* some data. So repeated `read()` calls will *not* cause the bytes to be read from the disc one by one. Instead, it regularly reads "chunks" of data. You can even trace it down to see that in both the `read` and the `readLine` approach, the underlying `FileReader` is doing the same `read` calls, each reading 8192 bytes. – Marco13 Dec 26 '16 at 01:11
  • @Marco13 There are a hell of a lot of comments in this post and I didn't even read a few of them, though I did read the answers. If your point is that `read` also does some buffering, then I am not sure; I cannot rule out that there could be some optimization. But still the basics remain the same about the purpose of the `BufferedReader` and `FileReader` classes, and why `read` is slower than `readLine` - because of more i/o involved. – hagrawal7777 Dec 26 '16 at 01:26
  • @hagrawal You actually could rule that out incredibly easy by just checking out the first paragraph of the documentation (or a quick glance at the code). Although the name alone seems to be a dead giveaway - if a *Buffered*Reader does not buffer reads, what else would it do? – Voo Dec 27 '16 at 12:54
  • @Voo Looks like you either misread or misunderstood my answer. Could you pinpoint which part made you write this - "*if a BufferedReader does not buffer reads, what else would it do?*"? I would like to understand where you feel I tried to convey that `BufferedReader` doesn't buffer, which eventually made you downvote my answer! – hagrawal7777 Dec 27 '16 at 13:46
  • @hagrawal "Each read() is handled by underlying OS and triggers disk access, network activity, or some other operation that is relatively expensive". This is not true. If you use a buffered reader, every 8000 bytes will cause one kernel call. But then the same is also true for readLine. The only overhead readLine avoids is the multiple read() calls, by combining them into one (thereby avoiding acquiring the lock repeatedly, etc.) – Voo Dec 27 '16 at 13:55
  • @Voo Ok, thanks for the correction, I get your point now. My knowledge was more from tutorials and theory, but after looking into the `BufferedReader#read()` implementation I now know why you downvoted, and I accept it. What I mentioned is correct from a `FileReader#read()` v/s `BufferedReader#readLine()` standpoint, but that was not the context, so thanks for correcting. – hagrawal7777 Dec 27 '16 at 18:25
0

It is not surprising to see this difference if you think about it. One test is iterating the lines in a text file, while the other is iterating characters.

Unless each line contains just one character, it is expected that readLine() is way faster than the read() method (although, as pointed out by the comments above, it is arguable, since a BufferedReader buffers the input, so the physical file reading might not be the only performance-relevant operation).

If you really want to test the difference between the two, I would suggest a setup where you iterate over each character in both tests. E.g. something like:

void readTest(BufferedReader r) throws IOException
{
    int c;
    StringBuilder b = new StringBuilder();
    while((c = r.read()) != -1)
        b.append((char)c);
}

void readLineTest(BufferedReader r) throws IOException
{
    String line;
    StringBuilder b = new StringBuilder();
    while((line = r.readLine()) != null)
        for(int i = 0; i < line.length(); i++)
            b.append(line.charAt(i));
}

Besides the above, please use a Java performance diagnostic tool to benchmark your code. Also, read up on how to microbenchmark Java code.

n247s
  • This is not really a microbenchmark. The poster's approach, however primitive, is not unreasonable for the timescales and time ratios involved. You can use the unix time command for this with decent confidence that you're seeing a significant effect. – pvg Dec 25 '16 at 20:47
0

According to the documentation:

Every read() method call makes an expensive system call.

Every readLine() method call still makes an expensive system call, however, for more bytes at once, so there are fewer calls.

A similar situation occurs when we issue a database update command for each record we want to update, versus a batch update, where we make one call for all the records (see the sketch below).
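To make the analogy concrete, here is a hedged JDBC sketch (the table, the data, and the open java.sql.Connection `conn` are all hypothetical):

// Hypothetical batch update: queue many updates client-side, send them in one go.
String sql = "UPDATE records SET value = ? WHERE id = ?";
try (PreparedStatement ps = conn.prepareStatement(sql)) {
    for (int id = 0; id < 1000; id++) {
        ps.setInt(1, id * 2); // made-up value to write
        ps.setInt(2, id);     // which record to update
        ps.addBatch();        // queued locally, like buffered reads
    }
    ps.executeBatch();        // one round trip for all records, like readLine()
}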

simon4ok