9

For log processing, my application needs to read text files line by line. At first I used BufferedReader's readLine(), but I read on the internet that BufferedReader is slow when reading files.
Afterwards I tried FileInputStream together with a FileChannel and a MappedByteBuffer, but in that case there is no function similar to readLine(), so I search the text for a line break myself and process each line:

    try {
        FileInputStream f = new FileInputStream(file);
        FileChannel ch = f.getChannel( );
        MappedByteBuffer mb = ch.map(FileChannel.MapMode.READ_ONLY, 0L, ch.size());
        byte[] bytes = new byte[1024];
        int i = 0;
        while (mb.hasRemaining()) {
            byte get = mb.get();
            if(get == '\n') {
                if(ra.run(new String(bytes)))
                    cnt++;
                for(int j = 0; j<=i; j++)
                    bytes[j] = 0;
                i = 0;
            }
            else
                bytes[i++] = get;
        }
    } catch(Exception ex) {
        ex.printStackTrace();
    }

I know this is probably not a good way to implement it. When I just read the text file as bytes it is 3 times faster than using BufferedReader, but calling new String(bytes) creates a new String and makes the program even slower than when using a BufferedReader.
So I wanted to ask what is the fastest way to read a text-file line by line? Some say BufferedReader is the only solution to this problem.

P.S.: ra is an instance of RunAutomaton from the dk.brics.Automaton library.

Yoni
  • 1
    is BufferedReader really too slow for your needs? It is probably one of the cleanest, most maintainable solutions if you must code in Java. – jcomeau_ictx Apr 27 '11 at 06:45
  • If `BufferedReader` is really too slow for your application, you should think about not using Java or other managed languages ... _(But I doubt this is the case)_ – ordag Apr 27 '11 at 12:28
  • 1
    [Aaron](http://stackoverflow.com/users/460201/aaron)'s answer is about to be deleted as a link only answer, so I'll put it here as a comment: "Check [this link](http://nadeausoftware.com/articles/2008/02/java_tip_how_read_files_quickly) out. It contains speed comparison of various methods." – Stewie Griffin Jul 14 '14 at 09:29

5 Answers

19

I very much doubt that BufferedReader is going to cause a significant overhead. Adding your own code is likely to be at least as inefficient, and quite possibly wrong too.

For example, in the code that you've given you're calling new String(bytes) which is always going to create a string from 1024 bytes, using the platform default encoding... not a good idea. Sure, you clear the array afterwards, but your strings are still going to contain a bunch of '\0' characters - which means a lot of wasted space, apart from anything else. You should at least restrict the portion of the byte array the string is being created from (which also means you don't need to clear the array afterwards).
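To illustrate restricting the decoded range, here is a minimal sketch that splits a byte array on '\n' and decodes only the bytes that belong to each line, with an explicit charset. The class and method names (`LineDecoding`, `splitLines`) are made up for this example, and it assumes an encoding in which the byte 0x0A can only mean a line feed (true for single-byte encodings such as windows-1258 and for UTF-8, not for UTF-16).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
final class LineDecoding {

    /**
     * Splits raw bytes on '\n' and decodes each line from exactly the bytes
     * it occupies, so no trailing '\0' padding ends up in the strings and
     * no buffer has to be cleared between lines.
     */
    static List<String> splitLines(byte[] data, Charset charset) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                int end = i;
                // Drop the '\r' of a Windows-style "\r\n" line ending.
                if (end > start && data[end - 1] == '\r') {
                    end--;
                }
                // Decode only [start, end) with an explicit charset.
                lines.add(new String(data, start, end - start, charset));
                start = i + 1;
            }
        }
        // Last line without a terminating '\n'.
        if (start < data.length) {
            lines.add(new String(data, start, data.length - start, charset));
        }
        return lines;
    }

    public static void main(String[] args) {
        byte[] data = "first\nsecond\r\nthird".getBytes(StandardCharsets.UTF_8);
        System.out.println(splitLines(data, StandardCharsets.UTF_8));
    }
}
```

The same `new String(bytes, offset, length, charset)` constructor would apply to the fixed-size buffer in the question, using `i` as the length; that buffer would additionally need a bounds check for lines longer than 1024 bytes.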

Have you actually tried using BufferedReader and found it to be too slow? You should usually write the simplest code which will meet your goals first, and then check whether it's fast enough... especially if your only reason for not doing so is an unspecified resource you "read on the internet". Do you want me to find hundreds of examples of people spouting incorrect performance suggestions? :)

As an alternative, you might want to look at Guava's overload of Files.readLines() which takes a LineProcessor.

Jon Skeet
  • 1
    I have tried BufferedReader and it performs well, but the requirement of the program is to be really fast, so I'm just trying to find out which solution is the fastest and best one for my implementation. – Yoni Apr 27 '11 at 06:57
  • 2
    @Yoni: "really fast" is a pretty vague requirement. Do you even have any evidence that it's `BufferedReader` which is the bottleneck rather than (much more likely) the physical disk speed? – Jon Skeet Apr 27 '11 at 06:59
  • If I read the same files in bytes it is 3 times faster than using `BufferedReader`. My hard disk speed is about 150 MB/s while my program reads at 30 MB/s. – Yoni Apr 27 '11 at 07:07
  • @Yoni: Hmm... that's somewhat surprising. What encoding are you using, and what's your machine spec? Are you running in the debugger, or doing anything else that might be slowing it down? Are you *using* the strings at all? – Jon Skeet Apr 27 '11 at 07:18
  • I'm using the windows-1258 encoding because the files contain Vietnamese characters. I'm using the strings to find a regex pattern in them; they are given to the RunAutomaton instance. My machine specs: C2D P8700, 4GB DDR2, HD: 500GB (SATA2) 7200RPM. I'm not running in the debugger, and the slowness is not due to the regex, because the reading runs at the same speed when I comment out the RunAutomaton. – Yoni Apr 27 '11 at 07:27
  • It looks like the only suitable solution is to use a BufferedReader. Because you gave the most helpful and complete information I accept your answer. – Yoni Apr 27 '11 at 08:04
  • @Yoni: It's very odd for the regex not to have any impact... One thing to bear in mind is that for odd reasons, specifying a `Charset` is slower than specifying the *name* of an encoding, so that might help you. – Jon Skeet Apr 27 '11 at 08:51
4

Using plain BufferedReader I got 100+ MB/s. It is highly likely that the speed at which you can read the data from disk is your bottleneck, so how you do the reading won't make much difference.

BufferedReader is not the only solution, but it is fast enough for 99% of use cases, so why make things more complicated than they need to be?
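As a baseline to measure against, a plain BufferedReader version might look like the sketch below. The names `LineCounter` and `countMatching` are made up for this example, the 64 KiB buffer size is an arbitrary assumption, and the predicate in `main` only stands in for whatever per-line matching the application does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.function.Predicate;

// Hypothetical helper, not part of any library.
final class LineCounter {

    /**
     * Reads the source line by line through a BufferedReader and counts the
     * lines accepted by the matcher. Read errors surface as
     * UncheckedIOException, which is how BufferedReader.lines() reports them.
     */
    static long countMatching(Reader source, Predicate<String> matcher) {
        // 64 KiB buffer: an assumed value, tune it by measuring.
        BufferedReader reader = new BufferedReader(source, 1 << 16);
        return reader.lines().filter(matcher).count();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("usage: java LineCounter <file> [charset-name]");
            return;
        }
        // The charset is given explicitly instead of relying on the platform default.
        Charset charset = Charset.forName(args.length > 1 ? args[1] : "UTF-8");
        try (Reader in = new InputStreamReader(
                Files.newInputStream(Paths.get(args[0])), charset)) {
            // Placeholder predicate: counts non-empty lines.
            System.out.println(countMatching(in, line -> !line.isEmpty()));
        }
    }
}
```

Timing this on a large file, once with the real matcher and once with a trivial predicate, should show whether reading or matching dominates.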

a113nw
Peter Lawrey
1

Are frameworks an alternative?

I don't know about the performance, but

http://commons.apache.org/io/

http://commons.apache.org/io/api-release/index.html See IOUtils class

defines very easy to use helper classes for such cases.

Omnaest
0

I have a very simple loop that reads about 2000 lines (50k bytes) from a file on the SD card using BufferedReader, and it reads them all in about 100 ms in debug mode on a Galaxy Tab 2. Not too bad. Then I put a Scanner in the loop and the time went through the roof (tens of seconds), plus lots of GC_CONCURRENT messages:

    Scanner scanner = new Scanner(line);
    int eventType = scanner.nextInt(16);

So at least in my case it's the Scanner that's the problem. I guess I need to scan the ints another way, but I have no idea why it could be so slow.
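One way to read the leading hex value without creating a Scanner per line is to cut out the first whitespace-delimited token and hand it to Integer.parseInt with radix 16. This is only a sketch: `HexField` and `leadingHexInt` are made-up names, and it assumes each line starts with a plain hex token (Scanner also handles locale-specific number formats, which this does not).

```java
// Hypothetical helper, not part of any library.
final class HexField {

    /**
     * Parses the first whitespace-delimited token of the line as a
     * hexadecimal int. Throws NumberFormatException if the token is
     * missing or is not valid hex.
     */
    static int leadingHexInt(String line) {
        int start = 0;
        // Skip leading whitespace.
        while (start < line.length() && Character.isWhitespace(line.charAt(start))) {
            start++;
        }
        int end = start;
        // Advance to the end of the first token.
        while (end < line.length() && !Character.isWhitespace(line.charAt(end))) {
            end++;
        }
        return Integer.parseInt(line.substring(start, end), 16);
    }

    public static void main(String[] args) {
        System.out.println(leadingHexInt("1f some event payload")); // prints 31
    }
}
```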

steveh
0

According to this SO post, you might also want to give the Scanner class a shot.

Community
npinti