Read a file in reverse order line by line without much overhead

Question

I need read i file in reverse order, because now it must run all file to find something that I know it`ll be in the last lines. My objective is to make my application Faster Which is the FASTEST way to read a file line by line in reverse order?

For example: My file is

  line1
  line2
  line3
  line4
  line5

I want to read

  line5
  line4
  line3
  line2
  line1

I know there`s a lot of ways to do it... but which one will give me less overhead?

http://stackoverflow.com/questions/6011345/read-a-file-line-by-line-in-reverse-order — Melih Altıntaş, Sep 12 '13 at 12:06
Try this answer http://stackoverflow.com/questions/8664705/how-to-read-file-from-end-to-start-in-reverse-order-in-java — René Link, Sep 12 '13 at 12:07
You will probably find a place to start in the code of `BufferedReader#readLine()` — Arnaud Denoyelle, Sep 12 '13 at 12:08
You can also look at an implementation of ReverseLineInputStream in this thread : http://stackoverflow.com/questions/8664705/how-to-read-file-from-end-to-start-in-reverse-order-in-java — Arnaud Denoyelle, Sep 12 '13 at 12:12
my question is which way is the fastest? i know there`s a lot of ways to do it... but which one will give me less overhead? — LeandroC, Sep 12 '13 at 12:12
@LeandroC : What about [RandomFileAccess](http://docs.oracle.com/javase/6/docs/api/java/io/RandomAccessFile.html)? — boxed__l, Sep 12 '13 at 12:15
@LeandroC That's not a good SO question then. Try, see, benchmark. It's really hard to give an answer because file IO is meant to be done sequentially, with buffering. My guess is a good way to do this would involve reimplementing a good chunk of `BufferedReader` to work in reverse. — millimoose, Sep 12 '13 at 12:26
@LeandroC : Check [this](http://docs.oracle.com/javase/tutorial/essential/io/file.html) out for different methods.etc — boxed__l, Sep 12 '13 at 12:29
@LeandroC A naive "low-overhead" implementation (one that reads characters one-by-one back-to-front) might in fact end up being slower than reading the file with buffering, and whether or not that's the case would depend on the file size. — millimoose, Sep 12 '13 at 12:32
Oh and there's also the fun difference between byte streams and character streams. Reading a UTF-8 encoded file brings yet more edge cases. (Multibyte chars across IO chunk boundaries...) — millimoose, Sep 12 '13 at 12:34
I need be fast... i`d just wanna know which way can i go. read all file looking for it. implements a bufferereader to work in reverse... or something... If were you which way would go? — LeandroC, Sep 12 '13 at 12:46
Derp. Turns out there's [`RandomAccessFile.readLine()`](http://docs.oracle.com/javase/7/docs/api/java/io/RandomAccessFile.html#readLine()). So: **1.** seek somewhere before the end of the file. **2.** read a partial line and discard it. Remember the current file pointer. **3.** read lines until the end of the file. If you find what you're looking for, yay! **4.** If you don't, seek somewhere further back, and repeat from point 2., stopping at the previous remembered position. If you reach the beginning of the file, bail. — millimoose, Sep 12 '13 at 12:50
That said, this will give you the **first** occurence of what you're looking for in a given "chunk". You can fix this by always reading to the end of the current chunk and returning the last match. The basic idea is you don't read the file in reverse line-by-line, but chunk-by-chunk, then read lines in each chunk in forward. So this doesn't guarantee you'll only read as little of the file as possible to find the matching line, but you will avoid reading most of the bits you don't need. (If you choose chunk size wisely, preferrably so you start shortly before the line you need.) — millimoose, Sep 12 '13 at 12:55
@LeandroC (Sorry for leaving this as comments, but honestly there's no way I'm testing the actual code for that.) — millimoose, Sep 12 '13 at 13:00

score -1 · Answer 1 · answered Sep 12 '13 at 12:23

-1

Try this out, I think this is the fastest way to read a file in reverse order.

import java.io.*;  
import java.util.*;  
class FileReaderReverse  
{  
    public static void main(String[] args) throws IOException,FileNotFoundException  
    {  
        FileReader fr=new FileReader("abc.txt");  
        BufferedReader br=new BufferedReader(fr);  
        String s;  

        List<String> tmp = new ArrayList<String>();  
        do
        {  
            s = br.readLine();  
            tmp.add(s);  
        }while(s!=null);  


        for(int i=tmp.size()-1;i>=0;i--) {  
            System.out.println(tmp.get(i));  
        }  
    }  
}

answered Sep 12 '13 at 12:23

Siddh

712
4
21

1

-1: [Schlemiel the Painter](http://www.joelonsoftware.com/articles/fog0000000319.html) isn't very fast. – millimoose Sep 12 '13 at 12:27
@millimoose what is the fastest way to read a file in reverse order? – Siddh Sep 12 '13 at 12:30
For starters not buffering the whole thing in memory would help. – millimoose Sep 12 '13 at 12:37
@millimoose: for idiots, I/O buffering = speed. – Val Sep 12 '13 at 12:46
@Val Allocating a list of strings holding half a gigabyte of text when you only ever need to look at one line != speed. (That's what I meant by buffering here, not the use of buffered IO.) Reading 450 megabytes of text you never need != speed. I went over this in my comment on the question too - the file size and how far from the back the needed line is determines whether reading the whole file buffered wins over reading part of it unbuffered. Also the OP doesn't really need to go over the lines in reverse order as much as he needs to find the last line that matches some predicate efficiently. – millimoose Sep 12 '13 at 12:54
@millimoose without buffering you can not go to last line of file. – Siddh Sep 12 '13 at 12:59
@millimoose You can read using RandomAccessFile - position the file using randomAccesFile.length()and write using BufferedWriter – Siddh Sep 12 '13 at 13:00
@Siddh So why does your code not use that? – millimoose Sep 12 '13 at 13:17
Where is the requirement to read the halfGB files? Which gigabyte are you talking about? Why not 10 bytes or exabytes? Why 1/2 GB? All approaches have some limitations. Proposed one is fastest for little as well as gigabyte files (I have 8 GB in my PC and can even load it with 450 MB). You trade memory for speed, why do you loose the speed? – Val Sep 12 '13 at 13:21
@millimoose Why do you demand the RandomAccessFile from him if it is slower? Why dont you elaborate your Schlemiel the Painter argument here? Schlemiel is not related with buffering the whole file. Siddh proposal has nothing to do with the Schlemiel shit that you attribute to him. You just do not understand the articles the you recommend to others. – Val Sep 12 '13 at 13:28
@Val You do not trade memory for speed if you are, in fact, using up way too much memory. You're also not trading memory for speed if you are, in fact, doing excessive I/O to fill that much memory. That's not how the memory/tradeoff works. (It would be the case if you were caching the data you read for multiple uses, thus avoiding having to re-read it.) And while all approaches have limitations, my hunch is this one will be "fastest" only in very specific circumstances. (When the file is small, and the line you want not too close to the end.) – millimoose Sep 12 '13 at 13:29
@Val The speed is lost in this approach when reading the completely unneeded start of the file - the OP has mentioned they only need to find a line towards the end. (That's what my reference to Schlemiel the Painter means - doing work that is completely and obviously unnecessary.) At some point the overhead you save doing buffered I/O will not outweigh reading transferring unnecessary data. Also, I did propose my approach in the question comments, it's just nontrivial and I don't have time to code it up, much less test it enough to give it as an answer. – millimoose Sep 12 '13 at 13:30
@millimoose Approach works always when you need to read the whole file. Since I/O time prevails any other memory access, there is no difference in response time whether you read a short or long file as long as it fits in memory. Your fallacy is that you think that eating applie in peaces in faster than eating in awhole. – Val Sep 12 '13 at 13:34
@Val ...Are you honestly saying that it takes **the same time** to read a 1MB file from disk to memory as it takes to read a 1GB file? And what I'm saying is that eating 1/8th of an apple is faster than eating it whole. – millimoose Sep 12 '13 at 13:35
@Val Look. What the OP is trying to do, is, say, find which out of several files contains a line with `yyy` in it. And he knows this line will be towards the end of the file. So, given the example files in [this Gist](https://gist.github.com/millimoose/6537483): where you lose speed, when processing `file2.txt`, is reading the lines between `aaa` and `xxx` that you never ever need to look at. (Obviously those are dummy files, imagine that the actual input files are too big to be read into memory with one block read.) – millimoose Sep 12 '13 at 13:40
@Val The approach I proposed would do the following when looking for e.g. `sss`. Seek to `xxx`, read to the end, find nothing. Seek to `uuu`, read until you reach `xxx`, find nothing. Seek to `rrr`, read until you finally find `sss`, return. As you see, this would only involve transferring a quarter of `file2.txt` from the disk. – millimoose Sep 12 '13 at 13:46
Check this out it reads a file in reverse order using RandomAccessFile. http://stackoverflow.com/questions/15612610/does-randomaccessfile-in-java-read-entire-file-in-memory – Siddh Sep 12 '13 at 13:55
@millimoose You cannot read 1/8 of apple when you ultimately read it awhole. The proposed approach is fastest for this case, not only for small files. You cannot say that Siddih approach is not fastest if it is fastest for at least one use case (and pretty real-world, thereby) – Val Sep 12 '13 at 14:01
@Siddh I checked the implementation of `RandomAccessFile` and it seems it does not do buffering when using `readLine()`, so it seems that reading it that way would likely be a little faster. Of course faster yet might be combining a `BufferedReader` with a `RandomAccessFile` somehow but I honestly have no idea whether those would be in sync. – millimoose Sep 12 '13 at 14:23

Read a file in reverse order line by line without much overhead

1 Answers1