68

What's the quickest and most efficient way of reading the last line of text from a [very, very large] file in Java?

Gray
  • 115,027
  • 24
  • 293
  • 354
Jake
  • 15,007
  • 22
  • 70
  • 86

10 Answers10

95

Below are two functions, one that returns the last non-blank line of a file without loading or stepping through the entire file, and the other that returns the last N lines of the file without stepping through the entire file:

What tail does is zoom straight to the last character of the file, then steps backward, character by character, recording what it sees until it finds a line break. Once it finds a line break, it breaks out of the loop. Reverses what was recorded and throws it into a string and returns. 0xA is the new line and 0xD is the carriage return.

If your line endings are \r\n or crlf or some other "double newline style newline", then you will have to specify n*2 lines to get the last n lines because it counts 2 lines for every line.

public String tail( File file ) {
    RandomAccessFile fileHandler = null;
    try {
        fileHandler = new RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if( filePointer == fileLength ) {
                    continue;
                }
                break;
                
            } else if( readByte == 0xD ) {
                if( filePointer == fileLength - 1 ) {
                    continue;
                }
                break;
            }

            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    } finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
                /* ignore */
            }
    }
}

But you probably don't want the last line, you want the last N lines, so use this instead:

public String tail2( File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = 
            new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        StringBuilder sb = new StringBuilder();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

             if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            sb.append( ( char ) readByte );
        }

        String lastLine = sb.reverse().toString();
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}

Invoke the above methods like this:

File file = new File("D:\\stuff\\huge.log");
System.out.println(tail(file));
System.out.println(tail2(file, 10));

Warning In the wild west of unicode this code can cause the output of this function to come out wrong. For example "Mary?s" instead of "Mary's". Characters with hats, accents, Chinese characters etc may cause the output to be wrong because accents are added as modifiers after the character. Reversing compound characters changes the nature of the identity of the character on reversal. You will have to do full battery of tests on all languages you plan to use this with.

For more information about this unicode reversal problem read this: https://codeblog.jonskeet.uk/2009/11/02/omg-ponies-aka-humanity-epic-fail/

Eric Leschinski
  • 146,994
  • 96
  • 417
  • 335
  • 3
    The above does not take into account lines terminated with both CR and LF. – Jags Mar 17 '14 at 03:44
  • 1
    your multiline implementation does not work in your special cases of filePointer == fileLength, line will stay the same, therefore the condition line == lines will not fire after that and the code will read the whole file. – ZPiDER Feb 16 '15 at 10:51
  • The warning about the wild west of Unicode is confession of a bug. The algorithm utilized is flawed, hence the bug hence the warning. In UTF8 there is a maximum length of multibyte encoding; that length is N=4B×max-length-of-combining-chars. For any given byte position in a file, that byte might be a noninitial byte/grapheme in a nonnormalized Unicode character. When the code above thinks that it has found a fact, it must deem it a suspicious fact until the prior N bytes are examined to determine whether the suspicious fact is in fact not a fact due to being within a UTF8 multibyte seq. – Andreas ZUERCHER Apr 07 '21 at 16:12
  • 1
    @AndreasZUERCHER I confess to the bug, as you say, Next I would like you to analyze, and write a program to solve the global timezone-problem, as described by computerphile here: https://www.youtube.com/watch?v=-5wpm-gesOY – Eric Leschinski Apr 07 '21 at 16:28
  • To avoid the Unicode problems related to reverting the string (or the StringBuilder), one can read to a byte list, from the end of the file, revert it to a byte array and then create the String from the byte array. – Helder Daniel Jun 03 '21 at 11:16
38

Apache Commons has an implementation using RandomAccessFile.

It's called ReversedLinesFileReader.

jaco0646
  • 15,303
  • 7
  • 59
  • 83
21

Have a look at my answer to a similar question for C#. The code would be quite similar, although the encoding support is somewhat different in Java.

Basically it's not a terribly easy thing to do in general. As MSalter points out, UTF-8 does make it easy to spot \r or \n as the UTF-8 representation of those characters is just the same as ASCII, and those bytes won't occur in multi-byte character.

So basically, take a buffer of (say) 2K, and progressively read backwards (skip to 2K before you were before, read the next 2K) checking for a line termination. Then skip to exactly the right place in the stream, create an InputStreamReader on the top, and a BufferedReader on top of that. Then just call BufferedReader.readLine().

Community
  • 1
  • 1
Jon Skeet
  • 1,421,763
  • 867
  • 9,128
  • 9,194
  • 2
    UTF-8 doesn't matter - you need the last CR or LF character, which is a single byte in both ASCII and UTF-8. – MSalters Mar 26 '09 at 15:39
6

Using FileReader or FileInputStream won't work - you'll have to use either FileChannel or RandomAccessFile to loop through the file backwards from the end. Encodings will be a problem though, as Jon said.

Michael Borgwardt
  • 342,105
  • 78
  • 482
  • 720
4

You can easily change the below code to print the last line.

MemoryMappedFile for printing last 5 lines:

private static void printByMemoryMappedFile(File file) throws FileNotFoundException, IOException{
        FileInputStream fileInputStream=new FileInputStream(file);
        FileChannel channel=fileInputStream.getChannel();
        ByteBuffer buffer=channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size());
        buffer.position((int)channel.size());
        int count=0;
        StringBuilder builder=new StringBuilder();
        for(long i=channel.size()-1;i>=0;i--){
            char c=(char)buffer.get((int)i);
            builder.append(c);
            if(c=='\n'){
                if(count==5)break;
                count++;
                builder.reverse();
                System.out.println(builder.toString());
                builder=null;
                builder=new StringBuilder();
            }
        }
        channel.close();
    }

RandomAccessFile to print last 5 lines:

private static void printByRandomAcessFile(File file) throws FileNotFoundException, IOException{
        RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r");
        int lines = 0;
        StringBuilder builder = new StringBuilder();
        long length = file.length();
        length--;
        randomAccessFile.seek(length);
        for(long seek = length; seek >= 0; --seek){
            randomAccessFile.seek(seek);
            char c = (char)randomAccessFile.read();
            builder.append(c);
            if(c == '\n'){
                builder = builder.reverse();
                System.out.println(builder.toString());
                lines++;
                builder = null;
                builder = new StringBuilder();
                if (lines == 5){
                    break;
                }
            }

        }
    }
Trying
  • 14,004
  • 9
  • 70
  • 110
2

as far as I know The fastest way to read the last line of a text file is using FileUtils Apache class which is in "org.apache.commons.io". I have a two-million-line file and by using this class, it took me less than one second to find the last line. Here is the my code:

LineIterator lineIterator = FileUtils.lineIterator(newFile(filePath),"UTF-8");
String lastLine="";
while (lineIterator.hasNext()){
 lastLine=  lineIterator.nextLine();
}
arash nadali
  • 568
  • 4
  • 7
  • Same comment above by Lorenzo also applies here: This works, but is probably not the most efficient solution. – martin_wun Jul 21 '21 at 06:12
1
try(BufferedReader reader = new BufferedReader(new FileReader(reqFile))) {

    String line = null;

    System.out.println("======================================");

    line = reader.readLine();       //Read Line ONE
    line = reader.readLine();       //Read Line TWO
    System.out.println("first line : " + line);

    //Length of one line if lines are of even length
    int len = line.length();       

    //skip to the end - 3 lines
    reader.skip((reqFile.length() - (len*3)));

    //Searched to the last line for the date I was looking for.

    while((line = reader.readLine()) != null){

        System.out.println("FROM LINE : " + line);
        String date = line.substring(0,line.indexOf(","));

        System.out.println("DATE : " + date);      //BAM!!!!!!!!!!!!!!
    }

    System.out.println(reqFile.getName() + " Read(" + reqFile.length()/(1000) + "KB)");
    System.out.println("======================================");
} catch (IOException x) {
    x.printStackTrace();
}
Ahmed Ashour
  • 5,179
  • 10
  • 35
  • 56
Ajai Singh
  • 57
  • 2
0

To avoid the Unicode problems related to reverting the string (or the StringBuilder), as discussed in Eric Leschinski excellent answer, one can read to a byte list, from the end of the file, revert it to a byte array and then create the String from the byte array.

Below are the changes to Eric Leschinski answer's code, to do it with a byte array. The code changes are below the commented lines of code:

static public String tail2(File file, int lines) {
    java.io.RandomAccessFile fileHandler = null;
    try {
        fileHandler = new java.io.RandomAccessFile( file, "r" );
        long fileLength = fileHandler.length() - 1;
        //StringBuilder sb = new StringBuilder();
        List<Byte> sb = new ArrayList<>();
        int line = 0;

        for(long filePointer = fileLength; filePointer != -1; filePointer--){
            fileHandler.seek( filePointer );
            int readByte = fileHandler.readByte();

            if( readByte == 0xA ) {
                if (filePointer < fileLength) {
                    line = line + 1;
                }
            } else if( readByte == 0xD ) {
                if (filePointer < fileLength-1) {
                    line = line + 1;
                }
            }
            if (line >= lines) {
                break;
            }
            //sb.add( (char) readByte );
            sb.add( (byte) readByte );
        }

        //String lastLine = sb.reverse().toString();
        //Revert byte array and create String
        byte[] bytes = new byte[sb.size()];
        for (int i=0; i<sb.size(); i++) bytes[sb.size()-1-i] = sb.get(i);
        String lastLine = new String(bytes);
        return lastLine;
    } catch( java.io.FileNotFoundException e ) {
        e.printStackTrace();
        return null;
    } catch( java.io.IOException e ) {
        e.printStackTrace();
        return null;
    }
    finally {
        if (fileHandler != null )
            try {
                fileHandler.close();
            } catch (IOException e) {
            }
    }
}
Helder Daniel
  • 331
  • 3
  • 8
0

In C#, you should be able to set the stream's position:

From: http://bytes.com/groups/net-c/269090-streamreader-read-last-line-text-file

using(FileStream fs = File.OpenRead("c:\\file.dat"))
{
    using(StreamReader sr = new StreamReader(fs))
    {
        sr.BaseStream.Position = fs.Length - 4;
        if(sr.ReadToEnd() == "DONE")
            // match
    }
}
Eric Leschinski
  • 146,994
  • 96
  • 417
  • 335
rball
  • 6,925
  • 7
  • 49
  • 77
  • In Java's FileInputStream (which FileReader is based on), you cannot set the position; you can only skip forward, which probably does not read the parts you skip, but is still a one-way operation and thus not suited to looking for a linebreak at an unknown offset from the end. – Michael Borgwardt Mar 26 '09 at 15:32
  • You can use mark() to get around that problem, depending on what the streams markLimit() is. – James Schek Mar 26 '09 at 16:08
0

Code is 2 lines only

     // Please specify correct Charset
     ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8);

     // read last 2 lines
     System.out.println(rlf.toString(2));

Gradle:

implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'

Maven:

   <dependency>
        <groupId>commons-io</groupId><artifactId>commons-io</artifactId><version>2.11.0</version>
   </dependency>
grep
  • 5,465
  • 12
  • 60
  • 112