
I want to read the last n lines of a very big file without reading the whole file into any buffer/memory area using Java.

I looked around the JDK APIs and Apache Commons I/O but haven't been able to find anything suitable for this purpose.

I was thinking of the way tail or less does it on UNIX. I don't think they load the entire file and then show its last few lines. There should be a similar way to do the same in Java too.

Ciro Santilli OurBigBook.com
Gaurav Verma

15 Answers


The simplest way I found is to use ReversedLinesFileReader from the Apache Commons IO API. It returns the lines of a file from bottom to top, and you can set n_lines to the number of lines you want.

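import java.io.File;
import java.nio.charset.StandardCharsets;
import org.apache.commons.io.input.ReversedLinesFileReader;


File file = new File("D:\\file_name.xml");
int n_lines = 10;
// try-with-resources closes the reader; pass the file's actual charset
try (ReversedLinesFileReader object = new ReversedLinesFileReader(file, StandardCharsets.UTF_8)) {
    for (int counter = 0; counter < n_lines; counter++) {
        String line = object.readLine(); // lines come back from the last one upwards
        if (line == null) {
            break; // the file has fewer than n_lines lines
        }
        System.out.println(line);
    }
}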
Mise
akki_java
  • Caution: Every time you call `readLine()`, the cursor advances. So this code would actually miss every other line because the output from `readLine()` in the `while` statement is not being captured. – aapierce Dec 23 '15 at 22:42
  • This code is a bit faulty because readLine() is called twice, as mentioned by aapierce. But full points for ReversedLinesFileReader. – vinksharma May 23 '17 at 21:11
  • @aapierce The comments from you and vinksharma are outdated, right? I guess the edit from Mise solved the problem. It's a little confusing when the comments don't match the current version of the post. – Daniel Eisenreich Nov 06 '18 at 08:36
  • @DanielEisenreich Yeah, it looks like the answer was edited since I added my comment 3 years ago. It's not obvious to me how to edit my comment now. Sorry! – aapierce Nov 06 '18 at 15:20

If you use a RandomAccessFile, you can use length and seek to get to a specific point near the end of the file and then read forward from there.

If you find there weren't enough lines, back up from that point and try again. Once you've figured out where the Nth last line begins, you can seek to there and just read-and-print.

An initial best-guess assumption can be made based on your data properties. For example, if it's a text file, it's possible the line lengths won't exceed an average of 132 so, to get the last five lines, start 660 characters before the end. Then, if you were wrong, try again at 1320 (you can even use what you learned from the last 660 characters to adjust that - example: if those 660 characters were just three lines, the next try could be 660 / 3 * 5, plus maybe a bit extra just in case).
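A minimal sketch of this guess-and-retry approach follows, assuming a UTF-8 (or other ASCII-compatible) file whose lines end with `\n`, and assuming the last n lines fit in a byte array. The class `GuessingTail` and its `avgLineLength` parameter are hypothetical names, and simply doubling the guess is a simplification of the smarter re-estimate described above.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class GuessingTail {

    // Returns the last n lines of a UTF-8, LF-terminated file (hypothetical helper).
    static String lastLines(String path, int n, int avgLineLength) throws IOException {
        if (n <= 0) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long length = raf.length();
            // Initial guess: n lines of the assumed average length (at least 1 byte).
            long guess = Math.max(1L, (long) avgLineLength * n);
            while (true) {
                long start = Math.max(0L, length - guess);
                byte[] buf = new byte[(int) (length - start)];
                raf.seek(start);
                raf.readFully(buf);
                // A trailing '\n' ends the last line; don't count it as a break.
                int end = buf.length;
                if (end > 0 && buf[end - 1] == '\n') {
                    end--;
                }
                // Walk backwards to the n-th line break before the end.
                int found = 0;
                for (int i = end - 1; i >= 0; i--) {
                    if (buf[i] == '\n' && ++found == n) {
                        return new String(buf, i + 1, end - i - 1, StandardCharsets.UTF_8);
                    }
                }
                if (start == 0) {
                    // Read the whole file and it has fewer than n lines: return all of it.
                    return new String(buf, 0, end, StandardCharsets.UTF_8);
                }
                // The guess was too small: back up further and try again.
                guess *= 2;
            }
        }
    }
}
```

With `avgLineLength = 132` and `n = 5`, the first attempt reads the last 660 bytes, matching the example above. Because decoding always starts right after a `\n` byte (or at the start of the file), it never begins in the middle of a UTF-8 character. Files with CRLF line endings would keep their `\r` characters in the result.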

paxdiablo

RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.

If your file is not encoded with a one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It expects a two-byte length prefix followed by modified UTF-8, i.e. the format written by writeUTF(), not ordinary lines of text.)

Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For encodings with fixed-size code units (e.g. flavors of UTF-16 or UTF-32), you need to extract characters starting from byte positions that are divisible by the code unit size in bytes. For variable-length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.

In the case of UTF-8, the first byte of a character will be 0xxxxxxx or 110xxxxx or 1110xxxx or 11110xxx. Anything else is either a continuation byte (10xxxxxx, i.e. the second, third or fourth byte of a character) or part of an illegal UTF-8 sequence. See The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7. This means, as the comment discussion points out, that any 0x0A or 0x0D byte in a properly encoded UTF-8 stream represents an LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8) if we can assume that the other kinds of Unicode line separator (0x2028, 0x2029 and 0x0085) are not used. If you can't assume that, the code will be more complicated.
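As an illustration of these byte-level rules, here is a small sketch (hypothetical helper class `Utf8Boundaries`) that tests whether a byte can start a UTF-8 character, skips forward from an arbitrary offset to the next character boundary, and counts LF bytes. It only distinguishes continuation bytes (10xxxxxx) from everything else; it does not validate the input.

```java
final class Utf8Boundaries {

    // True unless b is a UTF-8 continuation byte (10xxxxxx), i.e. b can start a character.
    static boolean startsCharacter(byte b) {
        return (b & 0xC0) != 0x80;
    }

    // Moves offset forward past continuation bytes so decoding starts on a character boundary.
    static int alignToCharStart(byte[] buf, int offset) {
        while (offset < buf.length && !startsCharacter(buf[offset])) {
            offset++;
        }
        return offset;
    }

    // Counts 0x0A bytes in buf[from, to); in valid UTF-8 each one is a real LF character.
    static int countLineFeeds(byte[] buf, int from, int to) {
        int count = 0;
        for (int i = from; i < to; i++) {
            if (buf[i] == 0x0A) {
                count++;
            }
        }
        return count;
    }
}
```

After seeking to an arbitrary position and reading a chunk, `alignToCharStart` gives an offset from which `new String(buf, start, buf.length - start, StandardCharsets.UTF_8)` will not begin in the middle of a character.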

Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count end-of-lines.
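A sketch of this `new String(...)` plus `lastIndexOf(...)` step, assuming the byte chunk is UTF-8, starts on a character boundary and uses `\n` line endings. `LastIndexOfTail` is a hypothetical name. It returns `null` when the chunk holds fewer than n line breaks, which signals the caller to read a larger chunk (or, if the chunk already starts at the beginning of the file, to use all of it).

```java
import java.nio.charset.StandardCharsets;

final class LastIndexOfTail {

    // Decodes bytes[offset, offset + count) as UTF-8 and returns its last n lines,
    // or null if the chunk holds fewer than n line breaks (read a bigger chunk then).
    // Assumes offset is on a character boundary and lines end with '\n'.
    static String lastLines(byte[] bytes, int offset, int count, int n) {
        if (n <= 0) {
            return "";
        }
        String text = new String(bytes, offset, count, StandardCharsets.UTF_8);
        int end = text.length();
        // A trailing '\n' terminates the last line rather than starting a new one.
        if (end > 0 && text.charAt(end - 1) == '\n') {
            end--;
        }
        int pos = end;
        for (int i = 0; i < n; i++) {
            // Search strictly before the previous hit; a negative index yields -1.
            pos = text.lastIndexOf('\n', pos - 1);
            if (pos < 0) {
                return null;
            }
        }
        return text.substring(pos + 1, end);
    }
}
```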

Stephen C
  • +1 for mentioning the caveat. I think that for UTF-8 the problem may be made simpler by scanning for '\n'... At least that's what Jon Skeet seems to imply in his answer to a [related question](http://stackoverflow.com/questions/686231/quickly-read-the-last-line-of-a-text-file)... It seems '\n' can only occur as a valid character in UTF-8 and never in the 'extra bytes'... – Stijn de Witt Aug 07 '14 at 21:53
  • Yes, for UTF-8 it's simple. UTF-8 encodes characters either as a single byte (all ASCII characters) or as multiple bytes (all other Unicode characters). Fortunately for us, newline is an ASCII character and in UTF-8, no multi-byte character contains bytes that are also valid ASCII characters. That is to say, if you scan an array of bytes for ASCII newline and you find it, you *know* it's a newline and not part of some other multi-byte character. I wrote a [blog post](http://stijndewitt.wordpress.com/2014/08/09/max-bytes-in-a-utf-8-char/) that has a nice table illustrating this. – Stijn de Witt Aug 10 '14 at 12:29
  • The problem is 1) character encodings where the byte `0x0a` is not a newline (e.g. UTF-16), and 2) the fact that there are other Unicode line separator codepoints; e.g. `0x2028`, `0x2029` and `0x0085` – Stephen C Aug 10 '14 at 12:46
  • Yes, the simple scenario only holds for UTF-8 and when newlines are encoded as either CRLF or just LF... However, I think in practice this covers most real-world scenarios. UTF-16 is pretty rare when it comes to text file encoding (it is often used in memory, but not very often in files), and I don't know many editors that will insert those other Unicode line separators... – Stijn de Witt Aug 10 '14 at 14:11

ReversedLinesFileReader can be found in the Apache Commons IO Java library.

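    import java.io.File;
    import java.nio.charset.StandardCharsets;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import org.apache.commons.io.input.ReversedLinesFileReader;

    int n_lines = 1000;
    List<String> lines = new ArrayList<>();
    try (ReversedLinesFileReader object = new ReversedLinesFileReader(new File(path), StandardCharsets.UTF_8)) {
        for (int i = 0; i < n_lines; i++) {
            String line = object.readLine();
            if (line == null)
                break; // the file has fewer than n_lines lines
            lines.add(line);
        }
    }
    // readLine() returns lines last-first, so restore file order and keep the line breaks
    Collections.reverse(lines);
    return String.join("\n", lines);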
Wisienkas
Torsten Simon

I found RandomAccessFile and the BufferedReader-based approaches too slow. For me, nothing was faster than tail -<#lines>, so this was the best solution.

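public String getLastNLogLines(File file, int nLines) {
    StringBuilder s = new StringBuilder();
    try {
        // Pass each argument separately so that paths containing spaces still work
        Process p = new ProcessBuilder("tail", "-n", String.valueOf(nLines), file.getPath()).start();
        try (java.io.BufferedReader input = new java.io.BufferedReader(
                new java.io.InputStreamReader(p.getInputStream()))) {
            String line;
            // Here we first read the next line into the variable
            // line and then check for the EOF condition, which
            // is the return value of null
            while ((line = input.readLine()) != null) {
                s.append(line).append('\n');
            }
        }
        p.waitFor(); // wait for tail to exit so the process is cleaned up
    } catch (java.io.IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt(); // preserve the interrupt status
    }
    return s.toString();
}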
Luca
  • Exec'ing out to `tail` can be a very expensive proposition in itself depending on how much memory you have. And it is also Unix-specific. – Gray Nov 04 '13 at 20:12
  • Not a generic solution. Like tail, there could be many other utilities one could use. This is not what was asked in the question. – shaILU Sep 15 '20 at 19:10

Use CircularFifoBuffer from Apache Commons; see the answer to a similar question, How to read last 5 lines of a .txt file into java.

Note that in Apache Commons Collections 4 this class seems to have been renamed to CircularFifoQueue
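Since `CircularFifoQueue` evicts its oldest element when you add to a full queue, it can act as a sliding window over the lines. To avoid the extra dependency, this sketch swaps in the JDK's `ArrayDeque` and evicts manually; `BoundedTail` is a hypothetical name. Like the original suggestion, it still reads the entire file once; it only limits how many lines are kept in memory.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class BoundedTail {

    // Reads every line once but only ever holds the most recent n of them.
    static List<String> lastLines(BufferedReader reader, int n) throws IOException {
        List<String> result = new ArrayList<>();
        if (n <= 0) {
            return result;
        }
        Deque<String> window = new ArrayDeque<>(n);
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            if (window.size() == n) {
                window.removeFirst(); // evict the oldest line, like a full circular FIFO
            }
            window.addLast(line);
        }
        result.addAll(window); // ArrayDeque iterates from first (oldest) to last (newest)
        return result;
    }
}
```

In practice you would pass it a reader from `Files.newBufferedReader(path, StandardCharsets.UTF_8)` inside a try-with-resources block so the file gets closed.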

Community
ruth542
  • I checked out the class you mentioned, and though it can indeed be used to keep track of the last 5 lines in a file, I think the challenge here is not to keep track of the lines, but to find the point in the file where to start reading, and how to get to that point. – Stijn de Witt Aug 07 '14 at 19:30
package com.uday;

import java.io.File;
import java.io.RandomAccessFile;

public class TailN {
    public static void main(String[] args) throws Exception {
        long startTime = System.currentTimeMillis();

        TailN tailN = new TailN();
        File file = new File("/Users/udakkuma/Documents/workspace/uday_cancel_feature/TestOOPS/src/file.txt");
        tailN.readFromLast(file);

        System.out.println("Execution Time : " + (System.currentTimeMillis() - startTime));

    }

    public void readFromLast(File file) throws Exception {
        int lines = 3;
        int readLines = 0;
        StringBuilder builder = new StringBuilder();
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
            // Index of the last byte; -1 for an empty file, in which case the loop is skipped
            long lastIndex = file.length() - 1;

            for (long pointer = lastIndex; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                // read from the end, one byte at a time
                // NOTE: casting a byte to char is only correct for single-byte encodings such as ASCII
                char c = (char) randomAccessFile.read();
                // count line breaks, ignoring the newline that terminates the last line
                if (c == '\n' && pointer != lastIndex) {
                    readLines++;
                    if (readLines == lines)
                        break;
                }
                builder.append(c);
            }
            // Since the characters were read from the end, they are in reverse order.
            // Use reverse() to put them back in the correct order
            builder.reverse();
            System.out.println(builder.toString());
        }

    }
}
Uday Kumar
  • I like this approach but there are problems. The most important: you can't assume what `randomAccessFile.read();` returns is castable as a valid char. e.g. in UTF-8 encoding, the Euro symbol would be encoded in three bytes as `0xe2, 0x82, 0xac`. This means you need to read bytes, reverse them and *then* decode them. I'll try to post a reworking below – g00se May 03 '22 at 12:18
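To make the Euro-sign example concrete, the sketch below (hypothetical class `ReverseThenDecode`) takes the three bytes in the order a backwards scan collects them (`0xac, 0x82, 0xe2`). Reversing the bytes and then decoding them as UTF-8 yields U+20AC (€), whereas mapping each byte to a `char` first, as `(char) randomAccessFile.read()` does, yields three unrelated Latin-1 characters.

```java
import java.nio.charset.StandardCharsets;

final class ReverseThenDecode {

    // Correct: restore the original byte order first, then decode the bytes as UTF-8.
    static String decodeReversed(byte[] backToFront) {
        byte[] bytes = new byte[backToFront.length];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = backToFront[backToFront.length - 1 - i];
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Broken for multi-byte characters: maps each byte to one char, which is what
    // (char) randomAccessFile.read() does, then reverses the chars.
    static String castEachByte(byte[] backToFront) {
        StringBuilder sb = new StringBuilder();
        for (byte b : backToFront) {
            sb.append((char) (b & 0xFF));
        }
        return sb.reverse().toString();
    }
}
```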

Here's one without an Apache dependency, and the results I got when reading the last 90,000 lines from a file with 100,000 lines:

This method: 50ms
Apache's ReversedLinesFileReader: 900ms
RandomAccessFile (reading in reverse): 1,200ms

Original source

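import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public static String[] getLastNLinesFromFile(String filePath, int numLines) throws IOException {
    if (numLines <= 0) {
        return new String[0]; // avoids a modulo by zero below
    }
    try (Stream<String> stream = Files.lines(Paths.get(filePath))) {
        AtomicInteger offset = new AtomicInteger();
        // Ring buffer: only the most recent numLines lines are kept
        String[] lines = new String[numLines];
        stream.forEach(line -> lines[offset.getAndIncrement() % numLines] = line);
        int total = offset.get();
        // Copy the ring buffer out in file order, oldest retained line first
        return IntStream.range(Math.max(0, total - numLines), total)
                .mapToObj(idx -> lines[idx % numLines])
                .toArray(String[]::new);
    }
}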
Adrian Bartyczak
  • This actually looks like the best answer to me. It's simple and doesn't rely on third party dependencies nor do you have to deal with `RandomAccessFile`. – dzim Jul 07 '23 at 10:54
  • On a second thought it looks like you still go through all lines from the top to the bottom... Is that really a good idea, when you only need the last lines? – dzim Jul 07 '23 at 11:01
  • This is a good point. And it made me curious. So I tested all the RandomAccessFile solutions that read the file in reverse, and found that for a 100,000 line file, when reading the last 90,000 lines, they take about 1,200ms. When I test out my solution, it takes about 50ms. So I would say this is better than RandomAccessFile, but if you really want to read the file in reverse, your best bet would be to use Apache's ReversedLinesFileReader. – Adrian Bartyczak Jul 07 '23 at 15:58
  • I just realized that the answer said it's "20ms slower than Apache's ReversedLinesFileReader" and "50ms faster than RandomAccessFile", however, after testing again, I got that it's about 850ms faster than Apache's ReversedLinesFileReader and 1,150ms faster than RandomAccessFile. Don't know how my results were so off before. – Adrian Bartyczak Jul 07 '23 at 23:22
  • Taken on a different computer, maybe? You could also use a parallel stream. Maybe that could speed things up a bit. My major concern would be memory efficiency: you are constantly copying stuff. While the GC might have no problem with it, it seems overkill to basically store all the lines at least once at some point only to keep maybe a small percentage of them in the end. – dzim Jul 10 '23 at 05:26

A RandomAccessFile allows for seeking (http://download.oracle.com/javase/1.4.2/docs/api/java/io/RandomAccessFile.html). The File.length() method returns the size of the file. The problem is determining the number of lines. For this, you can seek to the end of the file and read backwards until you have hit the right number of lines.
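A sketch of the read-backwards idea that scans fixed-size blocks instead of seeking once per byte, assuming a UTF-8 (or other ASCII-compatible) file with `\n` line endings. `BackwardBlockTail` and its `blockSize` parameter are hypothetical; something like 8192 bytes is a plausible block size, but that value is a guess rather than a measured optimum. Once the start of the n-th last line is found, only those bytes are read forward and decoded.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

final class BackwardBlockTail {

    // Returns the last n lines of a UTF-8 file whose lines end with '\n'.
    // blockSize is how many bytes are read per seek (e.g. 8192).
    static String lastLines(String path, int n, int blockSize) throws IOException {
        if (blockSize <= 0) {
            throw new IllegalArgumentException("blockSize must be positive");
        }
        if (n <= 0) {
            return "";
        }
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            long end = raf.length();
            // Ignore the newline that terminates the final line.
            if (end > 0) {
                raf.seek(end - 1);
                if (raf.read() == '\n') {
                    end--;
                }
            }
            long pos = end;       // everything from pos to end has been scanned
            long lineStart = 0;   // stays 0 if the file has fewer than n lines
            int found = 0;
            byte[] block = new byte[blockSize];
            while (pos > 0 && found < n) {
                int len = (int) Math.min(blockSize, pos);
                pos -= len;
                raf.seek(pos);
                raf.readFully(block, 0, len);
                // Scan this block from its last byte to its first.
                for (int i = len - 1; i >= 0; i--) {
                    if (block[i] == '\n' && ++found == n) {
                        lineStart = pos + i + 1;
                        break;
                    }
                }
            }
            // Read just the last n lines forward and decode them in one go.
            byte[] tail = new byte[(int) (end - lineStart)];
            raf.seek(lineStart);
            raf.readFully(tail);
            return new String(tail, StandardCharsets.UTF_8);
        }
    }
}
```

Decoding only after the whole tail has been located means a multi-byte character that straddles two blocks is never split.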

Yann Ramin

I had a similar problem, but I didn't understand the other solutions.

I used the following; I hope it's simple enough.

// String filePathName = (path and file name).
File f = new File(filePathName);
long fileLength = f.length(); // size of the file in bytes
long fileLength_toRead = 0;
if (fileLength > 2000) {
    // My file content is a table, and I know one row has about 100 bytes / characters,
    // so I start reading 1000 bytes before the end of the file.
    // If you don't know the line length, use @paxdiablo's advice.
    fileLength_toRead = fileLength - 1000;
}
List<String> rows = new ArrayList<>(); // the rows read from the end of the file
try (RandomAccessFile raf = new RandomAccessFile(filePathName, "r")) { // opens and closes the file
    raf.seek(fileLength_toRead); // start reading at this byte
    if (fileLength_toRead > 0) {
        raf.readLine(); // the first line read is usually incomplete, so skip it
    }
    // NOTE: RandomAccessFile.readLine() maps each byte to one char, so it suits single-byte encodings only
    String rowInFile = raf.readLine();
    while (rowInFile != null) {
        rows.add(rowInFile); // later I can work with these rows (the last one is sometimes empty, etc.)
        rowInFile = raf.readLine();
    }
}
catch (IOException e) {
    e.printStackTrace();
}
pocket

Here is a working version.

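    private static void printLastNLines(String filePath, int n) {
        File file = new File(filePath);
        StringBuilder builder = new StringBuilder();
        // try-with-resources makes sure the file is closed again
        try (RandomAccessFile randomAccessFile = new RandomAccessFile(filePath, "r")) {
            long pos = file.length() - 1;
            // Skip a trailing newline so it is not counted as a line break,
            // but keep the last character if the file doesn't end with one
            if (pos >= 0) {
                randomAccessFile.seek(pos);
                if (randomAccessFile.read() == '\n') {
                    pos--;
                }
            }

            for (long i = pos; i >= 0; i--) {
                randomAccessFile.seek(i);
                // NOTE: the byte-to-char cast assumes a single-byte encoding such as ASCII
                char c = (char) randomAccessFile.read();
                if (c == '\n') {
                    n--;
                    if (n == 0) {
                        break;
                    }
                }
                builder.append(c);
            }
            builder.reverse();
            System.out.println(builder.toString());
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }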
user11016

(See my comment on the answer by Uday Kumar.)

import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public String readFromLast(File file, int howMany) throws IOException {
    int numLinesRead = 0;
    try (RandomAccessFile randomAccessFile = new RandomAccessFile(file, "r")) {
        try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
            /*
             * Index of the last byte. If the file is empty this is -1, the loop
             * below does not run, and an empty string is returned
             */
            long fileLength = file.length() - 1;

            for (long pointer = fileLength; pointer >= 0; pointer--) {
                randomAccessFile.seek(pointer);
                // Collect raw bytes; they are only decoded once back in file order
                byte b = (byte) randomAccessFile.read();
                if (b == '\n') {
                    numLinesRead++;
                    // (Last line often terminated with a line separator)
                    if (numLinesRead == (howMany + 1))
                        break;
                }
                baos.write(b);
            }
            /*
             * Since the bytes were read from the end, they are in reverse order.
             * Reverse them in place before decoding
             */
            byte[] a = baos.toByteArray();
            int start = 0;
            int end = a.length - 1;

            while (start < end) {
                byte temp = a[end];
                a[end] = a[start];
                a[start] = temp;
                start++;
                end--;
            }// End while
            // Decode explicitly as UTF-8 instead of using the platform default charset
            return new String(a, StandardCharsets.UTF_8).trim();
        } // End inner try-with-resources
    } // End outer try-with-resources

} // End method
g00se

Here is the best way I've found to do it: simple, pretty fast and memory-efficient.

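public static void tail(File src, OutputStream out, int maxLines) throws IOException {
    if (maxLines <= 0) {
        return;
    }
    String[] lines = new String[maxLines]; // ring buffer holding the most recent lines
    int lastNdx = 0;                       // slot the next line will overwrite
    int count = 0;                         // number of slots filled so far
    try (BufferedReader reader = new BufferedReader(new FileReader(src))) {
        for (String line = reader.readLine(); line != null; line = reader.readLine()) {
            lines[lastNdx] = line;
            lastNdx = (lastNdx + 1) % lines.length;
            if (count < lines.length) {
                count++;
            }
        }
    }

    OutputStreamWriter writer = new OutputStreamWriter(out);
    // The oldest retained line is count slots before lastNdx, wrapping around
    int start = (lastNdx - count + lines.length) % lines.length;
    for (int i = 0; i < count; i++) {
        writer.write(lines[(start + i) % lines.length]);
        writer.write("\n");
    }

    writer.flush(); // flush but don't close: closing would also close the caller's stream
}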
ra9r

I tried RandomAccessFile first, and it was tedious to read the file backwards, repositioning the file pointer on every read operation. So I tried @Luca's solution and got the last few lines of the file as a string with just two lines of code, in a few minutes.

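    // Pass the arguments separately so a path containing spaces still works; tail prints the last 10 lines by default
    InputStream inputStream = new ProcessBuilder("tail", path.toString()).start().getInputStream();
    String tail = new BufferedReader(new InputStreamReader(inputStream)).lines().collect(Collectors.joining(System.lineSeparator()));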

The code is only 2 lines:

     // Please specify the correct Charset; try-with-resources closes the reader
     try (ReversedLinesFileReader rlf = new ReversedLinesFileReader(file, StandardCharsets.UTF_8)) {
         // read the last 2 lines
         System.out.println(rlf.toString(2));
     }

Gradle:

implementation group: 'commons-io', name: 'commons-io', version: '2.11.0'

Maven:

   <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.11.0</version>
   </dependency>
grep