How can I efficiently determine the position of the last newline within a specific part of a file?

For example, I tried this:

BufferedReader br = new BufferedReader(new FileReader(file));
long length = file.length();
String line = null;
int tailLength = 0;
while ((line = br.readLine()) != null) {
    System.out.println(line);
    tailLength = line.getBytes().length;
}
long returnValue = length - tailLength; // long, because file.length() returns a long

But this only returns the position of the very last newline in the whole file, not the last newline within a section of the file. The section would be delimited by an int start; and an int end;

MrLang
  • Where in your code are you searching for `int start;` or `int end;`? – Sean Bright Mar 15 '16 at 14:25
  • What is the `length` integer in your code? – Arnaud Mar 15 '16 at 14:25
  • What's a section in your case? – alamar Mar 15 '16 at 14:33
  • `length` is the length of the whole file. I haven't introduced `start` or `end` yet because I didn't know how to use them, but they are supposed to define the section I am interested in. – MrLang Mar 15 '16 at 14:47
  • I would try to read the file in bytes: http://stackoverflow.com/questions/858980/file-to-byte-in-java. Then search it backwards. Remember to use System.lineSeparator() for platform independence. – Joe Mar 15 '16 at 14:56

2 Answers

Unfortunately, you can't get byte positions out of BufferedReader. I had to use RandomAccessFile, whose getFilePointer() method you can call after each readLine(), but it is very slow and not UTF-8-aware.

I ended up implementing my own byte counting line reader.

Your naive solution will fail badly on files with Unicode, malformed, or binary content, since line.getBytes().length need not equal the number of bytes the line occupies in the file.
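
A minimal sketch of what a byte-counting reader could look like. This is not the answerer's actual implementation, and the names `LineOffsets` and `lastNewlineOffset` are hypothetical. It counts raw bytes instead of decoded characters, which is safe for UTF-8 because the byte 0x0A never occurs inside a multi-byte sequence. It assumes '\n' (alone or as part of "\r\n") is the terminator of interest.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, for illustration only.
final class LineOffsets {

    private LineOffsets() {}

    /**
     * Returns the byte offset of the last '\n' in the stream,
     * or -1 if the stream contains no '\n' at all.
     */
    static long lastNewlineOffset(InputStream in) throws IOException {
        // Buffer the stream so that reading one byte at a time stays cheap.
        InputStream buffered = new BufferedInputStream(in);
        long offset = 0;        // offset of the byte about to be read
        long lastNewline = -1;  // offset of the most recent '\n' seen
        int b;
        while ((b = buffered.read()) != -1) {
            if (b == '\n') {
                lastNewline = offset;
            }
            offset++;
        }
        return lastNewline;
    }
}
```

Because it reads the stream from the beginning, this does work proportional to the size of the input, so it is mainly useful when the data has to be read sequentially anyway.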

alamar

I think the most efficient approach is to start from the end of the file and read it in chunks, searching each chunk backwards for the first line break.

For example:

import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class FileUtils {

    static final int CHUNK_SIZE = 8 * 1024;

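    /**
     * Returns the byte offset at which the last line of the file starts,
     * i.e. the position just after the last line terminator that is not
     * at the very end of the file, or 0 if there is no such terminator.
     */
    public static long getLastLinePosition(Path path) throws IOException {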
        try (FileChannel inChannel = FileChannel.open(path, StandardOpenOption.READ);
             @SuppressWarnings("unused")
             FileLock lock = inChannel.tryLock(0, Long.MAX_VALUE, true)) {
            long fileSize = inChannel.size();
            long mark = fileSize;
            long position;
            boolean ignoreCR = false;
            while (mark > 0) {
                position = Math.max(0, mark - CHUNK_SIZE);

                MappedByteBuffer mbb = inChannel.map(FileChannel.MapMode.READ_ONLY, position, Math.min(mark, CHUNK_SIZE));
                byte[] bytes = new byte[mbb.remaining()];
                mbb.get(bytes);

                for (int i = bytes.length - 1; i >= 0; i--, mark--) {
                    switch (bytes[i]) {
                        case '\n':
                            if (mark < fileSize) {
                                return mark;
                            }
                            ignoreCR = true;
                            break;
                        case '\r':
                            if (ignoreCR) {
                                ignoreCR = false;
                            } else if (mark < fileSize) {
                                return mark;
                            }
                            break;
                    }
                }

                mark = position;
            }
        }
        return 0;
    }

}

Test file:

abc\r\n
1234\r\n
def\r\n

Output: 11 (the byte offset at which the last line, def, starts)
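
The question asks about a section delimited by start and end rather than the whole file. The backward scan can be restricted to such a byte range, as in the sketch below. It rests on a few assumptions: the range is half-open, [start, end); the result is the offset of the '\n' byte itself rather than the position after it; and -1 means no newline was found. The names `RangeScan` and `lastNewlineIn` are hypothetical. It also uses positional `FileChannel.read` instead of the memory mapping above, and takes no file lock.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, for illustration only.
final class RangeScan {

    static final int CHUNK_SIZE = 8 * 1024;

    private RangeScan() {}

    /** Returns the offset of the last '\n' byte in [start, end), or -1 if there is none. */
    static long lastNewlineIn(Path path, long start, long end) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            long lo = Math.max(0, start);
            long hi = Math.min(end, ch.size()); // never scan past the end of the file
            ByteBuffer buf = ByteBuffer.allocate(CHUNK_SIZE);
            while (hi > lo) {
                // The chunk covers [chunkStart, hi) and never reaches below lo.
                long chunkStart = Math.max(lo, hi - CHUNK_SIZE);
                buf.clear();
                buf.limit((int) (hi - chunkStart));
                long pos = chunkStart;
                while (buf.hasRemaining()) {
                    int n = ch.read(buf, pos); // positional read, channel position is untouched
                    if (n < 0) {
                        break; // the file shrank while it was being read
                    }
                    pos += n;
                }
                // Search the bytes that were actually read, from the back.
                for (int i = buf.position() - 1; i >= 0; i--) {
                    if (buf.get(i) == '\n') {
                        return chunkStart + i;
                    }
                }
                hi = chunkStart; // continue with the chunk before this one
            }
        }
        return -1;
    }
}
```

With the test file above (16 bytes), `lastNewlineIn(path, 0, 16)` returns 15 and `lastNewlineIn(path, 0, 15)` returns 10, the newline that ends the 1234 line.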

Learn more about java.nio.channels.FileChannel and java.nio.MappedByteBuffer in the Java API documentation.

EDIT:

If you are using Java 6, apply these changes to the code above:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;

public class FileUtils {

    static final int CHUNK_SIZE = 8 * 1024;

    public static long getLastLinePosition(String name) throws IOException {
        FileChannel inChannel = null;
        FileLock lock = null;
        try {
            inChannel = new RandomAccessFile(name, "r").getChannel();
            lock = inChannel.tryLock(0, Long.MAX_VALUE, true);

            // ...

        } finally {
            if (lock != null) {
                lock.release();
            }
            if (inChannel != null) {
                inChannel.close();
            }
        }
        return 0;
    }

}

Tips on choosing ideal buffer size :

FaNaJ
  • Thanks, it looks very good. I only use Java 6, so I had to adapt a few things, but it seems to work well. However, I don't really know what `CHUNK_SIZE` does. Why is it `8 * 1024`? – MrLang Mar 17 '16 at 08:33