I ran some tests using a 21 GB file filled with random strings, where each line was 20-40 characters long.
It seems the built-in BufferedReader is still the fastest method.
File f = new File("sfs");
try (Stream<String> lines = Files.lines(f.toPath(), StandardCharsets.UTF_8)) {
    lines.forEach(System.out::println);
} catch (IOException e) {
    e.printStackTrace();
}
Reading the lines as a stream means they are pulled lazily, as you need them, instead of loading the entire file into memory at once.
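For reference, the classic readLine() loop behaves the same way, reading one line at a time from the buffer. A minimal sketch reusing the File f from above (this variant is not part of my timing code below):

try (BufferedReader br = Files.newBufferedReader(f.toPath(), StandardCharsets.UTF_8)) {
    String line;
    // readLine() returns one line at a time, so memory use stays flat
    // no matter how large the file is
    while ((line = br.readLine()) != null) {
        // process the line here
    }
} catch (IOException e) {
    e.printStackTrace();
}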
To improve speed even further you can increase the buffer size of the BufferedReader by a moderate factor. In my tests it started to outperform the default buffer size at about 10 million lines.
CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
int size = 8192 * 16; // 16 times the default buffer size of 8192 chars
try (BufferedReader br = new BufferedReader(
        new InputStreamReader(Files.newInputStream(f.toPath()), decoder), size)) {
    br.lines().limit(LINES_TO_READ).forEach(s -> {
    });
} catch (IOException e) {
    e.printStackTrace();
}
The code I used for testing:
private static final long LINES_TO_READ = 10_000_000;

private static void java8Stream(File f) {
    long startTime = System.nanoTime();
    try (Stream<String> lines = Files.lines(f.toPath(), StandardCharsets.UTF_8).limit(LINES_TO_READ)) {
        lines.forEach(line -> {
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
    long endTime = System.nanoTime();
    System.out.println("no buffer took " + (endTime - startTime) + " nanoseconds");
}
private static void streamWithLargeBuffer(File f) {
    long startTime = System.nanoTime();
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    int size = 8192 * 16;
    try (BufferedReader br = new BufferedReader(
            new InputStreamReader(Files.newInputStream(f.toPath()), decoder), size)) {
        br.lines().limit(LINES_TO_READ).forEach(s -> {
        });
    } catch (IOException e) {
        e.printStackTrace();
    }
    long endTime = System.nanoTime();
    System.out.println("using large buffer took " + (endTime - startTime) + " nanoseconds");
}
private static void memoryMappedFile(File f) {
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    long linesReadCount = 0;
    String line = "";
    long startTime = System.nanoTime();
    try (RandomAccessFile file2 = new RandomAccessFile(f, "r")) {
        FileChannel fileChannel = file2.getChannel();
        // a single mapping is limited to Integer.MAX_VALUE bytes
        MappedByteBuffer buffer = fileChannel.map(FileChannel.MapMode.READ_ONLY, 0L, Integer.MAX_VALUE - 10_000_000);
        CharBuffer decodedBuffer = decoder.decode(buffer);
        while (decodedBuffer.hasRemaining()) {
            char a = decodedBuffer.get();
            if (a == '\n') {
                line = "";
                // count a line only when its terminating newline is reached
                if (++linesReadCount >= LINES_TO_READ) {
                    break;
                }
            } else {
                line += Character.toString(a);
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    long endTime = System.nanoTime();
    System.out.println("using memory mapped files took " + (endTime - startTime) + " nanoseconds");
}
Btw I noticed that FileChannel.map throws an exception if the mapped region is larger than Integer.MAX_VALUE (about 2 GB), so a single mapping cannot cover a very large file and you would have to map it in chunks.
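A rough sketch of how chunked mapping could look (an untested outline, not something I benchmarked):

try (FileChannel channel = FileChannel.open(f.toPath(), StandardOpenOption.READ)) {
    long fileSize = channel.size();
    long position = 0;
    while (position < fileSize) {
        // each mapping may cover at most Integer.MAX_VALUE bytes
        long chunkSize = Math.min(Integer.MAX_VALUE, fileSize - position);
        MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_ONLY, position, chunkSize);
        // decode/process the chunk here; note that a UTF-8 sequence or a line
        // can straddle a chunk boundary, which this sketch does not handle
        position += chunkSize;
    }
} catch (IOException e) {
    e.printStackTrace();
}

You would still need extra bookkeeping to stitch lines together across chunk boundaries, which is another reason the BufferedReader approach is simpler for files of this size.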