Which method of delivery is appropriate depends on how you want to process the data. If your processing requires handling the data line by line, there is no way around delivering it that way.
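For the line-by-line case, Files.lines already delivers the contents lazily as a stream; a minimal sketch (the file name is just a placeholder):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.stream.Stream;

try (Stream<String> lines = Files.lines(Paths.get("data.txt"))) {
    lines.forEach(System.out::println); // each line is read on demand
}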
If you really want fixed-size chunks of character data, you can use the following method(s):
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Objects;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

public static Stream<String> chunks(Path path, int chunkSize) throws IOException {
    return chunks(path, chunkSize, StandardCharsets.UTF_8);
}

public static Stream<String> chunks(Path path, int chunkSize, Charset cs)
        throws IOException {
    Objects.requireNonNull(path);
    Objects.requireNonNull(cs);
    if(chunkSize <= 0) throw new IllegalArgumentException();
    CharBuffer cb = CharBuffer.allocate(chunkSize);
    BufferedReader r = Files.newBufferedReader(path, cs);
    return StreamSupport.stream(
        new Spliterators.AbstractSpliterator<String>(
            Files.size(path)/chunkSize, Spliterator.ORDERED|Spliterator.NONNULL) {
            @Override public boolean tryAdvance(Consumer<? super String> action) {
                // fill the buffer until it is full or the end of the file is reached
                try { do {} while(cb.hasRemaining() && r.read(cb) > 0); }
                catch(IOException ex) { throw new UncheckedIOException(ex); }
                if(cb.position() == 0) return false;
                action.accept(cb.flip().toString());
                cb.clear(); // reset the buffer for the next chunk
                return true;
            }
        }, false).onClose(() -> {
            try { r.close(); }
            catch(IOException ex) { throw new UncheckedIOException(ex); }
        });
}
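For example (the file name and chunk size are arbitrary placeholders; Paths.get additionally requires java.nio.file.Paths), wrapping the stream in try-with-resources ensures that the onClose handler closes the underlying reader:

try (Stream<String> s = chunks(Paths.get("data.txt"), 8192)) {
    s.forEach(chunk -> System.out.println("read a chunk of " + chunk.length() + " chars"));
}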
But I wouldn’t be surprised if your next question were “how can I merge adjacent stream elements?”, as these fixed-size chunks are rarely the natural unit of data for your actual task.
More often than not, the subsequent step is to perform pattern matching within the contents, and in that case it’s better to use Scanner in the first place, which is capable of performing pattern matching while streaming the data. This can be done efficiently because the regex engine tells whether buffering more data could change the outcome of a match operation (see hitEnd() and requireEnd()). Unfortunately, generating a stream of matches from a Scanner has only been added in Java 9, but see this answer for a back-port of that feature to Java 8.
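On Java 9 or newer, a sketch of that Scanner-based approach could look like the following (the method name, pattern, and file handling are just illustrative):

import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Scanner;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;

static void printMatches(Path path) throws IOException {
    // Scanner.findAll (Java 9) streams match results while scanning the input
    try (Scanner sc = new Scanner(path, StandardCharsets.UTF_8.name())) {
        sc.findAll(Pattern.compile("\\w+"))
          .map(MatchResult::group)
          .forEach(System.out::println);
    }
}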