First of all: I don't have deep knowledge of Java 8 streams, so what I'm about to ask may be trivial, impossible, or already implemented.
I'm working with records stored in large binary files. Each binary file has an associated binary index that makes it possible to access specific parts of the file using RandomAccessFile.
The interface would be:
public interface BinaryFile<T> extends Iterable<T> {
    public CloseableIterator<T> queryUsingIndex(long beginIndex, long endIndex);
    public CloseableIterator<T> iterator(); /* get all */
}
Say I want to count the number of records using Java 8 streams. As far as I understand, I could use a stream to count the records in the binary file, and a parallel stream would run faster by counting the records in each indexed part of the file separately.
new BinaryFileImp(myFile).parallel().count();
Is it possible to implement this kind of Iterator on top of a random-access file? Where should I start? Which classes in the JDK should I consider?
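To make the idea concrete, here is a rough sketch of one possible shape for it, built on java.util.Spliterator and java.util.stream.StreamSupport. The class name IndexRangeSpliterator and the reader function are hypothetical, not part of any existing library; the reader stands in for whatever seeks to an index entry and decodes one record. It assumes that records can be addressed by a contiguous range of index positions, which may not hold for every file layout.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.function.LongFunction;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

/** Hypothetical spliterator over the index positions [begin, end). */
class IndexRangeSpliterator<T> implements Spliterator<T> {
    private long current;                 // next index position to read
    private final long end;               // exclusive upper bound
    // Assumption: reads the record at a given index position. With a real file,
    // each thread would need its own RandomAccessFile (or positional reads),
    // because a single file handle has one shared file pointer.
    private final LongFunction<T> reader;

    IndexRangeSpliterator(long begin, long end, LongFunction<T> reader) {
        this.current = begin;
        this.end = end;
        this.reader = reader;
    }

    @Override
    public boolean tryAdvance(Consumer<? super T> action) {
        if (current >= end) return false;
        action.accept(reader.apply(current++));
        return true;
    }

    @Override
    public Spliterator<T> trySplit() {
        long remaining = end - current;
        if (remaining < 2) return null;   // too small to split further
        long mid = current + remaining / 2;
        // Hand the first half to another worker and keep the second half here.
        Spliterator<T> prefix = new IndexRangeSpliterator<>(current, mid, reader);
        current = mid;
        return prefix;
    }

    @Override
    public long estimateSize() { return end - current; }

    @Override
    public int characteristics() { return ORDERED | SIZED | SUBSIZED; }

    /** Builds a parallel stream over the given index range. */
    static <T> Stream<T> parallelStream(long begin, long end, LongFunction<T> reader) {
        return StreamSupport.stream(new IndexRangeSpliterator<>(begin, end, reader), true);
    }

    public static void main(String[] args) {
        // Stand-in reader that fabricates a record from its index position.
        long n = parallelStream(0, 1000, i -> "record-" + i)
                .filter(r -> r.endsWith("7"))
                .count();
        System.out.println(n);
    }
}
```

The stream framework calls trySplit() to divide the range among worker threads, so the index only needs to support jumping to an arbitrary position.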
Thank you for your suggestions.
EDIT: additional information. I'm working with SAM files ( https://samtools.github.io/hts-specs/SAMv1.pdf ), a common bioinformatics file format that stores millions of records along the genome. A common practice is to work on different chromosomes in parallel to speed things up. So, to count the number of records, I would sum the counts over { chromosome 1, chromosome 2, ... chromosome Y }.
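The per-chromosome sum described above could look roughly like the following sketch. PerRegionCount and countInParallel are hypothetical names, and the query function is an assumption standing in for something like queryUsingIndex that returns an independent iterator for each region. The in-memory map in main is stand-in data only, not a SAM reader.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class PerRegionCount {
    /**
     * Drains one iterator per region on the common fork/join pool and sums the counts.
     * Assumes each call to query returns an iterator that is safe to use
     * independently of the others (for example, one file handle per region).
     */
    static <R, T> long countInParallel(List<R> regions, Function<R, Iterator<T>> query) {
        return regions.parallelStream()
                .mapToLong(region -> {
                    Iterator<T> it = query.apply(region);
                    long n = 0;
                    while (it.hasNext()) {
                        it.next();
                        n++;
                    }
                    return n;
                })
                .sum();
    }

    public static void main(String[] args) {
        // Stand-in data: records grouped by chromosome name.
        Map<String, List<String>> records = new HashMap<>();
        records.put("chr1", Arrays.asList("r1", "r2", "r3"));
        records.put("chr2", Arrays.asList("r4"));
        records.put("chrY", Arrays.asList("r5", "r6"));

        List<String> chromosomes = Arrays.asList("chr1", "chr2", "chrY");
        long total = countInParallel(chromosomes, chr -> records.get(chr).iterator());
        System.out.println(total);
    }
}
```

This version parallelizes only across regions, so the speed-up is bounded by the number of chromosomes and by the size of the largest one.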