Because the list is unordered, there is no alternative to reading the entire file.
If you're lucky, the first name is the one you're looking for: O(1).
If you're unlucky, it's the last name: O(n).
Apart from this, it doesn't matter whether you do it the java.io way (Files.newBufferedReader()) or the java.nio way (Files.newByteChannel()); both perform more or less the same. If the input file is line-based (as in your case), you may use
Files.lines(file).filter(name::equals).findFirst();
which internally uses a BufferedReader.
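Expanded into a self-contained sketch (the file path and name are placeholders; note that the stream returned by Files.lines holds the file open and should be closed, e.g. with try-with-resources):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

public class LinearSearch {

    // Scan the file line by line and stop at the first exact match: O(n) worst case.
    static Optional<String> findName(Path file, String name) throws IOException {
        // Files.lines is lazy; closing the stream releases the file handle
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(name::equals).findFirst();
        }
    }
}
```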
If you really want to speed things up, you have to sort the names in the file (see How do I sort very large files); then you're able to read from an ordered list using an index.

EDIT:
Once you have an ordered list, you can fast-scan it once and build an index in a TreeMap, then jump right to the correct file position (using a RandomAccessFile or a SeekableByteChannel) and read the name.
For example:
static long findPosition(Path file, String name) throws IOException {
    long blockSize = 1048576L;
    long fileSize = Files.size(file);
    try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
        // create the index: the first complete line of each block and its position
        TreeMap<String, Long> index = new TreeMap<>();
        for (long pos = 0; pos < fileSize; pos += blockSize) {
            raf.seek(pos);
            if (pos > 0) {
                raf.readLine(); // skip the partial line we may have landed in
            }
            long linePos = raf.getFilePointer();
            String line = raf.readLine();
            if (line != null) {
                index.put(line, linePos);
            }
        }
        // locate the block that could contain the name: from the greatest
        // indexed line <= name up to the smallest indexed line > name
        long offset = Optional.ofNullable(index.floorEntry(name))
                .map(Map.Entry::getValue).orElse(0L);
        long limit = Optional.ofNullable(index.higherEntry(name))
                .map(Map.Entry::getValue).orElse(fileSize);
        // scan only that block
        raf.seek(offset);
        long cur;
        while ((cur = raf.getFilePointer()) < limit) {
            if (name.equals(raf.readLine())) {
                return cur; // file position of the name
            }
        }
        return -1L; // not found
    }
}
The block size is a trade-off between index size, index-creation time, and data-access time: the larger the blocks, the smaller the index and the faster it is to build, but the longer each lookup's linear scan within a block.
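To put rough numbers on that trade-off (the 1 GiB file size and the block sizes below are illustrative assumptions): for a file of S bytes and block size B, the index holds about S/B entries, while a lookup scans at most B bytes.

```java
public class BlockTradeoff {

    // Approximate number of index entries for a given file and block size.
    static long indexEntries(long fileSize, long blockSize) {
        return (fileSize + blockSize - 1) / blockSize; // ceiling division
    }

    public static void main(String[] args) {
        long fileSize = 1L << 30; // assume a 1 GiB file
        // 1 MiB blocks: ~1024 index entries, worst-case scan of 1 MiB
        System.out.println(indexEntries(fileSize, 1L << 20)); // prints 1024
        // 64 KiB blocks: ~16384 entries, worst-case scan of 64 KiB
        System.out.println(indexEntries(fileSize, 1L << 16)); // prints 16384
    }
}
```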