UTF-8 is a variable-length encoding in which each character occupies 1 to 4 bytes (the original design allowed sequences of up to 6 bytes, but RFC 3629 limits them to 4). You can't simply compare the first byte of the file to the last byte. Depending on the encoded length of the first character, you might need to compare the first byte with the fourth-to-last byte.
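To make the byte-length point concrete, here is a minimal sketch of how the length of a character can be read off its lead byte, and how continuation bytes (bit pattern 10xxxxxx) let you find where the last character starts when scanning from the end. The class and method names (Utf8Lengths, sequenceLength, isContinuation, lastCharStart) are made up for illustration, and the code assumes sequences of at most 4 bytes as in RFC 3629. It only inspects bit patterns; it does not reject overlong or otherwise invalid sequences.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of any library.
final class Utf8Lengths {

    /** Encoded length (1 to 4) implied by a lead byte, or -1 if b cannot start a character. */
    static int sequenceLength(byte b) {
        int u = b & 0xFF;                  // treat the byte as unsigned
        if (u < 0x80) return 1;            // 0xxxxxxx: ASCII
        if ((u & 0xE0) == 0xC0) return 2;  // 110xxxxx
        if ((u & 0xF0) == 0xE0) return 3;  // 1110xxxx
        if ((u & 0xF8) == 0xF0) return 4;  // 11110xxx
        return -1;                         // 10xxxxxx (continuation) or not a lead byte
    }

    /** True for bytes of the form 10xxxxxx, which never start a character. */
    static boolean isContinuation(byte b) {
        return (b & 0xC0) == 0x80;
    }

    /** Index at which the last character of a non-empty array begins. */
    static int lastCharStart(byte[] bytes) {
        int i = bytes.length - 1;
        while (i > 0 && isContinuation(bytes[i])) {
            i--;                           // step back over continuation bytes
        }
        return i;
    }

    public static void main(String[] args) {
        // U+00E9 (2 bytes), 'a' (1 byte), U+20AC (3 bytes): 6 bytes in total
        byte[] data = "\u00e9a\u20ac".getBytes(StandardCharsets.UTF_8);
        System.out.println("bytes: " + data.length);
        System.out.println("first char length: " + sequenceLength(data[0]));
        System.out.println("last char starts at: " + lastCharStart(data));
    }
}
```

With input like this, the first character ends at byte 1 and the last one starts at byte 3, so a byte-for-byte comparison of position 0 against position 5 would compare halves of two different characters.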
You can get relatively efficient random file access with RandomAccessFile or FileChannel, but the API (or the underlying file system) wasn't designed for reading "backward". To read backward, every read() would have to be preceded by a seek().
At some level, an entire block is read from the file system and held in memory, so actual seeking and reading by the physical hard drive head is minimized. The overhead of making billions of these calls from Java down to the operating system adds up, though, so it might be worthwhile to maintain your own buffer. That way, a seek and a bulk read are performed only when the buffer is empty.
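One way to maintain such a buffer is sketched below, using RandomAccessFile. The class name BackwardByteReader and its method readPrevious are invented for this example, and the buffer size is whatever the caller chooses (it must be greater than zero). The reader hands out raw bytes from the end of the file toward the beginning and only calls seek() and readFully() when its buffer runs dry. Grouping those bytes into UTF-8 characters is left to the caller.

```java
import java.io.IOException;
import java.io.RandomAccessFile;

// Hypothetical helper for illustration; not part of the JDK.
final class BackwardByteReader implements AutoCloseable {
    private final RandomAccessFile file;
    private final byte[] buffer;
    private long blockStart;   // file offset corresponding to buffer[0]
    private int remaining;     // unread bytes in the buffer, consumed from its end

    BackwardByteReader(String path, int bufferSize) throws IOException {
        this.file = new RandomAccessFile(path, "r");
        this.buffer = new byte[bufferSize];
        this.blockStart = file.length();   // nothing loaded yet; start at end of file
        this.remaining = 0;
    }

    /** Returns the previous byte as a value in 0..255, or -1 once the start of the file is reached. */
    int readPrevious() throws IOException {
        if (remaining == 0) {
            if (blockStart == 0) {
                return -1;                 // already consumed the first byte of the file
            }
            // Load the block that ends where the current one begins.
            int n = (int) Math.min(buffer.length, blockStart);
            blockStart -= n;
            file.seek(blockStart);         // one seek per block
            file.readFully(buffer, 0, n);  // one bulk read per block
            remaining = n;
        }
        return buffer[--remaining] & 0xFF;
    }

    @Override
    public void close() throws IOException {
        file.close();
    }

    public static void main(String[] args) throws IOException {
        // Prints the bytes of the given file in reverse order, one value per line.
        try (BackwardByteReader reader = new BackwardByteReader(args[0], 8192)) {
            int b;
            while ((b = reader.readPrevious()) != -1) {
                System.out.println(b);
            }
        }
    }
}
```

With an 8 KiB buffer, a file of N bytes needs roughly N / 8192 seek-and-read pairs instead of N, which is where the saving comes from.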
Luckily, your teacher didn't ask for Unicode support as well!