RandomAccessFile is a good place to start, as described by the other answers. There is one important caveat though.
If your file is not encoded with a one-byte-per-character encoding, the readLine() method is not going to work for you. And readUTF() won't work in any circumstances. (It reads a string preceded by a byte count ...)
Instead, you will need to make sure that you look for end-of-line markers in a way that respects the encoding's character boundaries. For fixed-length encodings (e.g. flavors of UTF-16 or UTF-32) you need to extract characters starting from byte positions that are divisible by the character size in bytes. For variable-length encodings (e.g. UTF-8), you need to search for a byte that must be the first byte of a character.
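For illustration, here is a minimal sketch of what those boundary checks might look like (the helper names are mine, not from any library):

```java
// Fixed-length encodings: round the byte offset down to a multiple of
// the code unit size (2 for UTF-16, 4 for UTF-32).
static long alignToCharBoundary(long byteOffset, int codeUnitSize) {
    return byteOffset - (byteOffset % codeUnitSize);
}

// Variable-length UTF-8: continuation bytes have the form 10xxxxxx,
// so any byte that is NOT of that form starts a character.
static boolean isUtf8FirstByte(byte b) {
    return (b & 0xC0) != 0x80;
}
```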
In the case of UTF-8, the first byte of a character will be of the form 0xxxxxxx, 110xxxxx, 1110xxxx or 11110xxx. Anything else is either a continuation byte (10xxxxxx) or an illegal UTF-8 sequence; see The Unicode Standard, Version 5.2, Chapter 3.9, Table 3-7.

This means, as the comment discussion points out, that any 0x0A and 0x0D bytes in a properly encoded UTF-8 stream always represent a LF or CR character. Thus, simply counting the 0x0A and 0x0D bytes is a valid implementation strategy (for UTF-8), provided we can assume that the other kinds of Unicode line separator (U+2028, U+2029 and U+0085) are not used. If you can't assume that, the code would be more complicated.
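Here is a hedged sketch of that simple counting strategy (the buffer size and the decision to count only LF bytes are my own choices; bare-CR line endings or the other separators above would need extra handling):

```java
import java.io.IOException;
import java.io.RandomAccessFile;

class LineCounting {
    // Count lines in a UTF-8 (or any ASCII-compatible) file by counting
    // LF (0x0A) bytes. CRLF is covered because each pair has exactly one LF.
    static long countLines(RandomAccessFile raf) throws IOException {
        byte[] buffer = new byte[8192];
        long lines = 0;
        raf.seek(0);
        int read;
        while ((read = raf.read(buffer)) != -1) {
            for (int i = 0; i < read; i++) {
                if (buffer[i] == 0x0A) {
                    lines++;
                }
            }
        }
        return lines;
    }
}
```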
Having identified a proper character boundary, you can then just call new String(...) passing the byte array, offset, count and encoding, and then repeatedly call String.lastIndexOf(...) to count the end-of-lines.
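For example, a sketch of that last step, assuming UTF-8 and LF line terminators:

```java
import java.nio.charset.StandardCharsets;

class LineBreakCounting {
    // Decode a byte range that starts and ends on character boundaries,
    // then walk backwards with lastIndexOf, counting '\n' occurrences.
    static int countLineBreaks(byte[] bytes, int offset, int length) {
        String chunk = new String(bytes, offset, length, StandardCharsets.UTF_8);
        int count = 0;
        int pos = chunk.lastIndexOf('\n');
        while (pos >= 0) {
            count++;
            pos = chunk.lastIndexOf('\n', pos - 1);
        }
        return count;
    }
}
```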