I need to read a text file line by line and apply several CharsetDecoders to each line, in order. Specifically, I first try to decode the line as UTF-8, and fall back to a one-byte charset if the UTF-8 CharsetDecoder throws a MalformedInputException.
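To make the intent concrete, here is a minimal sketch of the decode-then-fall-back step for a single line, assuming the raw bytes of the line are already available. The class name `FallbackDecoder`, the method `decodeLine`, and the `fallback` parameter are made up for illustration; only the `java.nio.charset` calls are standard.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any library.
class FallbackDecoder {

    // Decodes raw as UTF-8; if the bytes are not valid UTF-8,
    // decodes them with the given fallback charset instead.
    static String decodeLine(byte[] raw, Charset fallback) {
        // REPORT makes the decoder throw instead of substituting characters.
        CharsetDecoder utf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return utf8.decode(ByteBuffer.wrap(raw)).toString();
        } catch (CharacterCodingException e) {
            // MalformedInputException is a subclass of CharacterCodingException.
            return new String(raw, fallback);
        }
    }
}
```

For example, `FallbackDecoder.decodeLine(bytes, Charset.forName("KOI8-R"))` would try UTF-8 first and only then KOI8-R. A new decoder is created per call to keep the sketch simple; reusing one decoder would also work, since `decode(ByteBuffer)` resets it.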
However, if I use an InputStreamReader with the default or a specified charset, the readLine function silently replaces with '?' every byte it considers invalid for that charset.
I ended up writing my own line-reading function, which reads from a stream byte by byte, looks for line terminators and builds the lines. But this turns out to be terribly slow.
Is there any way to make Java read lines without touching the bytes?
UPDATE:
I've found out that there are charsets in which all 256 byte values are valid, two of them being line terminators.
So it is possible to read a raw byte stream line by line.
Examples of such charsets are:
IBM00858 IBM437 IBM775 IBM850 IBM852 IBM855 IBM860 IBM861 IBM862 IBM863 IBM865 IBM866 ISO-8859-1 ISO-8859-13 ISO-8859-15 ISO-8859-2 ISO-8859-4 ISO-8859-5 ISO-8859-9 KOI8-R KOI8-U windows-1256
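As a rough sketch of how this can be used: read the lines through ISO-8859-1, which maps each byte 0x00..0xFF to the code point with the same value, then convert each line back to its original bytes. `RawLineReader` and `readRawLines` are hypothetical names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of any library.
class RawLineReader {

    // Returns the raw bytes of every line in the stream, without terminators.
    static List<byte[]> readRawLines(InputStream in) throws IOException {
        List<byte[]> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.ISO_8859_1))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Each char is in the range 0..255, so encoding with the
                // same charset restores the original bytes exactly.
                lines.add(line.getBytes(StandardCharsets.ISO_8859_1));
            }
        }
        return lines;
    }
}
```

Each returned byte array can then be passed to whatever decoders are needed. Note that readLine treats `\n`, `\r` and `\r\n` as terminators; for UTF-8 input this is safe, because the bytes 0x0A and 0x0D never occur inside a multi-byte sequence.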
The question is now closed.