I started with an InputStreamReader, but this buffered its input, reading more than was required from the input stream (as mentioned in its Java docs). Delving into the source code (java version "1.7.0_147-icedtea"), I got as far as the sun.nio.cs.StreamDecoder class, which contains this comment:
// In order to handle surrogates properly we must never try to produce
// fewer than two characters at a time. If we're only asked to return one
// character then the other is saved here to be returned later.
So I guess the question becomes: is this true, and if so, why? From my (very basic!) understanding of the six charsets that every Java platform is required to support, it is always possible to determine the exact number of bytes needed to decode a single character, so no read-ahead should be necessary.
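To make the read-ahead concrete, here is a minimal, self-contained snippet showing the behaviour as I observed it on my JDK (other versions may buffer differently): asking the reader for a single character drains all four bytes from the underlying ByteArrayInputStream, not just the two bytes that character actually needs.

import java.io.ByteArrayInputStream;
import java.io.InputStreamReader;

public class ReadAheadDemo {
    public static void main(String[] args) throws Exception {
        // "A" then "B" encoded as UTF-16LE: 41 00 42 00
        byte[] bytes = {0x41, 0x00, 0x42, 0x00};
        ByteArrayInputStream in = new ByteArrayInputStream(bytes);
        InputStreamReader isr = new InputStreamReader(in, "UTF-16LE");

        System.out.println((char) isr.read()); // prints 'A'
        // The decoder has already pulled the remaining bytes into its own
        // buffer, so this prints 0 rather than 2 on the JDK I tested:
        System.out.println(in.available());
    }
}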
The background: I had a binary file containing a bunch of data in different encodings (numbers, strings, single-byte tokens, etc.). The basic format was a repeating sequence of a byte marker (indicating the type of data) followed by optional data, if required for that type. The two types containing character data were null-terminated strings and strings with a preceding 2-byte length. So for null-terminated strings I thought something like this would do the trick:
String readStringWithNull(InputStream in) throws IOException {
    StringWriter sw = new StringWriter();
    InputStreamReader isr = new InputStreamReader(in, "UTF-16LE");
    // read() returns -1 at end of stream, so this stops at the null
    // terminator or at EOF, whichever comes first
    for (int i; (i = isr.read()) > 0; ) {
        sw.write(i);
    }
    return sw.toString();
}
But the InputStreamReader read ahead into its internal buffer, so subsequent read operations on the underlying InputStream missed data. For my particular case I knew that all characters would be in the UTF-16LE BMP (effectively UCS-2LE), so I just coded around that, but I'm still interested in the general case above.
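For reference, this is roughly the kind of byte-level workaround I mean. It is only a sketch: it assumes every character is a BMP code point (exactly two bytes in UTF-16LE), and the little-endian interpretation of the 2-byte length prefix, the byte-count meaning of that length, and the use of DataInputStream are my assumptions rather than part of the file format.

import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Null-terminated variant: pull exactly two bytes per character, so nothing
// beyond the terminator is consumed from the underlying stream.
static String readStringWithNull(InputStream in) throws IOException {
    StringBuilder sb = new StringBuilder();
    while (true) {
        int lo = in.read();
        int hi = in.read();
        if (lo < 0 || hi < 0) {
            throw new EOFException("stream ended inside a string");
        }
        char c = (char) ((hi << 8) | lo);   // little-endian: low byte first
        if (c == '\0') {
            return sb.toString();           // terminator is not part of the value
        }
        sb.append(c);
    }
}

// Length-prefixed variant: assumed here to be a little-endian byte count
// followed by that many bytes of UTF-16LE data (the real format may differ).
static String readStringWithLength(DataInputStream in) throws IOException {
    int lo = in.readUnsignedByte();
    int hi = in.readUnsignedByte();
    byte[] buf = new byte[(hi << 8) | lo];
    in.readFully(buf);
    return new String(buf, "UTF-16LE");
}

Because these helpers never ask for more bytes than the current value occupies, the next read on the stream lands exactly on the following type marker.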
Also, I've seen the question "InputStreamReader buffering issue", which is similar but does not appear to answer this specific question.
Cheers,