I have some bytes that should be UTF-8 encoded, but they may contain text in ISO-8859-1 encoding if the user did not configure their text editor correctly.

I read the file with an InputStreamReader:

InputStreamReader reader = new InputStreamReader( 
    new FileInputStream(file), Charset.forName("UTF-8"));

But whenever the user types umlauts like "ä", which are invalid UTF-8 when stored as ISO-8859-1, the InputStreamReader does not complain and silently substitutes placeholder characters.

Is there a simple way to make this throw an exception on invalid input?

Daniel

2 Answers

CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
decoder.onMalformedInput(CodingErrorAction.REPORT);
decoder.onUnmappableCharacter(CodingErrorAction.REPORT);
InputStreamReader reader = new InputStreamReader(
    new FileInputStream(file), decoder);
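
To see the effect without a file on disk, here is a minimal, self-contained sketch of the same approach. The class name `StrictUtf8Demo`, the helpers `readStrict` and `isValidUtf8`, and the sample word are made up for illustration; only the decoder setup comes from the snippet above. It assumes the input fits in memory.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictUtf8Demo {

    // Reads the whole stream as UTF-8. Throws a CharacterCodingException
    // (a subclass of IOException) on the first invalid byte sequence.
    static String readStrict(InputStream in) throws IOException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(in, decoder)) {
            char[] buffer = new char[1024];
            int n;
            while ((n = reader.read(buffer)) != -1) {
                text.append(buffer, 0, n);
            }
        }
        return text.toString();
    }

    // Hypothetical convenience wrapper: true if the bytes decode cleanly.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            readStrict(new ByteArrayInputStream(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The same word encoded two ways; only the first is valid UTF-8.
        byte[] utf8 = "M\u00e4dchen".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "M\u00e4dchen".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(utf8));   // true
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```

With a real file, passing a `FileInputStream` to `readStrict` should behave the same way, and the caller can catch `CharacterCodingException` to fall back to ISO-8859-1 or report the problem to the user.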
Mikhail Vladimirov

Simply add .newDecoder():

InputStreamReader reader = new InputStreamReader( 
    new FileInputStream(file), Charset.forName("UTF-8").newDecoder());
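
The one-liner works because a decoder obtained from `newDecoder()` reports malformed input and unmappable characters by default, whereas the `InputStreamReader` constructor that takes a `Charset` replaces them. The sketch below checks both behaviours; `DecoderDefaultsDemo`, `freshDecoderReports`, and `readLenient` are illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class DecoderDefaultsDemo {

    // True if a freshly created UTF-8 decoder reports both kinds of error.
    static boolean freshDecoderReports() {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        return decoder.malformedInputAction() == CodingErrorAction.REPORT
            && decoder.unmappableCharacterAction() == CodingErrorAction.REPORT;
    }

    // Decodes with the Charset-based constructor, which substitutes the
    // replacement character U+FFFD for invalid bytes instead of throwing.
    static String readLenient(byte[] bytes) {
        StringBuilder text = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            int c;
            while ((c = reader.read()) != -1) {
                text.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        System.out.println(freshDecoderReports()); // true
        byte[] latin1 = "M\u00e4dchen".getBytes(StandardCharsets.ISO_8859_1);
        // The lenient reader yields text containing U+FFFD.
        System.out.println(readLenient(latin1).indexOf('\uFFFD') >= 0); // true
    }
}
```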
Esailija