I have a problem whereby I need to be able to detect whether the contents of a byte array are valid under the ISO-8859-1 encoding.
I have found the question "Java : How to determine the correct charset encoding of a stream" useful; however, none of the answers fully address my problem.
I have attempted to use the TikaEncodingDetector as shown below:
public static Charset guessCharset(final byte[] content) throws IOException {
    final InputStream isx = new ByteArrayInputStream(content);
    return Charset.forName(new TikaEncodingDetector().guessEncoding(isx));
}
Unfortunately this approach makes different predictions depending on the content of the byte array. For example, an array containing 'h','e','l','l','o' is detected as ISO-8859-1, 'w','o','r','l','d' comes out as IBM500, and 'a','b','c','d','e' results in UTF-8.
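For completeness, this is roughly how I am exercising the method (assuming the guessCharset method above is in scope; the charset names in the comments are simply the ones reported in my tests):

final byte[] hello = "hello".getBytes(StandardCharsets.ISO_8859_1);
final byte[] abcde = "abcde".getBytes(StandardCharsets.ISO_8859_1);
System.out.println(guessCharset(hello)); // reports ISO-8859-1 in my tests
System.out.println(guessCharset(abcde)); // reports UTF-8 in my tests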
All I want to know is whether my byte array validates correctly against the ISO-8859-1 standard. I would be grateful for suggestions on the best way to carry out this check; the kind of check I currently have in mind is sketched below.
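For what it is worth, my current idea is a strict decode using a CharsetDecoder configured to report, rather than silently replace, anything it cannot map (isValidIso88591 is just my own working name for the method):

import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

private static boolean isValidIso88591(final byte[] content) {
    // Strict decoder: malformed or unmappable input raises an exception
    // instead of being substituted with a replacement character
    final CharsetDecoder decoder = StandardCharsets.ISO_8859_1.newDecoder()
            .onMalformedInput(CodingErrorAction.REPORT)
            .onUnmappableCharacter(CodingErrorAction.REPORT);
    try {
        decoder.decode(ByteBuffer.wrap(content));
        return true;
    } catch (final CharacterCodingException e) {
        return false;
    }
}

I am not sure whether a strict decode like this is the correct approach, or whether a detector-based solution would be more appropriate.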