In Java, how can I test whether a file's encoding is definitely not UTF-8?
I want to validate that the contents are well-formed UTF-8.
I also need to validate that the file does not start with a byte order mark (BOM).
If you just need to test the file, without actually retaining its contents:
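import java.io.Reader;
import java.io.Writer;
import java.nio.charset.CharacterCodingException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Files.newBufferedReader(path) decodes as UTF-8 and throws on malformed input.
Path path = Paths.get("/home/dave/somefile.txt");
try (Reader reader = Files.newBufferedReader(path)) {
    int c = reader.read();
    if (c == 0xfeff) {
        System.out.println("File starts with a byte order mark.");
    } else if (c >= 0) {
        // Decode the rest of the file and discard it (Writer.nullWriter() needs Java 11+).
        reader.transferTo(Writer.nullWriter());
    }
} catch (CharacterCodingException e) {
    System.out.println("Not a UTF-8 file.");
}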
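If you already have the bytes in memory, the same two checks can be done without a Reader. The following is only a sketch: the class name Utf8Check and the methods startsWithBom and isWellFormedUtf8 are hypothetical names made up for this example, and it assumes the whole file fits in a byte array (for example, one obtained from Files.readAllBytes).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class for illustration; not part of any library.
final class Utf8Check {

    /** Returns true if the bytes start with the UTF-8 encoded BOM (EF BB BF). */
    static boolean startsWithBom(byte[] bytes) {
        return bytes.length >= 3
                && (bytes[0] & 0xFF) == 0xEF
                && (bytes[1] & 0xFF) == 0xBB
                && (bytes[2] & 0xFF) == 0xBF;
    }

    /** Returns true if the bytes decode as UTF-8 with no malformed sequence. */
    static boolean isWellFormedUtf8(byte[] bytes) {
        try {
            // REPORT makes the decoder throw instead of substituting U+FFFD.
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "h\u00e9llo".getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = "h\u00e9llo".getBytes(StandardCharsets.ISO_8859_1);
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'h', 'i'};

        System.out.println("utf8 well-formed:   " + isWellFormedUtf8(utf8));
        System.out.println("latin1 well-formed: " + isWellFormedUtf8(latin1));
        System.out.println("BOM detected:       " + startsWithBom(withBom));
    }
}
```

Checking the raw bytes for the BOM separately means the rest of the content can still be validated when a BOM is present, whereas the Reader version above stops after reporting the BOM.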