I need to read an excel(.xls) file that i'm receiving. Using the regular charsets like UTF-8, Cp1252, ISO-8859-1, UTF-16LE, none of these helped me, the characters are still malformed.
So i search ended up using juniversalchardet, it showed me that the charset was MacCyrillic, used MacCyrillic to read the file, but still the same weird outcome.
When i open the file on excel everything is fine, all the characters are fine, since its portuguese its filled whit Ç ~ and such. But opening whit notepad or trough java the file is all messed up. But if open the file on my excel and then save it again like .txt it becomes readable
My method to find the charset
public static void lerCharset(String fileName) throws IOException {
byte[] buf = new byte[50000000];
FileInputStream fis = new FileInputStream(fileName);
// (1)
UniversalDetector detector = new UniversalDetector(null);
// (2)
int nread;
while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
detector.handleData(buf, 0, nread);
}
// (3)
detector.dataEnd();
// (4)
String encoding = detector.getDetectedCharset();
if (encoding != null) {
System.out.println("Detected encoding = " + encoding);
} else {
System.out.println("No encoding detected.");
}
// (5)
detector.reset();
fis.close();
}
How can i discover the correct charset? Should i try a different aproach? Like making my java re-save the excel and then start reading?