I have a text file containing many Unicode escape sequences, and I am trying to print the corresponding characters to the console, but all it prints is the escape text itself. If I copy any of the values and paste them into a System.out.println call as a string literal, it works fine, but not when I read them from the text file.
The following is my code for reading the file, which contains lines of values like \u00C0, \u00C1, \u00C2, \u00C3. These are printed to the console exactly as written, instead of the characters I want.
    private void printFileContents() throws IOException {
        Path encoding = Paths.get("unicode.txt");
        // Files.lines decodes the file as UTF-8 by default; each line is printed unchanged
        try (Stream<String> stream = Files.lines(encoding)) {
            stream.forEach(System.out::println);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
This is the method I used to parse the HTML that contained the Unicode code points in the first place and to write them to the text file.
    private void parseGermanEncoding() {
        try
        {
            File encoding = new File("encoding.html");
            Document document = Jsoup.parse(encoding, "UTF-8", "http://example.com/");
            Element table = document.getElementsByClass("codetable").first();
            Path f = Paths.get("unicode.txt");
            // try-with-resources flushes and closes the writer, so no explicit flush()/close() is needed
            try (BufferedWriter wr = new BufferedWriter(new FileWriter(f.toFile())))
            {
                for (Element row : table.select("tr"))
                {
                    Elements tds = row.select("td");
                    if (tds.isEmpty())
                    {
                        continue; // skip rows without <td> cells, e.g. header rows
                    }
                    String unicode = tds.get(0).text();
                    if (unicode.startsWith("U+"))
                    {
                        unicode = unicode.substring(2);
                    }
                    // Writes the literal characters backslash, 'u', and the hex digits
                    wr.write("\\u" + unicode);
                    wr.newLine();
                }
            }
        } catch (IOException e)
        {
            e.printStackTrace();
        }
    }
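As far as I can tell, the Java compiler only translates a backslash-u escape when it appears in source code, so the same text read from a file at run time stays as six separate characters. Below is a minimal sketch of decoding it by hand, assuming every escape is a backslash, a lowercase 'u', and exactly four hex digits. The class name UnicodeEscapes and the method decode are made up for illustration and are not part of my actual code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original code
final class UnicodeEscapes {
    // Regex for a literal backslash, the letter 'u', and exactly four hex digits
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9A-Fa-f]{4})");

    // Replaces every escape found in the line with the character it names
    static String decode(String line) {
        Matcher m = ESCAPE.matcher(line);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            out.append(line, last, m.start());                    // text before the escape
            out.append((char) Integer.parseInt(m.group(1), 16));  // hex digits -> char
            last = m.end();
        }
        out.append(line, last, line.length());                    // remaining text
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below holds the six characters as they appear in unicode.txt
        System.out.println(decode("\\u00C0"));
    }
}
```

With something like this, the forEach in printFileContents could print UnicodeEscapes.decode(v) instead of v. Whether the character shows up correctly would still depend on the console's encoding. Another option might be to have parseGermanEncoding write the character itself rather than the escape text, in which case the writer's charset would need to be set explicitly.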