Scenario: I want to read a UTF-8 encoded Arabic dataset. The words on each line are separated by spaces.
Problem: When I read each line, the output is:
??????? ?? ???? ?? ???
Question: How can I read the file and print each line correctly? For more information, the part of my source code that reads the Arabic dataset is as follows:
private ContextCountsImpl extractContextCounts(Map<Integer, String> phraseMap) throws IOException {
    // try-with-resources closes the reader even if an exception is thrown
    try (BufferedReader rdr = new BufferedReader(
            new InputStreamReader(new FileInputStream(inputFile), "utf-8"))) {
        String line;
        // readLine() returns null at end of file; ready() only tells whether a read would block
        while ((line = rdr.readLine()) != null) {
            System.out.println(line);
            List<String> phrases = splitLineInPhrases(line);
            // any process on this file
        }
    }
}
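
One thing worth checking is whether the file is actually decoded correctly and only the printing step is losing the characters. System.out encodes text with the platform's default charset, and a charset that cannot represent Arabic typically substitutes '?' for each character. The following is a minimal sketch under that assumption, not a confirmed diagnosis: it reads the file as UTF-8 and prints through a PrintStream that is explicitly set to UTF-8. The class name Utf8LineEcho, the helper echoLines, and the command-line path argument are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class Utf8LineEcho {

    // Reads inputFile as UTF-8, writes every line to out, and returns the number of lines read.
    static int echoLines(Path inputFile, PrintStream out) throws IOException {
        int count = 0;
        try (BufferedReader rdr = new BufferedReader(
                new InputStreamReader(Files.newInputStream(inputFile), StandardCharsets.UTF_8))) {
            String line;
            while ((line = rdr.readLine()) != null) {
                out.println(line);
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java Utf8LineEcho <path-to-utf8-file>");
            return;
        }
        // Assumption: the terminal or IDE console is itself configured to display UTF-8.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        echoLines(Paths.get(args[0]), utf8Out);
    }
}
```

If the output still shows question marks, the console itself may not be set to UTF-8 (an IDE console encoding setting, or the code page of a Windows terminal), or the file may not really be UTF-8. Writing the lines to a file and opening it in a UTF-8 aware editor is a quick way to tell the two cases apart.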