Ok, I am reading a .docx file via a BufferedReader and want to show the text in an EditText. The .docx is not in English but in Greek. I use:

File file = new File(file_Path);
try {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    StringBuilder text = new StringBuilder();
    while ((line = br.readLine()) != null) {
        text.append(line);
    }
    et1.setText(text);
} catch (IOException e) {
    e.printStackTrace();
}

And the result I get is this: [screenshot]. If the characters are English, it works fine, but in my case they aren't. How can I fix this? Thanks a lot.

1 Answer

Ok, I am reading a .docx file via a BufferedReader

Well, that's the first problem. BufferedReader is for plain text files. docx files are binary files in a specific format (assuming you mean the kind of file that Microsoft Word saves). You can't just read them like text files. Open the file in Notepad (not WordPad) and you'll see what I mean.
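
To make that concrete: a .docx saved by Word is a ZIP package of XML parts, with the main body normally stored in word/document.xml. The following is only a minimal sketch using the JDK's own java.util.zip classes. The class name DocxEntries and the method listEntries are made up for illustration, and the sketch only lists what is inside the package; it does not extract readable text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of any library.
public class DocxEntries {

    // Returns the names of all parts stored inside the ZIP package at 'path'.
    public static List<String> listEntries(String path) throws IOException {
        List<String> names = new ArrayList<>();
        try (ZipFile zip = new ZipFile(path)) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                names.add(entries.nextElement().getName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is assumed to be the path of a .docx file.
        for (String name : listEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```

Reading such a file line by line with a BufferedReader just decodes the compressed bytes as if they were characters, which is why the output looks like garbage regardless of the language of the document.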

You might want to look at Apache POI.

From comments:

Testing to read a .txt file with the same text gave same results too

That's probably due to using the wrong encoding. new FileReader(file) always uses the platform default encoding, which is annoying. Assuming you're using Java 7 or higher, you'd be better off with Files.newBufferedReader:

try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
    ...
}

Adjust the charset to match the one you used when saving your text file, of course - if you have the option of using UTF-8, that's a pretty good choice. (Aside from anything else, pretty much everything can handle UTF-8.)
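
As a fuller illustration of the snippet above, here is a minimal sketch that reads a whole text file with an explicit charset. The class name TextFileReading and the method readAll are hypothetical, and the sample file written in main is only there so the example runs on its own; it assumes the file really was saved as UTF-8.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of any library.
public class TextFileReading {

    // Reads the whole file, decoding its bytes with the given charset.
    // Each line is followed by '\n' so that line breaks are preserved.
    public static String readAll(Path path, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write a small Greek sample as UTF-8, then read it back.
        // The escapes spell a Greek word; they avoid depending on the
        // encoding of this source file.
        Path path = Files.createTempFile("greek", ".txt");
        String sample = "\u039A\u03B1\u03BB\u03B7\u03BC\u03AD\u03C1\u03B1";
        Files.write(path, sample.getBytes(StandardCharsets.UTF_8));
        System.out.print(readAll(path, StandardCharsets.UTF_8));
    }
}
```

On Android the java.nio.file classes are only available from API level 26. On older versions, new BufferedReader(new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8)) is one way to get the same explicit-charset behaviour.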

Jon Skeet