BufferedReader, read chars in an edittext gives strange chars

Question

Ok, I am reading a .docx file via a BufferedReader and want to put the text in an EditText. The .docx is not in English but in another language (Greek). I use:

File file = new File(file_Path);
try {
    BufferedReader br = new BufferedReader(new FileReader(file));
    String line;
    StringBuilder text = new StringBuilder();
    // Read the file line by line and collect the text
    while ((line = br.readLine()) != null) {
        text.append(line);
    }
    br.close();
    et1.setText(text);
} catch (IOException e) {
    e.printStackTrace();
}

The result I get is a string of strange characters (screenshot not included). If the text is in English, it works fine, but in my case it isn't. How can I fix this? Thanks a lot.

Jon Skeet · Accepted Answer · 2014-07-10T18:49:51.507

3

Ok, I am reading a .docx file via a BufferedReader

Well, that's the first problem. BufferedReader is for plain text files. docx files are binary files in a specific format (assuming you mean the kind of file that Microsoft Word saves). You can't just read them like text files. Open the file up in Notepad (not Wordpad) and you'll see what I mean.

You might want to look at Apache POI.
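To illustrate why a plain Reader cannot work here: a .docx file is a ZIP package, and the body text conventionally lives in the entry word/document.xml as XML. The following is only a rough sketch using the standard library, not Apache POI and not a real parser. The class name DocxPeek and the regex are made up for illustration. It assumes the XML is UTF-8 encoded, and it ignores tables, headers, footnotes, and XML entities.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration; not part of Apache POI or any library.
class DocxPeek {
    // Matches a text run <w:t ...>text</w:t>, or the end of a paragraph </w:p>.
    private static final Pattern RUN_OR_PARAGRAPH_END =
            Pattern.compile("<w:t(?:\\s[^>]*)?>([^<]*)</w:t>|</w:p>");

    static String extractText(String docxPath) throws IOException {
        // ZipFile throws a ZipException if the file is not a ZIP archive.
        try (ZipFile zip = new ZipFile(docxPath)) {
            ZipEntry entry = zip.getEntry("word/document.xml");
            if (entry == null) {
                throw new IOException("word/document.xml not found; not a .docx?");
            }
            String xml;
            try (InputStream in = zip.getInputStream(entry)) {
                // Read the whole entry into memory (assumption: it is small enough).
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int n;
                while ((n = in.read(chunk)) != -1) {
                    buf.write(chunk, 0, n);
                }
                // Assumption: the XML part is encoded as UTF-8.
                xml = new String(buf.toByteArray(), StandardCharsets.UTF_8);
            }
            StringBuilder text = new StringBuilder();
            Matcher m = RUN_OR_PARAGRAPH_END.matcher(xml);
            while (m.find()) {
                if (m.group(1) != null) {
                    text.append(m.group(1)); // text of one run
                } else {
                    text.append('\n');       // paragraph boundary
                }
            }
            return text.toString();
        }
    }
}
```

Something like et1.setText(DocxPeek.extractText(file_Path)) would then show the raw paragraph text. For anything beyond a quick check, a proper library such as Apache POI is the safer route.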

From comments:

Testing to read a .txt file with the same text gave same results too

That's probably due to using the wrong encoding. FileReader always uses the platform default encoding, which is annoying. Assuming you're using Java 7 or higher, you'd be better off with Files.newBufferedReader:

try (BufferedReader reader = Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
    ...
}

Adjust the charset to match the one you used when saving your text file, of course - if you have the option of using UTF-8, that's a pretty good choice. (Aside from anything else, pretty much everything can handle UTF-8.)
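If java.nio.file.Files is not available on your platform (older Android versions, for example), the same idea can be sketched with an InputStreamReader, which also lets you name the charset explicitly. This is a minimal sketch, not a definitive implementation; the class name TextFileLoader and the method readAll are hypothetical, and UTF-8 is only an assumption about how the file was saved.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper for illustration.
class TextFileLoader {
    // Reads the whole file, decoding its bytes with the given charset
    // instead of the platform default that FileReader would use.
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // readLine() strips the line terminator, so add one back.
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }
}
```

With the question's variables, that would be et1.setText(TextFileLoader.readAll(file, StandardCharsets.UTF_8)). If the file was saved in a legacy Greek encoding instead, Charset.forName("windows-1253") or Charset.forName("ISO-8859-7") may be the right choice, assuming your runtime supports those charsets.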

edited Jul 10 '14 at 18:49

answered Jul 10 '14 at 18:43

Jon Skeet


Testing to read a .txt file with the same text gave same results too T_T – Jul 10 '14 at 18:44
can I use InputStream instead of Reader? – Braj Jul 10 '14 at 18:44
To add to the answer, you'd need to use docx parser: http://stackoverflow.com/questions/7102511/how-read-doc-or-docx-file-in-java – Ivan Koblik Jul 10 '14 at 18:45
@Braj: Well that will read the bytes, but that's not necessarily what you want... – Jon Skeet Jul 10 '14 at 18:47
