
I am trying to read a file which contains some Japanese characters.

RandomAccessFile file = new RandomAccessFile("japanese.txt", "r");
String line;
while ((line = file.readLine()) != null) {
   System.out.println(line);
}

It's returning garbled characters instead of the Japanese text. But when I convert the encoding, it prints properly:

line = new String(line.getBytes("ISO-8859-1"), "UTF-8");

What does this mean? Is the text file in ISO-8859-1 encoding?

$ file -i japanese.txt returns the following:

japanese.txt: text/plain; charset=utf-8

Please explain why it explicitly requires converting the file from Latin-1 to UTF-8?

Shashwat Kumar

2 Answers


No. readLine is an obsolete method that dates from before charsets/encodings were handled properly: it turns every byte into a char with the high byte 0, which amounts to decoding the file as Latin-1. Byte 0x85 is a line separator (the Unicode NEL, inherited from EBCDIC), and if it appeared inside a UTF-8 multibyte sequence the actual line could be broken into two lines. More such scenarios are feasible.
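
To see why the round trip from the question "fixes" the text, here is a minimal sketch (the sample string and class name are illustrative): a Latin-1 decode maps each byte one-to-one to a char, so re-encoding as Latin-1 recovers the original UTF-8 bytes, which can then be decoded correctly.

import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    public static void main(String[] args) {
        String original = "日本語";
        byte[] utf8Bytes = original.getBytes(StandardCharsets.UTF_8);

        // What readLine effectively does: each byte becomes a char with high byte 0,
        // i.e. the UTF-8 bytes are decoded as ISO-8859-1 -> mojibake.
        String garbled = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        System.out.println(garbled);

        // The round trip from the question: Latin-1 re-encoding gives the original
        // bytes back unchanged; decoding those as UTF-8 yields the real text.
        String repaired = new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                StandardCharsets.UTF_8);
        System.out.println(repaired);   // 日本語
    }
}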

Best use Files. It has newBufferedReader(Path, Charset), and the overload without a Charset defaults to UTF-8.

import java.io.BufferedReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

Path path = Paths.get("japanese.txt");
try (BufferedReader file = Files.newBufferedReader(path)) {
    String line;
    while ((line = file.readLine()) != null) {
        System.out.println(line);
    }
}

Now you'll read correct Strings.
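
As a variant, if you don't need to stream line by line, Files.readAllLines (which also defaults to UTF-8) reads the whole file in one call:

for (String l : Files.readAllLines(path)) {   // decodes as UTF-8 by default
    System.out.println(l);
}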

A RandomAccessFile is basically meant for binary data.
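
If you genuinely need random access, one workable approach (just a sketch, class and method names are made up; it assumes the file fits in memory) is to read the raw bytes yourself and decode them explicitly as UTF-8:

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

class RafUtf8 {
    // Read the whole file as raw bytes, then decode them explicitly as UTF-8.
    static String readAllUtf8(String fileName) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(fileName, "r")) {
            byte[] bytes = new byte[(int) raf.length()];
            raf.readFully(bytes);                        // bytes only, no charset yet
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }
}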

Joop Eggen

It looks like it could be ISO-8859-1, but I would try reading with that encoding and see what happens.

Since you don't do random access, I would just create a BufferedReader with the right encoding and use that:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

String charSetName = // either UTF-8 or ISO-8859-1 - try both
String fileName = "japanese.txt";
FileInputStream is = new FileInputStream(fileName);
InputStreamReader isr = new InputStreamReader(is, Charset.forName(charSetName));
BufferedReader reader = new BufferedReader(isr);

String line;
while ((line = reader.readLine()) != null) {
    System.out.println(line);
}
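
If you would rather check programmatically which charset the bytes are valid in, rather than eyeballing the output, a strict decoder reports malformed input. Note that ISO-8859-1 accepts every byte sequence, so a clean UTF-8 decode is the stronger signal. A sketch (the helper name is made up):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.file.Files;
import java.nio.file.Paths;

class CharsetCheck {
    // Returns true if the file's bytes decode without error in the given charset.
    static boolean decodesCleanly(String fileName, Charset charset) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        try {
            charset.newDecoder()
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}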
rghome