3

My Java program reads Unicode text from a text file, e.g. \uffff. Viewing it in the Java GUI is no problem, but when I try to print it out, all the words overwrite each other. Is this because of the \u, or is there another way to keep the words from being overwritten?

Sorry about my broken English. Thanks.

Joran Den Houting
Terrence
  • 3
  • What do you mean by "without `\u`"? The `\u` part is only relevant in Java source code. Your question is extremely confusing at the moment. You need to give a *lot* more context. Please read http://tinyurl.com/so-hints – Jon Skeet Sep 27 '13 at 09:46
  • So I edited your post because of missing tags, and then you just remove the edits? – Joran Den Houting Sep 27 '13 at 10:05
  • @JonSkeet it's also used by property files in java, at least. This means it's used by resources as well. – eis Sep 27 '13 at 10:15
  • @eis: That's true, yes. It's still entirely unclear what's going on though... – Jon Skeet Sep 27 '13 at 10:25

2 Answers

4

The \uXXXX notation occurs mainly in .java and .properties files, where it is read as a Unicode character (strictly, a UTF-16 code unit). Unicode text (text using all kinds of special characters) is usually stored as UTF-8, though UTF-16LE and UTF-16BE are sometimes used as well.

Such text is read with:

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

And, for completeness, written with either of:

Writer out = new OutputStreamWriter(new FileOutputStream(file), "UTF-8");
// or
PrintWriter out = new PrintWriter(file, "UTF-8");

In particular, do not use FileReader and FileWriter: these old utility classes use the platform default encoding.
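As a minimal sketch of the read and write calls above, the following round trip writes a string as UTF-8 and reads it back with the same explicit encoding. The class name `Utf8RoundTrip`, its helper methods, and the temporary file are hypothetical names chosen for illustration; `StandardCharsets.UTF_8` (Java 7+) is used here in place of the `"UTF-8"` string.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {

    // Writes the text as UTF-8, independent of the platform default encoding.
    static void write(File file, String text) {
        try (Writer out = new OutputStreamWriter(
                new FileOutputStream(file), StandardCharsets.UTF_8)) {
            out.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads the first line of the file, decoding its bytes as UTF-8.
    static String readFirstLine(File file) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new FileInputStream(file), StandardCharsets.UTF_8))) {
            return in.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes one line to a temporary file (hypothetical name) and reads it back.
    static String roundTrip(String line) {
        try {
            File file = File.createTempFile("utf8-demo", ".txt");
            file.deleteOnExit();
            write(file, line + "\n");
            return readFirstLine(file);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The escape below is resolved by the Java compiler to the euro sign.
        System.out.println(roundTrip("price: \u20AC 10"));
    }
}
```

Because both sides name the encoding explicitly, the result should not depend on the machine's default charset.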

If the text file itself contained \u20AC, that would be irregular, and it would be printed literally (backslash, u, 20AC).
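To illustrate the point about literal escapes: a six-character escape read from a plain text file stays six characters, because only the Java compiler and the .properties loader interpret it. If such a file did need converting, a hand-written decoder along the following lines could be one option. This is a sketch under that assumption; `EscapeDemo` and `decodeUnicodeEscapes` are hypothetical names, not part of any library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EscapeDemo {

    // Matches a backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces each literal six-character escape with the character it names.
    static String decodeUnicodeEscapes(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // What a Reader would hand back for a file containing the literal escape:
        String fromFile = "\\u20AC";
        System.out.println(fromFile.length());                       // 6
        System.out.println(decodeUnicodeEscapes(fromFile).length()); // 1
    }
}
```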

Now, if you mean there are problems with Unicode characters outside the normal ASCII range, such as the euro symbol €, then it might be a matter of the font, or a conversion may be needed, say to Windows Latin 1: "Windows-1252".
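To see why the target encoding matters for a character like the euro sign, this small sketch prints the bytes that the same one-character string becomes in three charsets. `EuroBytes` and `hex` are hypothetical names; the sketch assumes the JRE ships the windows-1252 charset, which is common but not among the charsets every Java platform is required to support.

```java
import java.nio.charset.Charset;

class EuroBytes {

    // Encodes the text in the named charset and returns the bytes as hex.
    static String hex(String text, String charsetName) {
        byte[] bytes = text.getBytes(Charset.forName(charsetName));
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String euro = "\u20AC";
        System.out.println(hex(euro, "UTF-8"));        // E2 82 AC
        System.out.println(hex(euro, "windows-1252")); // 80
        // ISO-8859-1 has no euro sign, so getBytes substitutes '?' (3F).
        System.out.println(hex(euro, "ISO-8859-1"));   // 3F
    }
}
```

A printer or viewer that expects one of these encodings but receives another will show the wrong glyphs, which is one way text can end up garbled.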

Joop Eggen
  • Thanks @joop, I have tried this but it does not solve my problem. My problem is that everything read from the **.properties** file overwrites each other. What encoding should I use when saving this properties file? – Terrence Sep 28 '13 at 02:17
  • That changes the case entirely. A **.properties** file is accessed through `ResourceBundle`; isn't that so in your case? The standard is that .properties files are encoded as **ISO-8859-1** (really an exception), and the special characters are `\u`-encoded. – Joop Eggen Sep 28 '13 at 06:45
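The .properties convention mentioned in the last comment can be sketched as follows. `Properties.load(InputStream)` decodes the bytes as ISO-8859-1 and resolves the escapes itself, while `Properties.load(Reader)` (Java 6+) lets the caller pick another encoding such as UTF-8. The class `PropertiesEscapes`, its method names, and the `price` key are hypothetical, and the in-memory byte arrays stand in for a real file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

class PropertiesEscapes {

    // Standard case: ISO-8859-1 bytes, non-Latin-1 characters written as escapes.
    static String loadValue(String fileContent, String key) {
        Properties props = new Properties();
        byte[] bytes = fileContent.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new ByteArrayInputStream(bytes)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    // Alternative: the file is saved as UTF-8 and read through a Reader.
    static String loadValueFromUtf8(String fileContent, String key) {
        Properties props = new Properties();
        byte[] bytes = fileContent.getBytes(StandardCharsets.UTF_8);
        try (Reader in = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8)) {
            props.load(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // File content with a literal six-character escape after the '='.
        System.out.println(loadValue("price=\\u20AC 10\n", "price"));
        // File content with the raw euro sign, stored as UTF-8.
        System.out.println(loadValueFromUtf8("price=\u20AC 10\n", "price"));
    }
}
```

Both calls should yield the same value, a euro sign followed by " 10"; mixing the two conventions (raw UTF-8 bytes read through the `InputStream` overload) is what typically produces garbled text.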
1

As you already know, '\u' introduces a Unicode escape, which is used to represent an international character. When you cannot enter such a character directly from the keyboard, you can use the Unicode escape sequence to produce it.

However, if such international characters are already present in a text file, you can of course read them. Java provides the Charset class for this; see the API at http://docs.oracle.com/javase/1.4.2/docs/api/java/nio/charset/Charset.html

You should use the Reader/Writer API in Java to deal with such characters, because it works with 16-bit characters, which cover the scripts of many languages beyond ASCII, whereas InputStream/OutputStream work only with raw 8-bit bytes.
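The difference between the two APIs can be made concrete with a short sketch that counts what each one delivers for the same data: an `InputStream` yields individual bytes, while a `Reader` with a charset yields decoded characters. `BytesVsChars` and its methods are hypothetical names used only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BytesVsChars {

    // Counts the units an InputStream delivers: one per raw byte.
    static int countBytes(byte[] data) {
        try (InputStream in = new ByteArrayInputStream(data)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Counts the units a Reader delivers: one per decoded 16-bit char.
    static int countChars(byte[] data, Charset charset) {
        try (Reader in = new InputStreamReader(new ByteArrayInputStream(data), charset)) {
            int n = 0;
            while (in.read() != -1) {
                n++;
            }
            return n;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // "h", e-acute, "llo", space, euro sign: 7 characters in total.
        byte[] data = "h\u00E9llo \u20AC".getBytes(StandardCharsets.UTF_8);
        System.out.println(countBytes(data));                          // 10
        System.out.println(countChars(data, StandardCharsets.UTF_8));  // 7
    }
}
```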

So to read such characters you can use:

BufferedReader in = new BufferedReader(
        new InputStreamReader(new FileInputStream(file), "UTF-8"));

Here UTF-8 is the charset.

You can print the data in a similar way. But wherever you print it, the destination (for example, your editor or console) must support Unicode characters.

You can also refer to the following question for more replies from other people: Read unicode text files with java
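One way to control the encoding on the output side is to wrap the target stream in a `PrintStream` with an explicit charset name, sketched below. `Utf8Console` and `encodeViaPrintStream` are hypothetical names; whether the characters then display correctly still depends on the console or editor being set to UTF-8 and having a suitable font.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {

    // Prints the text through a UTF-8 PrintStream and returns the bytes produced.
    static byte[] encodeViaPrintStream(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            PrintStream out = new PrintStream(buffer, true, "UTF-8");
            out.print(text);
            out.flush();
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this is not expected.
            throw new IllegalStateException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Wrap System.out so that output is encoded as UTF-8
        // rather than in the platform default encoding.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("price: \u20AC 10");
    }
}
```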

Community
santu