I'm currently developing a program in Java, and I want to display Chinese pinyin, which I fetch from a remote website.

But I have the following problem: Chinese pinyin is displayed this way: ji&#462;
Whereas it should be displayed this way: jiǎ
(I just typed the same sequence, except I stripped the slashes).

I think the answer to this question is really simple but I'm struggling to find it.

Thomas Dickey
  • How are you fetching the encoded string and how are you displaying it? – Jason Sperske Feb 12 '13 at 18:10
  • With URL, InputStream and then BufferedReader. But I think the problem can be solved afterwards, because if I type "j i & # 4 6 2 ;" into Google (without the slashes), it displays correctly. I think I'm missing something like escaped characters – user2065648 Feb 12 '13 at 18:13
  • http://stackoverflow.com/questions/994331/java-how-to-decode-html-character-entities-in-java-like-httputility-htmldecode – nhahtdh Feb 12 '13 at 18:16
  • Using Simplified Chinese in a string literal like this: `System.out.println("pīnyīn jiǎ");` seems to work. Strings in Java are all Unicode, so you don't need to encode them. I think @nhahtdh's comment will lead you in the right direction – Jason Sperske Feb 12 '13 at 18:17
  • You can write the decoding code yourself if the input contains only numeric entities, but if there are named entities, I recommend using an existing library to do the job. – nhahtdh Feb 12 '13 at 18:18
  • Like this `System.out.println("ji\u01ce");` – Jason Sperske Feb 12 '13 at 18:25

1 Answer

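The problem is that you have an HTML-encoded Unicode character (a numeric character reference) and what you want is its decoded form. A library like commons-lang3 (part of Apache Commons) will take your HTML-encoded string and decode it so that Java can display it. In commons-lang3 the method is named unescapeHtml4 (unescapeHtml is the name used in the older commons-lang 2.x):

String decoded = StringEscapeUtils.unescapeHtml4("ji&#462;"); // decoded now holds "jiǎ"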
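
If the input only ever contains numeric entities such as &#462; (the case one of the comments mentions), the decoding can also be done without a library. The following is a minimal sketch under that assumption, not a replacement for a full HTML unescaper: the class name NumericEntityDecoder and the method decode are hypothetical names chosen for illustration, and named entities such as &amp; are deliberately left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes only numeric character references.
class NumericEntityDecoder {

    // Matches hexadecimal (&#x1CE;) and decimal (&#462;) references.
    private static final Pattern NUMERIC_ENTITY =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    static String decode(String input) {
        Matcher m = NUMERIC_ENTITY.matcher(input);
        StringBuilder out = new StringBuilder();
        int last = 0;
        while (m.find()) {
            // Copy the text between the previous match and this one.
            out.append(input, last, m.start());
            try {
                int codePoint = (m.group(1) != null)
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(codePoint)) {
                    out.appendCodePoint(codePoint);
                } else {
                    out.append(m.group()); // out of range: keep the text as-is
                }
            } catch (NumberFormatException e) {
                out.append(m.group()); // number too large: keep the text as-is
            }
            last = m.end();
        }
        out.append(input, last, input.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("ji&#462;");
        // Compare against the Unicode-escaped literal to avoid console encoding issues.
        System.out.println(decoded.equals("ji\u01ce"));
    }
}
```

Whether the decoded string then shows up correctly on screen still depends on the console or UI font supporting the character.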

You can also write the character directly in Java source code using a Unicode escape:

String jia = "ji\u01ce";

This clever one-liner will take a Unicode character and show you its escaped form:

System.out.println( "\\u" + Integer.toHexString('ǎ' | 0x10000).substring(1) );
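
The same trick can be applied to a whole string. This is a small sketch, assuming you want every non-ASCII char replaced by its escaped form; the class name UnicodeEscaper and the method escapeNonAscii are hypothetical names used only for illustration. Characters outside the Basic Multilingual Plane come out as two escapes (a surrogate pair), which is also how Java source represents them.

```java
// Hypothetical helper: rewrites non-ASCII chars as Java Unicode escapes.
class UnicodeEscaper {

    static String escapeNonAscii(String input) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c < 0x80) {
                out.append(c); // plain ASCII is copied unchanged
            } else {
                // OR-ing with 0x10000 forces five hex digits; dropping the
                // first one leaves exactly four, zero-padded.
                out.append("\\u")
                   .append(Integer.toHexString(c | 0x10000).substring(1));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonAscii("ji\u01ce"));
    }
}
```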
Jason Sperske