Facing Problems With Unicode Characters

Question

I have a string containing some Unicode characters (Chinese characters, in fact), but I cannot convert them back to their original appearance.

System.out.println() just prints the \u.... escape sequence, not the Chinese character. This is the code I'm using:

String code = "\\" + "u751c";
System.out.println(code);
System.out.println(code.length());

code = "\u751c";
System.out.println(code);
System.out.println(code.length());

Which results in:

\u751c
6
甜
1

How can I get the actual Chinese character?

Thanks for all your comments and answers. Maybe I didn't make myself clear. The string I get may have the form ("\\" + "u751c"), so System.out.println() just prints "\u751c", not the character "甜".

// text_title is the string scraped from another website using Jsoup.
System.out.println(text_title);
System.out.println("\u53f0\u6e7e\u8fdb\u53e3 Love of office lady \u5c0f\u8d44\u5973\u4e4b\u604b \u8349\u8393\u5de7\u514b\u529b\u674f\u4ec1\u5377\u5fc3\u9165 80g/\u76d2");

Content in the console:

\u53f0\u6e7e\u8fdb\u53e3 Love of office lady \u5c0f\u8d44\u5973\u4e4b\u604b \u8349\u8393\u5de7\u514b\u529b\u674f\u4ec1\u5377\u5fc3\u9165 80g/\u76d2
台湾进口 Love of office lady 小资女之恋 草莓巧克力杏仁卷心酥 80g/盒

The string text_title may be escaped in this form ("\\" + "u751c"). How can I convert it to Chinese characters?

What exactly is the issue? Are you seeing the literal \u751c (and want to see the Chinese character)? — SheetJS, Oct 27 '13 at 10:00
Your second three lines *do* print a Chinese character. What is the issue? — Mark Tolonen, Oct 27 '13 at 10:21
Which is the programming language used, and why isn’t the question tagged with it? — Jukka K. Korpela, Oct 27 '13 at 11:24

score 0 · Accepted Answer · edited May 23 '17 at 12:28

What language is this? It looks like JavaScript, so I'll assume that for now. But please add that sort of critical information to your questions next time.

You are confused because you are assuming both ways of assigning values to the text string are the same. They are not, as you can see from the results.

In the first line, you start the string with "\\", followed by more literal characters. The compiler sees the first character, '\', which is the escape character, so it knows that the next character needs to be treated in a special way. That next character is '\' again, but since it is escaped, its "special" status is not applied, and a single literal backslash gets added to your string. Then the remaining characters follow, and the string is what you see: "\u751c".

In the fifth line, the same thing happens; only this time, the character after the backslash is a lowercase 'u'. That means that four hex digits should follow (and look! they do) and that the entire construction forms a single character. That single character gets stored into the string, and what do you know, it's what you see as output.
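
To make the difference between the two strings visible, here is a minimal sketch that lists the UTF-16 code units of a string in hex. The class EscapeDemo and the method describe are hypothetical names introduced only for this illustration; they are not part of the original code.

```java
final class EscapeDemo {

    // Hypothetical helper: returns the UTF-16 code units of s as
    // space-separated four-digit hex values.
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Escaped backslash plus five ordinary characters: six code units.
        System.out.println(describe("\\" + "u751c"));
        // A Unicode escape in the literal: one code unit.
        System.out.println(describe("\u751c"));
    }
}
```

The first call should list six values (the backslash, 'u', and the four hex digits as ordinary characters), while the second lists only 751c, which matches the lengths 6 and 1 printed in the question.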

At this point it's unclear what you are asking. If you are having trouble getting your Unicode characters to show using the first way, simply use the second way ... If you have data stored in the first way, you need to convert the escaped characters to Unicode. Look at this link for a discussion: Convert escaped Unicode character back to actual character
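
As a rough illustration of that conversion, here is a minimal sketch using only the Java standard library. It assumes the input contains only simple backslash-u escapes with exactly four hex digits and does not handle other escape forms (such as a doubled backslash). The class UnescapeSketch and its methods are hypothetical names for this example, not an existing API.

```java
final class UnescapeSketch {

    // Hypothetical helper: true if s has four hex digits starting at index from.
    static boolean isHex4(String s, int from) {
        if (from + 4 > s.length()) {
            return false;
        }
        for (int k = from; k < from + 4; k++) {
            if (Character.digit(s.charAt(k), 16) < 0) {
                return false;
            }
        }
        return true;
    }

    // Hypothetical helper: replaces each literal backslash, 'u', four hex
    // digits sequence with the single character it names; everything else
    // is copied unchanged.
    static String unescapeUnicode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\' && i + 1 < input.length()
                    && input.charAt(i + 1) == 'u' && isHex4(input, i + 2)) {
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String escaped = "\\" + "u751c"; // six characters, as in the question
        String plain = unescapeUnicode(escaped);
        System.out.println(plain.length());
        System.out.println(plain);
    }
}
```

A tested library routine is likely the safer choice for real data, since it also covers the other Java escape forms; this sketch only shows the idea behind the conversion.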

(One of the examples there uses System.out.println -- you are programming in Java. Did you know that?)

Thanks for all your comments and answers. Maybe I didn't make myself clear. The string I get may have the form ("\\" + "u751c"), so System.out.println() just prints "\u751c", not the character "甜". — rogeshu1990, Oct 27 '13 at 14:05
Thanks everyone, this code is the answer I wanted.
str = org.apache.commons.lang.StringEscapeUtils.unescapeJava(str);
Thanks for providing the link, Jongware. — rogeshu1990, Oct 27 '13 at 14:39
