When printing certain Unicode characters in Java, the output shows '?' instead. Why does this happen, and is there any way to print these characters?

This is my code

String symbol1 = "\u200d";
StringBuilder strg = new StringBuilder("unicodecharacter");
strg.insert(5, symbol1);
System.out.println("After insertion...");
System.out.println(strg.toString());

The output is:

After insertion...
unico?decharacter

user2821099
  • Which certain characters, and printed how? Please share some code. This could be an encoding problem, a problem in processing character data, or a font problem. That’s about all one can say without real information about the situation. – Jukka K. Korpela Sep 26 '13 at 20:49
  • You are printing with a non-Unicode encoding (as Unicode has all). If the encoding is ISO-8859-1 (Latin-1) you could try Windows-1252 (Windows Latin-1, a bit more). `new OutputStreamWriter(outputStream, "Windows-1252")`. – Joop Eggen Sep 26 '13 at 20:57
  • What characters and where are you printing them? If you are trying to get arbitrary Unicode out to the Windows console just give up now, it's unresolvably broken. – bobince Sep 27 '13 at 10:31

5 Answers

Here's a great article, written by Joel Spolsky, on the topic. It won't directly help you solve your problem, but it will help you understand what's going on. It'll also show you how involved the situation really is.

Daniel Kaplan

The character encoding you are using either cannot represent the character you have, or the display does not support that character.

I would check which encoding you are using throughout, and try to determine whether you are reading, storing, and printing the value correctly.
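One way to run that check is to ask the JVM which charset it uses by default and whether that charset can represent the string at all. The following is only a minimal sketch: the class name `EncodingCheck` and the helper `canEncode` are made up for illustration, and the console may use a different encoding than the one `Charset.defaultCharset()` reports.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingCheck {
    // Returns true if every character in text can be represented in the given charset.
    public static boolean canEncode(Charset charset, String text) {
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        // Same string the question builds: U+200D inserted at index 5.
        String text = "unico\u200ddecharacter";

        Charset platform = Charset.defaultCharset();
        System.out.println("Default charset: " + platform);
        System.out.println("Default charset can encode the text: " + canEncode(platform, text));

        // For comparison: Latin-1 has no mapping for U+200D, UTF-8 does.
        System.out.println("ISO-8859-1 can encode the text: "
                + canEncode(StandardCharsets.ISO_8859_1, text));
        System.out.println("UTF-8 can encode the text: "
                + canEncode(StandardCharsets.UTF_8, text));
    }
}
```

If the default charset reports that it cannot encode the text, a '?' in the output is the expected result rather than a bug in the string handling.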

Peter Lawrey

Are you sure which encoding you need? You may need to explicitly encode your output as UTF-8 or ISO 8859-1 if you are dealing with European characters.
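As a rough illustration of explicitly encoding the output, the sketch below wraps `System.out` in a writer that always emits UTF-8. The class name `Utf8Output` and the helper `utf8Writer` are hypothetical. This only changes the bytes Java writes; whether the character then appears correctly still depends on the terminal's own encoding and font, which this code does not control.

```java
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

public class Utf8Output {
    // Wraps any byte stream in an auto-flushing writer that encodes text as UTF-8.
    public static PrintWriter utf8Writer(OutputStream target) {
        return new PrintWriter(new OutputStreamWriter(target, StandardCharsets.UTF_8), true);
    }

    public static void main(String[] args) {
        StringBuilder strg = new StringBuilder("unicodecharacter");
        strg.insert(5, "\u200d");

        // Write through the UTF-8 writer instead of calling System.out.println directly.
        PrintWriter out = utf8Writer(System.out);
        out.println("After insertion...");
        out.println(strg);
    }
}
```

U+200D (ZERO WIDTH JOINER) has no visible glyph, so even with a UTF-8 capable terminal the inserted character is not expected to show up as a visible symbol.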

It Grunt

Java's default behaviour when reading an invalid Unicode character is to replace it with the Replacement Character (\uFFFD). This character is often rendered as a question mark.

In your case, the text you're reading is not encoded as Unicode, it's encoded as something else (Windows-1252 or ISO-8859-1 are probably the most common alternatives if your text is in English).
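Replacement can happen in both directions, and a small sketch may help tell them apart. It assumes ISO-8859-1 as the non-Unicode encoding, and the class and method names are invented for illustration. When decoding, malformed bytes become U+FFFD. When encoding, as `System.out.println` does on output, a character the target charset cannot represent becomes the byte for a literal '?', which is likely what the question's output shows.

```java
import java.nio.charset.StandardCharsets;

public class ReplacementDemo {
    // Encoding direction: characters with no mapping in ISO-8859-1 are replaced with '?'.
    public static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    // Decoding direction: byte sequences that are not valid UTF-8 are replaced with U+FFFD.
    public static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeLatin1("\u200d");
        System.out.println(encoded.length == 1 && encoded[0] == '?'); // prints "true"

        // 0xFF can never appear in well-formed UTF-8.
        String decoded = decodeUtf8(new byte[] {(byte) 0xFF});
        System.out.println(decoded.equals("\uFFFD")); // prints "true"
    }
}
```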

Aurand

I wrote an open-source library with a utility that converts any String to a Unicode sequence and vice versa. It helps to diagnose such issues. For instance, to print your String you can use something like this:

String str= StringUnicodeEncoderDecoder.decodeUnicodeSequenceToString("\\u0197" +
   StringUnicodeEncoderDecoder.encodeStringToUnicodeSequence("Test"));

You can read about the library, where to download it, and how to use it in the article "Open Source Java library with stack trace filtering, Silent String parsing Unicode converter and Version comparison". See the paragraph "String Unicode converter".

Michael Gantman