27

I'm trying to write this Unicode cross symbol (𐀵) in Java:

class A {
    public static void main(String[] args) {
        System.out.println("\u2300");
        System.out.println("\u10035");
    }
}

I can write the o with a line through it (⌀) just fine, but the cross symbol doesn't show up. Instead it prints some other character followed by the number 5:

# javac A.java && java A
⌀
ဃ5

Why?

hippietrail
Dog
  • 1
    The character you are asking about is from the Linear-B script. Is that really what you want? In general, you'll find that characters outside the BMP aren't often available in general-purpose fonts. – parsifal May 17 '13 at 19:32
  • 1
    @parsifal: I was trying to make Unicode art for utility poles on the roadside. – Dog May 17 '13 at 19:45
  • 1
    This kind of thing makes me wonder, did I get this right in my own programming language? `$ txr -c '@(bind a "\x10035")'` Output: `a="𐀵"`. Yup! Of course; I wouldn't cut off hex digits specifying a character arbitrarily at four. – Kaz May 18 '13 at 01:12
  • dude how do you come up with such epic questions – L̲̳o̲̳̳n̲̳̳g̲̳̳p̲̳o̲̳̳k̲̳̳e̲̳̳ Jun 29 '13 at 19:19

4 Answers

52

You're looking for U+10035, which is outside the Basic Multilingual Plane. That means you can't use \u to specify the value, as that only deals with U+0000 to U+FFFF - there are always exactly four hex digits after \u. So currently you've got U+1003 ("MYANMAR LETTER GHA") followed by '5'.
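
To see the split for yourself, you can dump the UTF-16 code units of the literal. This is only a minimal sketch; the class name `EscapeSplit` and the helper `codeUnits` are made up for illustration:

```java
class EscapeSplit {
    // Formats every UTF-16 code unit of s as four hex digits, separated by spaces.
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%04x", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The escape takes only the first four hex digits (1003); the final 5 is an ordinary character.
        String s = "\u10035";
        System.out.println(s.length());   // 2
        System.out.println(codeUnits(s)); // 1003 0035
    }
}
```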

Unfortunately Java doesn't provide a string literal form which makes characters outside the BMP simple to express. The only way of including it in a literal (but still in ASCII) is to use the UTF-16 surrogate pair form:

String cross = "\ud800\udc35";

Alternatively, you could use the 32-bit code point form as an int:

String cross = new String(new int[] { 0x10035 }, 0, 1);

(These two strings are equal.)
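
If you want to check that for yourself, something like the following should do. It is a sketch only, and the class and method names (`CrossForms`, `fromSurrogates`, `fromCodePoint`) are invented for the example:

```java
class CrossForms {
    // U+10035 written as a UTF-16 surrogate pair.
    static String fromSurrogates() {
        return "\ud800\udc35";
    }

    // U+10035 built from its code point value.
    static String fromCodePoint() {
        return new String(new int[] { 0x10035 }, 0, 1);
    }

    public static void main(String[] args) {
        String a = fromSurrogates();
        String b = fromCodePoint();
        System.out.println(a.equals(b));                           // true
        System.out.println(a.length());                            // 2 (UTF-16 chars)
        System.out.println(a.codePointCount(0, a.length()));       // 1 (code point)
        System.out.println(Integer.toHexString(a.codePointAt(0))); // 10035
    }
}
```

Note that `length()` counts UTF-16 code units, so it reports 2 even though the string holds a single character.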

Having said all that, your console would still need to support that character - you'll need to try it to find out whether or not it does.
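
One thing you could try on the Java side, assuming your terminal expects UTF-8 (that is an assumption, it may well be configured differently), is to wrap System.out in a PrintStream with an explicit encoding. A rough sketch, with `Utf8Out` and `utf8Stream` as made-up names:

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Out {
    // Wraps target in a PrintStream that encodes text as UTF-8.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Assumption: the terminal decodes its input as UTF-8.
        PrintStream out = utf8Stream(System.out);
        out.println("\ud800\udc35");
    }
}
```

This only controls the bytes Java sends; whether a glyph actually appears still depends on the terminal and its font.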

Jon Skeet
  • I see. How did you obtain this sequence of 2 unicode escapes? Is there a quick way to do it in my head while writing unicode string literals? – Dog May 17 '13 at 19:10
  • 1
    @Dog: To tell the truth, I wrote some C# code - because in C# I can use `\U00010035` :) Read the link I've now included for "UTF-16 surrogate pair" to see why those two values are put together - it's up to you to work out whether or not you can do the maths in your head, but I know I couldn't... at least not reliably and quickly ;) – Jon Skeet May 17 '13 at 19:12
  • 2
    One way to get various encodings is from FileFormat.info: http://www.fileformat.info/info/unicode/char/10035/index.htm – parsifal May 17 '13 at 19:31
  • @parsifal: Nice - I hadn't seen that. – Jon Skeet May 17 '13 at 19:31
  • 2
    To print literal forms of code points in Java: `for(char ch : Character.toChars(0x10035)) System.out.format("\\u%04x", (int) ch);` – McDowell May 17 '13 at 19:34
  • Not only will the console have to support it, but the console will have to have a font that supports it, or be able to be configured to use such font. If the console shows something like boxes, try copying and pasting that into some other app that you know to support Unicode beyond the BMP. It's worth having one or two fonts installed as fallbacks that cover as much of Unicode as possible so that you can see some glyph, even if it might be ugly. Eg. Code2000 & Code2001 fonts. – hippietrail Feb 10 '16 at 07:54
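
On the question of how the two escapes are derived: the UTF-16 arithmetic is to subtract 0x10000 from the code point, add the top 10 bits of the result to 0xD800 and the bottom 10 bits to 0xDC00. A small sketch of that calculation follows; the class `SurrogateMath` and its method names are hypothetical, and the JDK's own `Character.highSurrogate` / `Character.lowSurrogate` (Java 7+) should give the same values:

```java
class SurrogateMath {
    // High (leading) surrogate for a supplementary code point (U+10000 and above).
    static char high(int codePoint) {
        return (char) (0xD800 + ((codePoint - 0x10000) >> 10));
    }

    // Low (trailing) surrogate for a supplementary code point.
    static char low(int codePoint) {
        return (char) (0xDC00 + ((codePoint - 0x10000) & 0x3FF));
    }

    // Formats the pair as two escapes that can be pasted into a Java string literal.
    static String escape(int codePoint) {
        return String.format("\\u%04x\\u%04x", (int) high(codePoint), (int) low(codePoint));
    }

    public static void main(String[] args) {
        // For 0x10035: 0x10035 - 0x10000 = 0x35, so the pair is d800 and dc35.
        System.out.println(escape(0x10035));
    }
}
```

For U+10035 the offset is only 0x35, so the top 10 bits are zero and the low surrogate is simply 0xDC00 + 0x35, which is about as easy as the mental arithmetic gets.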
3

I believe a Java Unicode escape can only represent characters from 0x0000 to 0xFFFF. Java would evaluate "\u10035" to whatever "\u1003" is, followed by a 5.

joshreesjones
1

0x10035 is a supplementary Unicode character. You'll need a font that supports it if you want your program to render it.

http://www.oracle.com/technetwork/articles/javase/supplementary-142654.html

Tap
0

Unicode escapes are exactly 4 hex digits long. You are printing \u1003 followed by '5'. Are you sure you have the right code point?

Aurand