
Is there a standard way to convert a string like "\uFFFF" into a character, given that this six-character string contains the representation of a single Unicode character?

Bozho
Dima
  • do you mean like: System.out.println("Enter a Character:"); String s = read.readLine(); char c = s.charAt(0); – jjj Jan 24 '10 at 08:13
  • Actually the edit by jleedev is wrong: Dima said his string had 6 characters, not seven. Internally, even in Java, a "string" doesn't contain two backslashes. I read the original version as "\uFFFF", a "generic" string, without escaping, because the poster used the lowercase "string" word and not "String" and because he precisely stated that the string was made of 6 characters. So technically, I'm pretty sure the string he wants to convert is "\uFFFF", and *not* "\\uFFFF". The fact that in a Java source code you have to enter "\uFFFF" as "\\uFFFF" is, to me, unrelated to the question. – SyntaxT3rr0r Jan 24 '10 at 08:58
  • rolled it back. let the author define the context of the question better. – Bozho Jan 24 '10 at 10:17
  • Fair enough, but the fact that the backslash wasn't escaped seemed to be confusing. – Josh Lee Jan 24 '10 at 10:35
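A minimal sketch of the distinction these comments argue about (the class and field names are illustrative, not from the thread): a Java literal written with a Unicode escape is already a single character at compile time, whereas text read from a file holds six separate characters.

```java
class EscapeLengthDemo {
    // Written with a Unicode escape: the compiler turns it into a single char
    // (U+FFFF) before parsing, so this String has length 1.
    static final String COMPILED = "\uFFFF";

    // The escaped backslash keeps the text as six chars: a backslash, 'u',
    // 'F', 'F', 'F', 'F'. This is what you get when reading it from a file.
    static final String SIX_CHARS = "\\uFFFF";

    public static void main(String[] args) {
        System.out.println(COMPILED.length());         // 1
        System.out.println(SIX_CHARS.length());        // 6
        System.out.println((int) COMPILED.charAt(0));  // 65535
    }
}
```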


char c = "\uFFFF".toCharArray()[0];

The compiler translates the Unicode escape before the code is parsed, so the literal is a one-character string and the whole sequence becomes a single character.

Another way, if you are going to hard-code the value:

char c = '\uFFFF';

Note that \uFFFF doesn't seem to be a proper Unicode character, but try with \u041f, for example.

Read about Unicode escapes here.
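A small sketch of both compile-time approaches, using U+041F as suggested (the class and method names are illustrative). It also checks U+FFFF with Character.isDefined, which is expected to return false because U+FFFF is a Unicode noncharacter:

```java
class CompileTimeEscapes {
    // Resolved by the compiler: the literal is a one-char String, so [0] is that char.
    static char fromString() {
        return "\u041f".toCharArray()[0];
    }

    // Hard-coded alternative: the same escape inside a char literal.
    static char fromLiteral() {
        return '\u041f';
    }

    public static void main(String[] args) {
        char c = fromLiteral();
        System.out.println(c == fromString());     // true
        System.out.println((int) c);               // 1055, i.e. 0x041F
        System.out.println(Character.getName(c));  // CYRILLIC CAPITAL LETTER PE
        // U+FFFF has no entry in the Unicode database, so Java reports it as undefined.
        System.out.println(Character.isDefined('\uFFFF')); // false
    }
}
```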

Bozho
  • I think he meant for the string literal that has 6 characters, with two backslashes in the source code, like "\\uFFFF". – Yoni Jan 24 '10 at 08:31
  • Nothing. I don't quite grasp the context behind the question, actually. – Bozho Jan 24 '10 at 10:15
  • @Bozho that makes two of us :) – Yoni Jan 24 '10 at 10:44
  • I don't know... maybe it's just because I've run across this a few times before, but it seemed obvious to me. :) If you are reading certain sorts of text, or more commonly RDF files (like N-Quads), it is quite common to read a literal \uFFFF and need to convert it to a real char code. – PSpeed Jan 24 '10 at 10:50
  • Still, some short sample code would have clarified, I suppose. – PSpeed Jan 24 '10 at 10:52
  • @Bozho, only that `char c = Character.valueOf('\uFFFF');` seemed overly complex to me :-) – rsp Jan 24 '10 at 10:53
  • so you did grasp the context :) – rsp Jan 24 '10 at 12:27
  • The value \uFFFF "is guaranteed not to be a Unicode character at all" : http://www.unicode.org/charts/PDF/UFFF0.pdf – trashgod Jan 24 '10 at 15:56

The backslash is escaped here (so you see two of them, but the String s is really only 6 characters long). If you're sure that your string begins with exactly "\u", simply skip those two characters and convert the remaining hexadecimal value:

String s = "\\u20ac";

char c = (char) Integer.parseInt( s.substring(2), 16 );

After that, c contains the euro sign (U+20AC), as expected.
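If the input might not be well formed, a slightly more defensive sketch checks the prefix and the four hex digits before parsing. The method name parseEscape is hypothetical. The explicit ASCII check matters because Integer.parseInt on its own would also accept a leading sign, e.g. the substring "-001":

```java
class UnicodeEscapeParser {
    /**
     * Converts a six-char escape (a backslash, 'u', then four hex digits)
     * into the char it denotes; rejects anything else.
     */
    static char parseEscape(String s) {
        if (s.length() != 6 || s.charAt(0) != '\\' || s.charAt(1) != 'u') {
            throw new IllegalArgumentException("Not a six-char Unicode escape: " + s);
        }
        for (int i = 2; i < 6; i++) {
            if (!isAsciiHex(s.charAt(i))) {
                throw new IllegalArgumentException("Not a hex digit at index " + i + ": " + s);
            }
        }
        // Four hex digits are at most 0xFFFF, so the cast to char is lossless.
        return (char) Integer.parseInt(s.substring(2), 16);
    }

    private static boolean isAsciiHex(char ch) {
        return (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F');
    }

    public static void main(String[] args) {
        char euro = parseEscape("\\u20ac");
        System.out.println((int) euro);        // 8364
        System.out.println(euro == '\u20ac');  // true
    }
}
```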

SyntaxT3rr0r
  • This is what I do when I need this. – PSpeed Jan 24 '10 at 10:51
  • char c = (char) Integer.parseInt( s.substring(2), 16 ); - looks very much like what I meant. \uFFFF is the format in which Unicode characters are represented in the source I read from (say, an ASCII file), not a literal. I imagined there could be a more direct method, but this one should also be fine. Thanks to everybody. – Dima Jan 24 '10 at 17:35

If you are parsing input with Java-style escaped characters, you might want to have a look at StringEscapeUtils.unescapeJava from Apache Commons Lang. It handles Unicode escapes as well as newlines, tabs, etc.

String s = StringEscapeUtils.unescapeJava("\\u20ac\\n"); // s contains the euro symbol followed by newline
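If you only need Unicode escapes and would rather avoid the Apache Commons dependency, a stdlib-only sketch can scan the text and replace each backslash-u-XXXX sequence. This is a hypothetical helper and not equivalent to unescapeJava: it leaves other escapes such as \n untouched and does not treat an escaped backslash before the u specially.

```java
class UnicodeUnescaper {
    /**
     * Replaces every escape of the form backslash, 'u', four ASCII hex digits
     * with the char it denotes. Other backslash sequences (e.g. backslash-n)
     * are copied unchanged.
     */
    static String unescapeUnicode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            if (isUnicodeEscapeAt(input, i)) {
                // Four hex digits always fit in a char, so the cast is lossless.
                out.append((char) Integer.parseInt(input.substring(i + 2, i + 6), 16));
                i += 6;
            } else {
                out.append(input.charAt(i));
                i++;
            }
        }
        return out.toString();
    }

    private static boolean isUnicodeEscapeAt(String s, int i) {
        if (i + 6 > s.length() || s.charAt(i) != '\\' || s.charAt(i + 1) != 'u') {
            return false;
        }
        for (int k = i + 2; k < i + 6; k++) {
            char ch = s.charAt(k);
            boolean hex = (ch >= '0' && ch <= '9') || (ch >= 'a' && ch <= 'f') || (ch >= 'A' && ch <= 'F');
            if (!hex) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Text as it might appear in a data file: the escapes are plain characters.
        String raw = "price: 5\\u20ac, name: \\u041f";
        System.out.println(unescapeUnicode(raw).equals("price: 5\u20ac, name: \u041f")); // true
    }
}
```

Supplementary characters are written in escaped text as two escapes (a surrogate pair such as \uD83D\uDE00), and appending the two resulting chars yields the correct String.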
Jonathan
stoivane
String charInUnicode = "\\u0041"; // ASCII code 65, the letter 'A'
int code = Integer.parseInt(charInUnicode.substring(2), 16); // the integer 65 in base 10
char ch = Character.toChars(code)[0]; // the letter 'A'
Yoni
  • Why do you use toChars() when you hard-code `[0]` anyway? Your code goes half-way to supporting high unicode codepoints but misses the other half. What's the point? – Joachim Sauer Jan 24 '10 at 10:54
  • Why not just type-cast the integer directly to a `char`? It is already in the valid range: `char ch = (char) code;` – Remy Lebeau Aug 17 '17 at 02:37
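Following up on both comments: Character.toChars is only worth using if the whole returned array is kept, because supplementary code points (above U+FFFF) come back as two chars. A sketch with a hypothetical helper that, as an assumption for illustration, takes just the hex digits of a code point (no backslash-u prefix):

```java
class CodePointDemo {
    /**
     * Converts a hex code point such as "41" or "1F600" into a String.
     * Character.toChars returns one char for BMP code points and a
     * surrogate pair (two chars) for supplementary ones, so the whole
     * array is kept instead of only element [0].
     */
    static String fromHexCodePoint(String hex) {
        int codePoint = Integer.parseInt(hex, 16);
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        System.out.println(fromHexCodePoint("41"));                          // A
        System.out.println(fromHexCodePoint("1F600").length());              // 2 (surrogate pair)
        System.out.println(fromHexCodePoint("1F600").codePointCount(0, 2));  // 1
    }
}
```

For input that is known to stay within the BMP, the plain `(char) code` cast from the second comment gives the same result with less ceremony.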

Try this in BlueJ: char c = '\uffff'; System.out.println(c); I liked the way everyone understood the question, but when I tried it in BlueJ, the output window appeared blank.

However, if you copy the single (invisible) character from the output and paste it into the Google search bar, it becomes clear that the character was printed; the output window was simply unable to display it.

The specific outcome was this character:

Have a nice day experimenting!
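When a character prints as nothing visible, printing its code point instead makes it identifiable. A minimal sketch (the describe method is hypothetical):

```java
class ShowCodePoint {
    /** Formats a char as "U+XXXX" so invisible characters can be identified. */
    static String describe(char c) {
        return String.format("U+%04X", (int) c);
    }

    public static void main(String[] args) {
        char c = '\uffff';
        System.out.println(c);            // may show nothing visible, as in BlueJ
        System.out.println(describe(c));  // U+FFFF
    }
}
```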