
I am building a toy language. The syntax \#0061 is supposed to convert the given Unicode code point to a character:

String temp = yytext().substring(2);

Then I tried to prepend '\u' to the string, but that produced an error.

I also tried "\\" + "u" + temp, but that does not do any conversion.

Basically, I am trying to convert a Unicode code point to a character by passing only '0061' to a method. How can I do this?

unwind
ferronrsmith

4 Answers


Strip the '#' and use Integer.parseInt("0061", 16) to convert the hex digits to an int. Then cast to a char.
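A minimal sketch of those three steps, assuming the matched token is exactly '#' followed by four hex digits (the class HashEscape and the method tokenToChar are hypothetical names, not part of any lexer generator's API):

```java
class HashEscape {
    // Hypothetical helper: converts a token such as "#0061" to its char.
    // Assumes the token is '#' followed by four hex digits.
    static char tokenToChar(String token) {
        String hex = token.substring(1);          // strip the leading '#'
        int value = Integer.parseInt(hex, 16);    // "0061" -> 97
        return (char) value;                      // 97 -> 'a'
    }

    public static void main(String[] args) {
        System.out.println(tokenToChar("#0061")); // a
    }
}
```

Four hex digits never exceed 0xFFFF, so the cast to char cannot lose information here.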

(If you had implemented the lexer by hand, an alternative would be to do the conversion on the fly as your lexer matches the Unicode literal. But on rereading the question, I see that you are using a lexer generator ... good move!)
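For comparison, here is a rough sketch of that hand-written alternative, in which the scanner decodes #XXXX while it reads the input instead of post-processing the matched text. TinyLexer, nextChar and decodeAll are hypothetical names, and a real lexer would emit tokens rather than plain characters:

```java
class TinyLexer {
    private final String input;
    private int pos = 0;

    TinyLexer(String input) { this.input = input; }

    // Returns the next (decoded) character, or -1 at end of input.
    // A '#' followed by four hex digits is consumed as one character.
    int nextChar() {
        if (pos >= input.length()) return -1;
        char c = input.charAt(pos);
        if (c == '#' && pos + 5 <= input.length()) {
            int value = 0;
            boolean ok = true;
            for (int i = 1; i <= 4; i++) {
                int digit = Character.digit(input.charAt(pos + i), 16);
                if (digit < 0) { ok = false; break; }
                value = value * 16 + digit; // accumulate the hex value digit by digit
            }
            if (ok) {
                pos += 5; // consume '#' and the four hex digits
                return value;
            }
        }
        pos++; // ordinary character (or a '#' not followed by four hex digits)
        return c;
    }

    // Hypothetical convenience wrapper: runs the scanner over a whole string.
    static String decodeAll(String s) {
        TinyLexer lexer = new TinyLexer(s);
        StringBuilder out = new StringBuilder();
        for (int ch = lexer.nextChar(); ch != -1; ch = lexer.nextChar()) {
            out.append((char) ch);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeAll("x=#0061;")); // x=a;
    }
}
```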

Stephen C

You need to convert the particular code point to a char. You can do that with a little help from a regex:

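import java.util.regex.Matcher;
import java.util.regex.Pattern;

String string = "blah #0061 blah";

// '#' followed by exactly four hex digits (case-insensitive).
Matcher matcher = Pattern.compile("\\#((?i)[0-9a-f]{4})").matcher(string);
while (matcher.find()) {
    int codepoint = Integer.parseInt(matcher.group(1), 16);
    // replace() is literal, unlike replaceAll(), so a '$' or '\' result is safe.
    string = string.replace(matcher.group(0), String.valueOf((char) codepoint));
}

System.out.println(string); // blah a blah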

Edit: as per the comments, if it is a single token, then just do:

String string = "0061";
char c = (char) Integer.parseInt(string, 16);
System.out.println(c); // a
BalusC
  • Erm ... you don't want to implement a lexical analyser using Java regex pattern matching. – Stephen C Dec 20 '09 at 04:35
  • I need something like the first example you posted. I ran the code making the pattern changes as I need them however the ReplaceAll doesn't replace anything. The string is the same as the original string :( –  May 22 '12 at 13:16
  • 2
    @Eric: press the `Ask Question` button at the top right to ask a question you would like answered. – BalusC May 22 '12 at 13:19

I am basically trying to convert Unicode to a character by supplying only '0061' to a method, help.

char fromUnicode(String codePoint) {
  return (char)  Integer.parseInt(codePoint, 16);
}

You need to handle bad inputs and such, but that will work otherwise.
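As one possible way to handle bad input, here is a sketch of a validating variant (the class UnicodeInput and the exact checks are assumptions, not part of the answer). It matters because Integer.parseInt also accepts a leading '+' or '-', so "-061" would otherwise parse to -97 and silently cast to an unrelated char:

```java
class UnicodeInput {
    // Accepts only the ASCII characters 0-9, a-f, A-F.
    private static boolean isHexDigit(char c) {
        return (c >= '0' && c <= '9') || (c >= 'a' && c <= 'f') || (c >= 'A' && c <= 'F');
    }

    // Hypothetical validating version of fromUnicode(): exactly four hex digits.
    static char fromUnicode(String codePoint) {
        if (codePoint == null || codePoint.length() != 4) {
            throw new IllegalArgumentException("expected 4 hex digits: " + codePoint);
        }
        for (int i = 0; i < codePoint.length(); i++) {
            if (!isHexDigit(codePoint.charAt(i))) {
                throw new IllegalArgumentException("not a hex digit in: " + codePoint);
            }
        }
        // Four hex digits are at most 0xFFFF, so the cast cannot truncate.
        return (char) Integer.parseInt(codePoint, 16);
    }

    public static void main(String[] args) {
        System.out.println(fromUnicode("0061")); // a
        try {
            fromUnicode("-061");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // not a hex digit in: -061
        }
    }
}
```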

danben

\uXXXX is an escape sequence. The compiler converts it into the actual character value before the program ever runs; it is not "evaluated" in any way at runtime.
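A small illustration of that difference (EscapeDemo and its methods are made up for this sketch). The escape inside a literal is resolved at compile time, which is also why a bare '\u' with no hex digits fails to compile with an "illegal unicode escape" error, while gluing the pieces together at run time only produces ordinary characters:

```java
class EscapeDemo {
    // The compiler translates the escape, so this literal is simply "a".
    static String compiled() {
        return "\u0061";
    }

    // Built at run time: six ordinary characters (a backslash, 'u', '0', '0', '6', '1').
    // Nothing decodes them afterwards.
    static String built() {
        return "\\" + "u" + "0061";
    }

    public static void main(String[] args) {
        System.out.println(compiled());        // a
        System.out.println(built().length());  // 6
        // Decoding has to be done explicitly, for example with a hex parse and a cast.
        System.out.println((char) Integer.parseInt("0061", 16)); // a
    }
}
```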

What you probably want to do is define a mapping from your #XXXX syntax to Unicode code points and cast them to char.
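One way such a mapping could be applied to a whole piece of source text in a single pass, sketched with the standard Matcher.appendReplacement/appendTail loop (the class HashLiterals, the method translate, and the exact pattern are assumptions for illustration):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashLiterals {
    // Hypothetical pattern for the #XXXX syntax: '#' plus exactly four hex digits.
    private static final Pattern HASH_LITERAL = Pattern.compile("#([0-9A-Fa-f]{4})");

    // Maps every #XXXX in the input to the char with that code point, in one pass.
    static String translate(String input) {
        Matcher m = HASH_LITERAL.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out); // copy whatever follows the last match
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("#0048#0069 #0024")); // Hi $
    }
}
```

Unlike calling replaceAll once per match, this walks the input once and never re-interprets the matched text as a regex.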

Kevin Montrose