I have seen Questions and Answers about obtaining the code point number of a Unicode character in Java. For example, the Question "How can I get a Unicode character's code?"

But I want the opposite: given an integer number, how do I get the text of the character assigned to that code point number?

The char primitive data type is of no use, being limited to the Basic Multilingual Plane of the Unicode character set. That plane represents approximately the first 64,000 characters defined in Unicode. But Unicode has grown to nearly double that, with over 113,000 characters defined now. The code point numbers assigned to characters range over a million, up to U+10FFFF (decimal: 1,114,111). Being based on 16 bits, a char is limited to a range of 64K (65,536 values), not nearly enough.

Both Character and String classes offer the method codePointAt to examine a character and return an int representing the code point assigned in Unicode. I am looking for the opposite.

➥ Given an int, how do I get an object of Character, String, or some implementation of CharSequence that I can then join to other text?

When writing string literals, we can use a Unicode escape sequence with the backslash-with-u. But I am interested in working with integer variables, soft-coding rather than hardcoding the Unicode characters.

Basil Bourque
  • Inside of a string, use a unicode escape. "\u0c30" That is, I believe, the Greek letter pi. The unicode character number is in hex. – NomadMaker Feb 22 '20 at 01:59
  • @NomadMaker Thanks but my intention behind this question was to work with integer variables. I'll edit Question to clarify. And, by the way, that escape sequence is [not so simple when you have more than four hex digits](https://stackoverflow.com/q/37679763/642706). – Basil Bourque Feb 22 '20 at 02:31
  • Yes, I know. Unicode support becomes more complex, quickly. You'll need to read the official java documentation about unicode support. – NomadMaker Feb 22 '20 at 06:49

1 Answer

tl;dr

String s = Character.toString( 128_567 ) ;

Details

You asked for an object of Character, String, or some implementation of CharSequence.

Character

The Character class is actually legacy, a mere object wrapper around the primitive char type. The char type is legacy too, being defined internally as a 16-bit number limited to the first 64K of Unicode code points. Unicode now has more than twice that number of code points assigned to characters, so char fails to represent most characters.

So we cannot instantiate a Character object for a character outside the Basic Multilingual Plane. As a workaround, Character.toString( int ), added in Java 11, produces a String containing a single character. String can handle any and all Unicode characters, while Character cannot.
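As a rough illustration of why a single char (and so a Character) cannot hold such a character, the sketch below asks the Character class how many 16-bit char values a code point needs. The class name BmpCheck and the helper charsNeeded are made up for this example. The JDK calls are Character.isBmpCodePoint, Character.charCount, and Character.toString( int ), the last of which assumes Java 11 or later.

```java
public class BmpCheck {

    /** Hypothetical helper: how many 16-bit char values this code point occupies. */
    public static int charsNeeded(int codePoint) {
        return Character.charCount(codePoint);
    }

    public static void main(String[] args) {
        int latinA = 0x41;     // LATIN CAPITAL LETTER A, inside the Basic Multilingual Plane
        int mask = 0x1F637;    // FACE WITH MEDICAL MASK, outside the Basic Multilingual Plane

        System.out.println(Character.isBmpCodePoint(latinA)); // true
        System.out.println(Character.isBmpCodePoint(mask));   // false
        System.out.println(charsNeeded(latinA));              // 1
        System.out.println(charsNeeded(mask));                // 2 (a surrogate pair)

        // The String holds one code point but two char values. Requires Java 11+.
        System.out.println(Character.toString(mask).length()); // 2
    }
}
```

A code point that needs two char values cannot fit in one Character object, which is why the result comes back as a String.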

String Character.toString( int )

To get a String object containing a single character determined by an int, pass the int to Character.toString().

As an example, we use FACE WITH MEDICAL MASK, an emoji character at U+1F637 (decimal: 128,567).

// -----|  input  |----------------
String input = "😷" ;                                // FACE WITH MEDICAL MASK at code point U+1F637 (decimal: 128,567).
int codePoint = input.codePointAt( 0 ) ;              // Returns 128,567. 
System.out.println( "codePoint : " + codePoint ) ;   

codePoint : 128567

Convert that int primitive variable to a String.

// -----|  String  |----------------
String output = Character.toString( codePoint ) ;     // Pass an `int` primitive integer number.
System.out.println( "output : " + output ) ; 

output : 😷

Or use a literal integer number.

String output2 = Character.toString( 128_567 ) ;      // Pass an integer literal.
System.out.println( "output2 : " + output2 ) ;

output2 : 😷

See this code run live at IdeOne.com.
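Character.toString( int ) is only available from Java 11 onward. On an older runtime, a sketch along the following lines should give the same result using Character.toChars or the String constructor that takes an int array of code points. The class name CodePointToString and the method fromCodePoint are hypothetical names chosen for this example.

```java
public class CodePointToString {

    /** Hypothetical helper: converts a code point to a String without Character.toString(int). */
    public static String fromCodePoint(int codePoint) {
        if (!Character.isValidCodePoint(codePoint)) {
            throw new IllegalArgumentException("Not a valid Unicode code point: " + codePoint);
        }
        // toChars returns one char for a BMP code point, or a surrogate pair otherwise.
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        String viaToChars = fromCodePoint(128_567);

        // Alternative: the String constructor taking code points, an offset, and a count.
        String viaIntArray = new String(new int[] { 128_567 }, 0, 1);

        System.out.println(viaToChars.equals(viaIntArray)); // true
        System.out.println(viaToChars.codePointAt(0));      // 128567
    }
}
```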

CharSequence

The code above works, as String is an implementation of CharSequence.

CharSequence cs = Character.toString( 128_567 ) ;     // Returns a `String` which is a `CharSequence`. 

appendCodePoint

The StringBuilder class offers a method appendCodePoint to add a character via its assigned Unicode code point number. Ditto for thread-safe StringBuffer.
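A minimal sketch of joining code points to other text with StringBuilder.appendCodePoint follows. The class name AppendCodePointDemo and the helper join are made up for illustration. Swapping in StringBuffer should work the same way, since it offers the same method.

```java
public class AppendCodePointDemo {

    /** Hypothetical helper: appends each code point, in order, after the given prefix. */
    public static String join(String prefix, int... codePoints) {
        StringBuilder sb = new StringBuilder(prefix);
        for (int codePoint : codePoints) {
            sb.appendCodePoint(codePoint);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+1F637 FACE WITH MEDICAL MASK followed by U+0021 EXCLAMATION MARK.
        String text = join("Stay safe ", 128_567, 0x21);

        System.out.println(text);                                  // Stay safe 😷!
        System.out.println(text.codePointCount(0, text.length())); // 12 code points
        System.out.println(text.length());                         // 13 char values
    }
}
```

The code point count and the char count differ because the emoji occupies two char values inside the String.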

Basil Bourque