
I am looking for a way to retrieve the Unicode value of a given char and, if possible, store it as an integer. Is there a built-in method for this in Java, or do I have to write my own?

Context

I am building a basic encryption program for fun. What I need is to map each character in the Unicode set to an integer, which I can then manipulate in my encryption formula.

I thought about using ASCII values for char by typecasting the char as an int, but then I read about Unicode online, and realised my mistake.

Any help will be appreciated.

dda
HasKal
  • Java `char` is already `UNICODE` (specifically, it's [`UTF-16`](http://en.wikipedia.org/wiki/UTF-16)). – Sergey Kalinichenko Jun 21 '13 at 17:40
  • `char c = somechar;` followed by `int unicodeValue = c;` suffices – pinkpanther Jun 21 '13 at 17:40
  • A Unicode code point can only be stored in an `int`, not in a `char`. This is a classic beginner mistake. A Java `char` only holds individual UTF-16 code units. It cannot hold a code point. – tchrist Jun 22 '13 at 01:15
  • Encrypting Unicode (whether based on code points or the UTF-16 code units that Java strings are based on) is pretty unusual. Most cipher implementations work on a byte-array basis and you will probably find it easier to do the same. To map any Unicode strings to byte arrays, choose a UTF encoding (Java charset), probably UTF-8. – bobince Jun 23 '13 at 23:10
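The byte-array approach suggested in the last comment could look roughly like the sketch below. The class name `Utf8BytesDemo` and the helpers `encode` and `decode` are hypothetical names chosen for illustration; only `String.getBytes(Charset)`, the `String(byte[], Charset)` constructor, and `StandardCharsets.UTF_8` come from the standard library.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8BytesDemo {
    // Hypothetical helper: turn text into UTF-8 bytes that a cipher can work on.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical helper: turn UTF-8 bytes back into text.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "a\u00E9"; // 'a' followed by U+00E9 (e with acute accent)
        byte[] bytes = encode(text);
        // 'a' takes one byte, U+00E9 takes two bytes in UTF-8.
        System.out.println(Arrays.toString(bytes)); // [97, -61, -87]
        System.out.println(decode(bytes).equals(text)); // true
    }
}
```

An encryption routine would then transform the `byte[]` rather than individual characters, and the decrypted bytes would be passed through `decode` to recover the text.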

1 Answer


The Java programming language represents text in sequences of 16-bit code units, using the UTF-16 encoding.

Hence, for any character that fits in a single `char` (that is, \u0000 to \uffff), this is enough:

char character = 'a';
int code = character;       // implicit widening conversion from char to int
System.out.println(code);   // prints 97

As per JLS §3.10.4:

Character literals can only represent UTF-16 code units (§3.1), i.e., they are limited to values from \u0000 to \uffff. Supplementary characters must be represented either as a surrogate pair within a char sequence, or as an integer, depending on the API they are used with.
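As a minimal sketch of what the quoted passage means in practice, the example below compares a BMP character with a supplementary one. The class name `CodePointDemo` and the helper `firstCodePoint` are hypothetical; `String.codePointAt` and `String.codePointCount` are standard `java.lang.String` methods. The surrogate pair `\uD83D\uDE00` encodes U+1F600.

```java
public class CodePointDemo {
    // Hypothetical helper: the full code point that starts at index 0,
    // combining a surrogate pair into one int when necessary.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    public static void main(String[] args) {
        String bmp = "a";
        String supplementary = "\uD83D\uDE00"; // U+1F600 written as a surrogate pair

        System.out.println(firstCodePoint(bmp));               // 97
        // Casting a single char only yields the high surrogate, not the character.
        System.out.println((int) supplementary.charAt(0));     // 55357
        System.out.println(firstCodePoint(supplementary));     // 128512
        // Two UTF-16 code units, but only one code point.
        System.out.println(supplementary.length());            // 2
        System.out.println(supplementary.codePointCount(0, supplementary.length())); // 1
    }
}
```

For input that may contain supplementary characters, reading the value with `codePointAt` rather than casting a `char` avoids ending up with half of a surrogate pair.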

AllTooSir
  • If your text is in a String, ideally you ought to [iterate over code points rather than chars](http://stackoverflow.com/questions/1527856/how-can-i-iterate-through-the-unicode-codepoints-of-a-java-string) in case it contains supplementary characters, which take two chars (4 bytes) in UTF-16. – Russell Zahniser Jun 21 '13 at 17:49
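One possible way to iterate over code points, assuming Java 8 or later, is `String.codePoints()`. In the sketch below the class name `CodePointIteration` and the helpers `toCodePoints` and `fromCodePoints` are hypothetical; `codePoints()` and the `String(int[], int, int)` constructor are standard library calls.

```java
import java.util.Arrays;

public class CodePointIteration {
    // Hypothetical helper: map a string to one int per Unicode code point.
    static int[] toCodePoints(String s) {
        return s.codePoints().toArray();
    }

    // Hypothetical helper: rebuild a string from an array of code points.
    static String fromCodePoints(int[] codePoints) {
        return new String(codePoints, 0, codePoints.length);
    }

    public static void main(String[] args) {
        String text = "a\uD83D\uDE00"; // 'a' followed by U+1F600
        int[] codePoints = toCodePoints(text);
        // Three chars in the string, but only two code points.
        System.out.println(Arrays.toString(codePoints)); // [97, 128512]
        System.out.println(fromCodePoints(codePoints).equals(text)); // true
    }
}
```

An `int[]` obtained this way gives one integer per character to manipulate, although any transformation applied to it has to produce valid code points for `fromCodePoints` to succeed.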