Java Character literals value with getNumericValue()

Question

Why do I get the same results for both upper- and lowercase literals? For instance:

char ch1 = 'A';
char ch2 = 'a';
char ch3 = 'Z';
char ch4 = 'z';

System.out.println("ch1 -- > " + Integer.toBinaryString(Character.getNumericValue(ch1)));
System.out.println("ch2 -- > " + Integer.toBinaryString(Character.getNumericValue(ch2)));
System.out.println("ch3 -- > " + Integer.toBinaryString(Character.getNumericValue(ch3)));
System.out.println("ch4 -- > " + Integer.toBinaryString(Character.getNumericValue(ch4)));

As a result I get:

ch1 -- > 1010
ch2 -- > 1010
ch3 -- > 100011
ch4 -- > 100011

I don't really see the difference between 'A' and 'a'. Even if I write the character literals as Unicode escapes (\u0041 for 'A' and \u0061 for 'a'), I get the same results.

score 7 · Accepted Answer · answered Dec 05 '12 at 07:02

It's behaving exactly as documented:

The letters A-Z in their uppercase ('\u0041' through '\u005A'), lowercase ('\u0061' through '\u007A'), and full width variant ('\uFF21' through '\uFF3A' and '\uFF41' through '\uFF5A') forms have numeric values from 10 through 35.

Basically this means that when parsing hex (say), 0xfa == 0xFA, as you'd expect.

I'd only expect case to matter when using something like base64.
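As a small illustration of the documented behaviour, the sketch below prints the numeric values of a few letters and compares a hex string parsed in both cases. The class name NumericValueDemo and the helper sameNumericValue are made up for this example; the Character and Integer calls are standard JDK methods.

```java
class NumericValueDemo {
    // Hypothetical helper: true when both chars map to the same numeric value.
    static boolean sameNumericValue(char a, char b) {
        return Character.getNumericValue(a) == Character.getNumericValue(b);
    }

    public static void main(String[] args) {
        // Upper- and lowercase Latin letters share numeric values 10 through 35.
        System.out.println(Character.getNumericValue('A')); // 10
        System.out.println(Character.getNumericValue('a')); // 10
        System.out.println(Character.getNumericValue('Z')); // 35
        System.out.println(sameNumericValue('z', 'Z'));     // true

        // Hex parsing is case-insensitive in the same way.
        System.out.println(Integer.parseInt("fa", 16) == Integer.parseInt("FA", 16)); // true
        System.out.println(Character.digit('f', 16)); // 15
    }
}
```

This is why the question's output shows 1010 (decimal 10) for both 'A' and 'a', and 100011 (decimal 35) for both 'Z' and 'z'.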

Jon Skeet

Let's just delete that comment and forget it happened ;) – Kevin Coulombe Dec 05 '12 at 07:05
Wow, thank you, I missed this interesting point in the docs. Pretty 'weird' behavior, IMHO. What I was expecting was just to get "an integer representation of the character literal". – Xentatt Dec 05 '12 at 07:08
@DmitriyUgnichenko: What result were you expecting, then? – Dolda2000 Dec 05 '12 at 07:11
It may sound strange, but I expected 0x0041 for 'A' and 0x0061 for 'a'. – Xentatt Dec 05 '12 at 07:12
Where do you get 0x0061 and 0x007A from? :) If you want the actual character codepoint, just use your `ch1` and `ch2` exactly as they are; the codepoints for 'A' and 'a' are 0x41 and 0x61. – Dolda2000 Dec 05 '12 at 07:15
@DmitriyUgnichenko: It's not weird at all - it's the documented behaviour. It's trying to find what number value the character means, e.g. '0' => 0. As Dolda2000 says, if you just want the Unicode value, you can just use `int value = 'A';`. – Jon Skeet Dec 05 '12 at 07:16
And to get the Unicode value as a binary string: Integer.toBinaryString('a') – Patricia Shanahan Dec 05 '12 at 07:17

score 4 · Answer 2 · answered Dec 05 '12 at 07:18

Judging from the comments, you're actually looking for the codepoints of the characters rather than their numeric values, so I'll isolate that into an answer. The getNumericValue() method returns what the character means as a number when its glyph is interpreted; it does not return the codepoint of the character. For instance, getNumericValue('5') returns 5 as an int, not the codepoint of '5' (which is 53).

To use the codepoints, just use your variables or the char literals as they are. char is a numeric datatype. For instance, System.out.println((int)'a'); will print 97, and System.out.println((int)'A'); will print 65.
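To make the contrast concrete, here is a minimal sketch that prints the code values of 'A' and 'a' next to a getNumericValue() result. The class name CodePointDemo and the helper codeUnitBits are hypothetical names chosen for this example.

```java
class CodePointDemo {
    // Hypothetical helper: binary string of a char's UTF-16 code unit.
    static String codeUnitBits(char c) {
        return Integer.toBinaryString(c); // char widens to int automatically
    }

    public static void main(String[] args) {
        char upper = 'A';
        char lower = 'a';

        // The char itself is the code value; a cast (or assignment to int) exposes it.
        System.out.println((int) upper);         // 65 (0x41)
        System.out.println((int) lower);         // 97 (0x61)
        System.out.println(codeUnitBits(upper)); // 1000001
        System.out.println(codeUnitBits(lower)); // 1100001

        // getNumericValue() answers a different question: what number the glyph denotes.
        System.out.println(Character.getNumericValue('5')); // 5
        System.out.println((int) '5');                      // 53
    }
}
```

For characters outside the Basic Multilingual Plane a single char is not enough; in that case something like String.codePointAt would be the tool to reach for, but that goes beyond what the question needs.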

Yep, thank you. That's what I wanted (I don't even know why I decided to use this getNumericValue() method). Everything is much easier. – Xentatt Dec 05 '12 at 07:21
