I'm trying to write a Java equivalent to PHP's `ord()`:
```java
public static int ord(char c) {
    return (int) c;
}

public static int ord(String s) {
    return s.length() > 0 ? ord(s.charAt(0)) : 0;
}
```
This seems to work well for characters with an ordinal value of up to 127, i.e. within ASCII. However, PHP returns 195 (and higher) for characters from the extended ASCII table or beyond. A comment by Mr. Llama on the answer to a related question explains this as follows:

> To elaborate, the reason é showed ASCII 195 is because it's actually a two-byte character (UTF-8), the first byte of which is ASCII 195. – Mr. Llama
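That checks out in Java, too. A minimal sketch (the class name `Utf8Bytes` is mine, purely for illustration) printing the UTF-8 bytes of é shows 195 followed by 169:

```java
import java.nio.charset.StandardCharsets;

public class Utf8Bytes {
    public static void main(String[] args) {
        // "é" (U+00E9) encodes to two bytes in UTF-8: 0xC3 0xA9
        for (byte b : "é".getBytes(StandardCharsets.UTF_8)) {
            System.out.println(b & 0xFF); // prints 195, then 169
        }
    }
}
```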
I hence changed my `ord(char c)` method to mask out all but the least significant byte:

```java
public static int ord(char c) {
    return (int) (c & 0xFF);
}
```
Still, the results differ. Two examples:

- `ord('é')` (U+00E9) gives 195 in PHP while my Java function yields 233
- `ord('⸆')` (U+2E06) gives 226 in PHP while my Java function yields 6
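To make the discrepancy concrete, here is a self-contained sketch (class name `OrdComparison` is mine) that prints the masked `char` value next to the first UTF-8 byte of each character; the former matches my Java results, the latter matches PHP's:

```java
import java.nio.charset.StandardCharsets;

public class OrdComparison {
    public static void main(String[] args) {
        for (String s : new String[] { "é", "⸆" }) {
            char c = s.charAt(0);
            int masked = c & 0xFF; // what my ord(char) returns
            int firstUtf8Byte = s.getBytes(StandardCharsets.UTF_8)[0] & 0xFF; // what PHP returns
            System.out.println(s + ": masked char = " + masked
                    + ", first UTF-8 byte = " + firstUtf8Byte);
        }
    }
}
```

This prints `é: masked char = 233, first UTF-8 byte = 195` and `⸆: masked char = 6, first UTF-8 byte = 226`.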
I managed to get the same behavior for the method that accepts a `String` by first turning the `String` into a `byte` array, explicitly using UTF-8 encoding (this needs an import of `java.nio.charset.StandardCharsets`):

```java
public static int ord(String s) {
    return s.length() > 0 ? ord((char) s.getBytes(StandardCharsets.UTF_8)[0]) : 0;
}
```
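For completeness, a runnable sketch combining both methods (the class name `OrdDemo` is mine) reproduces PHP's values for the two test characters:

```java
import java.nio.charset.StandardCharsets;

public class OrdDemo {
    public static int ord(char c) {
        return c & 0xFF; // keep only the low byte
    }

    public static int ord(String s) {
        return s.length() > 0 ? ord((char) s.getBytes(StandardCharsets.UTF_8)[0]) : 0;
    }

    public static void main(String[] args) {
        System.out.println(ord("é")); // prints 195, matching PHP
        System.out.println(ord("⸆")); // prints 226, matching PHP
    }
}
```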
However, the method that accepts a `char` still behaves as before, and I have not yet found a solution for that. In addition, I don't understand why the change actually worked: `Charset.defaultCharset()` returns UTF-8 on my platform anyway. So...
- How can I make my function behave similarly to PHP's?
- Why does the change to `ord(String s)` actually work?
Explanatory answers are much appreciated, as I want to grasp what's going on exactly.