1

How can I convert a char datatype into its utf-8 int representation in Processing?

So if I had an array ['a', 'b', 'c'] I'd like to obtain another array [61, 62, 63].

mpjan
  • See my recent edit to my answer. – Nico May 22 '13 at 19:22
  • please update your post so that your question matches the answer you were looking for. @nickecarlo pointed out you actually wanted hex strings, please update your question accordingly so that people who might find this question and answer in the future aren't confused by the answer actually being an answer to a different question than what you have listed here. – Mike 'Pomax' Kamermans May 25 '13 at 19:38

2 Answers

2

After my answer I figured out a much easier and more direct way of converting to the types of numbers you wanted. What you want for 'a' is 61 instead of 97 and so forth. That is not very hard seeing that 61 is the hexadecimal representation of the decimal 97. So all you need to do is feed your char into a specific method like so:

Integer.toHexString((int)'a');

If you have an array of chars like so:

char[] c = {'a', 'b', 'c', 'd'};

Then you can use the above thusly:

Integer.toHexString((int)c[0]);

and so on and so forth.
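For the whole array, a loop does the job. This is a minimal sketch in plain Java; the names `CharHexArray` and `toHexStrings` are made up for illustration, and inside a Processing sketch you would probably drop the class wrapper and keep only the method.

```java
public class CharHexArray {
    // Hypothetical helper: maps each char to the hex string of its numeric value.
    static String[] toHexStrings(char[] chars) {
        String[] out = new String[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = Integer.toHexString((int) chars[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        char[] c = {'a', 'b', 'c', 'd'};
        System.out.println(String.join(", ", toHexStrings(c))); // 61, 62, 63, 64
    }
}
```

Note that the results are strings, not ints: "61" here is text that happens to consist of decimal digits.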

EDIT

As per v.k.'s example in the comments below, you can do the following in Processing:

char c = 'a';
println(hex(c));

The above will give you a hex representation of the character as a String.

// to save the hex representation as an int you need to parse it since hex() returns a String
int hexNum = PApplet.parseInt(hex(c));

// OR

int hexNum = int(hex(c));

For the benefit of the OP and the commenter below: you will get 97 for 'a' even if you used my previous suggestion in the answer, because 97 is the decimal representation of hexadecimal 61. Since UTF-8 matches the 128 ASCII entries (0 to 127) value for value, I don't see why one would expect anything different anyway. As for the UnsupportedEncodingException, a simple fix would be to wrap the statements in a try/catch block. However, that is not necessary, since the above answers the question directly and in a much simpler way.
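One caveat with reading the hex digits back as a plain int: it only works while the hex string contains nothing but the digits 0 to 9. The sketch below is plain Java, not Processing's hex(), and the names `HexDigitsPitfall`, `hexLooksDecimal` and `parseHex` are hypothetical. It shows that 'j' already breaks the pattern, and that parsing with radix 16 recovers the real value.

```java
public class HexDigitsPitfall {
    // Hypothetical helper: true when the hex form of c contains only 0-9,
    // so that reading it as a decimal number is at least possible.
    static boolean hexLooksDecimal(char c) {
        return Integer.toHexString((int) c).matches("[0-9]+");
    }

    // Parsing with radix 16 recovers the original numeric value.
    static int parseHex(String hexDigits) {
        return Integer.parseInt(hexDigits, 16);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString((int) 'a')); // 61
        System.out.println(hexLooksDecimal('a'));           // true
        System.out.println(Integer.toHexString((int) 'j')); // 6a
        System.out.println(hexLooksDecimal('j'));           // false
        System.out.println(parseHex("6a"));                 // 106
    }
}
```

So an int array like [61, 62, 63] is only a display trick for a few letters; for anything general, keep the hex form as a String.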

Nico
  • 1
    This code gives me UnsupportedEncodingException... In processing, do you know why? This other version works but still giving me 97, 98, 99, 100. Not 61, 62, 63, 64... import java.nio.charset.*; Charset utf = Charset.forName("UTF-8"); String s = "abcd"; byte[] b = s.getBytes(utf); println(b); – v.k. May 22 '13 at 18:11
  • In a processing way, one could say: char c ='a'; println(hex(int(c))); – v.k. May 23 '13 at 11:57
  • @v.k. Thanks, I'll update my answer. This is the downside of going to Processing from Java. It's hard to let go of Java. – Nico May 23 '13 at 12:00
  • Ok it returns a string, so you could do: char c ='a'; println(int(hex(c))); and get an int :) But anyway I think that's one of the goods of Processing, you can just throw in some java when needed. Why not? Ah and the code in your edit is missing the second line. – v.k. May 23 '13 at 13:08
  • @v.k. I've already provided the code that you would use to convert the String to an int in the answer. – Nico May 23 '13 at 13:09
  • 1
    I saw, but I was just pointing that you don't need to go for Integer.parseInt(). You can just use Processing's int() :) – v.k. May 23 '13 at 13:11
  • @v.k. added it to the answer. I won't be updating this answer anymore. I think the OP has gotten enough help already. – Nico May 23 '13 at 13:12
  • this is a needlessly overcomplicated answer =) If we have chars, they are already a numerical type, and can be turned into the corresponding number by using (int) casts (remember that char is a 16 bit unsigned integer numerical type, like int, and unlike String) – Mike 'Pomax' Kamermans May 25 '13 at 14:27
  • @Mike'Pomax'Kamermans Probably should have read that he doesn't want direct conversion to int values but to their hex equivalent. Your jumping the gun would have been fine if this was a new question but it's not. Hence I'm downvoting your answer. – Nico May 25 '13 at 17:32
  • As for it being complicated, I'm editing out the previous (Java related) solution that isn't applicable anymore. – Nico May 25 '13 at 17:50
  • fair enough. I'll also leave this here: http://processing.org/reference/hex_.html (incidentally, the OP does not mention hex at all. I didn't read this full comment exchange, so please don't downvote me for not knowing something based on information not in the OP, which still shows decimal values =) – Mike 'Pomax' Kamermans May 25 '13 at 19:33
  • @Mike'Pomax'Kamermans I didn't downvote because you didn't read the comments, I downvoted because the question clearly states that he's looking for something that gives him a = 61, b = 62 etc. If you look through documentation for UTF-8, you will see that that's how they represent these. For beginners it is difficult to understand that 61 is hex value for 97 in decimal. As for your processing reference to hex() function, that was already part of my answer so not sure what the need for repeating that down here is. – Nico May 26 '13 at 00:04
  • @Mike'Pomax'Kamermans as for your answer and the downvote, I'd revert it (if I still can) if you actually answer the question which asks how to get hex values in int. For example: "How can I convert a char datatype into its utf-8 int representation in Processing" is clearly asked in the question. – Nico May 26 '13 at 00:06
1

What do you mean by "utf-8 int"? UTF-8 is a multi-byte encoding scheme for characters (technically, code points) represented as Unicode numbers. In your example you use trivial letters from the ASCII set, but that set has very little to do with a real Unicode/UTF-8 question.

For simple letters, you can literally just int cast:

print((int)'a') -> 97
print((int)'A') -> 65

But you can't do that with characters outside the 16 bit char range. print((int)'二') works (giving 20108, or 4E8C in hex), but print((int)'𠄢') will give a compile error because the character code for 𠄢 does not fit in 16 bits (it's supposed to be 131362, or 20122 in hex, which gets encoded as the four byte UTF-8 sequence 240+160+132+162; the sequence 239+191+189 is the replacement character U+FFFD, which is what shows up when the original character has been lost).

So for Unicode characters with a code higher than 0xFFFF you can't use int casting, and you'll actually have to think hard about what you're decoding. If you want true Unicode code point values, you'll have to decode the bytes yourself, but the Processing IDE doesn't actually let you do that; it will tell you that "𠄢".length() is 1, when in real Java it's 2 (two UTF-16 code units). There is, in current Processing, no way to actually get the Unicode value for any character with a code higher than 0xFFFF.
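For comparison, here is how plain Java, outside the Processing IDE, reads a code point above 0xFFFF. This is a sketch under that assumption only: whether a given Processing version keeps such a character intact in the editor is a separate matter and is not shown here. The character U+20122 is written as its surrogate pair so the source stays ASCII, and the names `CodePointDemo`, `firstCodePoint` and `utf8Bytes` are made up for illustration.

```java
import java.nio.charset.StandardCharsets;

public class CodePointDemo {
    // Hypothetical helper: the Unicode code point starting at index 0.
    static int firstCodePoint(String s) {
        return s.codePointAt(0);
    }

    // Hypothetical helper: the UTF-8 bytes of s as unsigned values (0 to 255).
    static int[] utf8Bytes(String s) {
        byte[] raw = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[raw.length];
        for (int i = 0; i < raw.length; i++) {
            out[i] = raw[i] & 0xFF; // Java bytes are signed, so mask to get 0..255
        }
        return out;
    }

    public static void main(String[] args) {
        String s = "\uD840\uDD22"; // U+20122 as a UTF-16 surrogate pair
        System.out.println(s.length());                            // 2
        System.out.println(s.codePointCount(0, s.length()));       // 1
        System.out.println(firstCodePoint(s));                     // 131362
        System.out.println(Integer.toHexString(firstCodePoint(s))); // 20122
        System.out.println(java.util.Arrays.toString(utf8Bytes(s))); // [240, 160, 132, 162]
    }
}
```

The (int) cast works on a single char, which is one UTF-16 code unit; `codePointAt` combines the two halves of the surrogate pair into the full value.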

update

Someone mentioned you actually wanted hex strings. If so, use the built in hex function.

println(hex((int)'a')) -> 00000061

and if you only want 2, 4, or 6 characters, just use substring:

println(hex((int)'a').substring(4)) -> 0061
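
If you'd rather ask for the width directly than trim with substring, plain Java's `String.format` can zero-pad for you. A minimal sketch, with `HexPadding` and `hexPadded` as made-up names:

```java
public class HexPadding {
    // Hypothetical helper: uppercase hex of c, zero-padded to the given width.
    static String hexPadded(char c, int digits) {
        return String.format("%0" + digits + "X", (int) c);
    }

    public static void main(String[] args) {
        System.out.println(hexPadded('a', 2)); // 61
        System.out.println(hexPadded('a', 4)); // 0061
        System.out.println(hexPadded('a', 8)); // 00000061
    }
}
```

The width is a minimum, so a value that needs more digits than requested is printed in full rather than truncated.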
Mike 'Pomax' Kamermans