1

Q: When casting an int to a char in Java, it seems that the default result is the ASCII character corresponding to that int value. My question is, is there some way to specify a different character set to be used when casting?

(Background info: I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process. I have been able to do this, but currently I have only been able to compress up to 6 "bits" into a single character, because when I allow for larger amounts, there are some values in the range which do not seem to be handled well by ASCII; they become boxes or question marks and when they are cast back into an int, their original value has not been preserved. If I could use another character set, I imagine I could avoid this problem and compress the binary by 8 bits at a time, which is my goal.)

I hope this was clear, and thanks in advance!

A Humble Noob

4 Answers

3

Your problem has nothing to do with ASCII or character sets.

In Java, a char is just an unsigned 16-bit integer. When you cast an int (a 32-bit integer) to a char, all you are doing is keeping the 16 least significant bits of the int and discarding the upper 16 bits. This is called a narrowing primitive conversion.
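To illustrate, here is a minimal sketch (the class and method names are made up for this example). It casts a few ints to char and back. As far as the cast itself is concerned, any value from 0 to 65535 survives the round trip; only the bits above the low 16 are lost.

```java
class NarrowingDemo {
    // Cast an int to char and back. The narrowing cast keeps the low 16 bits;
    // the widening cast back to int zero-extends, because char is unsigned.
    static int roundTrip(int value) {
        char c = (char) value;
        return (int) c;
    }

    public static void main(String[] args) {
        System.out.println(roundTrip(65));       // 65
        System.out.println(roundTrip(200));      // 200
        System.out.println(roundTrip(0x12345));  // 9029 (0x2345, upper bits dropped)
        System.out.println(roundTrip(-1));       // 65535 (0xFFFF)
    }
}
```

So if values between 64 and 255 come back changed, the cast is probably not where they are being lost.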


Grodriguez
1

The conversion between characters and integers uses the Unicode values, of which ASCII is a subset. If you are handling binary data you should avoid characters and strings and instead use an integer array - note that Java doesn't have unsigned 8-bit integers.
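As a rough sketch of what the missing unsigned type means in practice (names here are only illustrative): a Java byte is signed, so a value such as 200 is stored as -56, and masking with 0xFF recovers the unsigned value when you read it back.

```java
class UnsignedByteDemo {
    // Interpret a signed Java byte as an unsigned value in the range 0..255.
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 200;               // the bit pattern 0xC8, stored as -56
        System.out.println(b);             // -56
        System.out.println(toUnsigned(b)); // 200
    }
}
```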

Mark Byers
  • He mentions he's *casting* from int to char. This has nothing to do with character sets: chars are nothing but 16-bit integers; casting just keeps the lower 16 bits and discards the upper 16 bits. – Grodriguez Nov 21 '10 at 22:31
  • @Grodriguez: A `char` is a 16-bit integer *with an associated meaning as text*. The fact that it is 16-bit is more a historic coincidence - it could just as easily have been 24 bits. If you just want a 16-bit integer you should use a `short`. Using a `char` to store arbitrary binary data is not a good approach - that's not what `char` was designed for. http://stackoverflow.com/questions/1841461/unsigned-short-in-java/1841471#1841471 – Mark Byers Nov 21 '10 at 22:37
  • Regardless of the "intended use", the fact is that a char *is* a 16-bit unsigned integer. While I agree that using a char to store binary data is not a good approach, saying that "the conversion between characters and integers uses Unicode values" is misleading. Casting from int to char results in a narrowing primitive conversion as defined by the JLS (http://java.sun.com/docs/books/jls/second_edition/html/conversions.doc.html#25363). – Grodriguez Nov 22 '10 at 07:12
0

What you are looking for is not a cast; it's a conversion.

There is a String constructor that takes a byte array and a charset. This should help you.
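Here is a small sketch of that constructor in use, assuming Java 7 or later for StandardCharsets (the class and method names are invented for the example). It decodes all 256 byte values into a String and encodes them back. ISO-8859-1 maps every byte to exactly one char, so the data survives; US-ASCII only covers 0 to 127, so the upper half does not.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class CharsetRoundTrip {
    // Decode bytes into a String with the given charset, then encode back.
    static byte[] roundTrip(byte[] data, Charset charset) {
        String text = new String(data, charset);
        return text.getBytes(charset);
    }

    // Every possible byte value, 0x00 through 0xFF.
    static byte[] allByteValues() {
        byte[] all = new byte[256];
        for (int i = 0; i < 256; i++) {
            all[i] = (byte) i;
        }
        return all;
    }

    public static void main(String[] args) {
        byte[] all = allByteValues();
        // ISO-8859-1 is a one-to-one mapping, so nothing is lost.
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.ISO_8859_1))); // true
        // US-ASCII cannot represent bytes above 127; they are replaced while decoding.
        System.out.println(Arrays.equals(all, roundTrip(all, StandardCharsets.US_ASCII)));   // false
    }
}
```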

shellholic
  • Thank you, I did find out about this method. I just wondered if there was a way to change how casting worked, as this would mean I could improve my existing implementation with just a simple modification. – A Humble Noob Nov 21 '10 at 22:37
0

I'm working on a project in which I read in a string of binary characters, convert it into chunks, and convert the chunks into their values in decimal, ints, which I then cast as chars. I then need to be able to "expand" the resulting compressed characters back to binary by reversing the process.

You don't mention why you are doing that, and (to be honest) it's a little hard to follow what you're trying to describe. For one thing, I don't see why the resulting characters would be "compressed" in any way.

If you just want to represent binary data as text, there are plenty of standard ways of accomplishing that. But it sounds like you may be after something else?

Dmitri
  • Sorry about that, I just didn't want to make a wall of text that nobody would be interested in reading. – A Humble Noob Nov 21 '10 at 22:26
  • Errr, didn't realize hitting Enter would post, let me elaborate: the project involves Huffman encoding; we were required to convert text from a .txt file, into binary, and be able to decode that binary to get the original text. A bonus was offered if we could store the encoded text in a compressed form, and I decided to simply grab the binary in chunks and instead of printing zeroes and ones, I take the decimal value of the binary chunk and cast that as a char, giving me one character for many. That's all I mean by compression. – A Humble Noob Nov 21 '10 at 22:30
  • And yes, I found out there were other ways of achieving this, but at this point I have already finished the project. Because certain values, when cast to a char, don't produce a character that retains the value, I only have it working by taking 6 binary characters at a time. I was only wondering if there was a way to force casting to use some other character set, so I could try to find one that worked for, say, chunks of 8. – A Humble Noob Nov 21 '10 at 22:34
  • Ah, OK, so the file is a text file with '1' and '0' characters, not a binary file? Why not just store it as an actual binary file? Convert each group of 8 bit characters to a byte value and store it as a byte (using FileOutputStream, for example). As mentioned before, when casting to char you are not casting to a character set, just to an integer value (which is converted to a specific character encoding when you print it), so if you hit characters that are not printable in Unicode, randomly choosing a different character set is unlikely to help. – Dmitri Nov 21 '10 at 22:50
  • If you do need to store the data as text, Base64 encoding is a common way to make sure you're only using printable characters. – Dmitri Nov 21 '10 at 22:51
  • Ah, thanks for the info, this answers my question. I guess the only way to achieve better compression would be to use one of the other solutions such as the one you've suggested. Fair enough, and thanks again! – A Humble Noob Nov 21 '10 at 23:38
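
For anyone landing here later, the suggestion in the comments above could look roughly like the sketch below. It is not the asker's code: the class name, the method names and the sample bit string are all made up, and it assumes the bit count is stored somewhere separately, because the last byte is zero-padded. It packs a string of '0' and '1' characters into bytes, writes them with FileOutputStream, and also shows the Base64 route for when the result has to stay printable text (java.util.Base64 needs Java 8 or later).

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Base64;

class BitPacker {
    // Pack a string of '0'/'1' characters into bytes, 8 bits per byte,
    // most significant bit first. The last byte is zero-padded on the right.
    static byte[] pack(String bits) {
        byte[] out = new byte[(bits.length() + 7) / 8];
        for (int i = 0; i < bits.length(); i++) {
            if (bits.charAt(i) == '1') {
                out[i / 8] |= 0x80 >>> (i % 8);
            }
        }
        return out;
    }

    // Reverse of pack: read back the first bitCount bits as '0'/'1' characters.
    static String unpack(byte[] data, int bitCount) {
        StringBuilder sb = new StringBuilder(bitCount);
        for (int i = 0; i < bitCount; i++) {
            int bit = (data[i / 8] >> (7 - i % 8)) & 1;
            sb.append(bit == 1 ? '1' : '0');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String bits = "110010001111"; // sample data, 12 bits
        byte[] packed = pack(bits);   // 2 bytes instead of 12 characters

        // Store as a real binary file (a temp file here, just for the demo).
        Path file = Files.createTempFile("encoded", ".bin");
        try (FileOutputStream out = new FileOutputStream(file.toFile())) {
            out.write(packed);
        }
        byte[] fromFile = Files.readAllBytes(file);
        System.out.println(unpack(fromFile, bits.length()).equals(bits)); // true
        Files.delete(file);

        // If the output must be text, Base64 keeps it printable.
        String text = Base64.getEncoder().encodeToString(packed);
        byte[] fromText = Base64.getDecoder().decode(text);
        System.out.println(unpack(fromText, bits.length()).equals(bits)); // true
    }
}
```

Writing raw bytes gives 8 bits per stored byte. Base64 gives 6 bits per output character, which happens to match the 6-bit chunks that already worked for the asker.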