I played around with some String -> byte -> binary code, and I want my code to work for any byte[] array. Currently it only works for, I think, ASCII.

Chinese doesn't work:

String message =" 汉语";
    playingWithFire(message.getBytes());

while String wow = "WOW..."; works :( I want it to work for all UTF-8 text. Any pointers on how I can do it?

//thanks

public static byte[] playingWithFire(byte[] bytes){
    byte[] newbytes = null;

    newbytes = new byte[bytes.length];
    for(int i = 0; i < bytes.length; i++){
        String tempStringByte = String.format("%8s", Integer.toBinaryString(bytes[i] & 0xFF)).replace(' ', '0');

        StringBuffer newByteBrf = null;

        newByteBrf = new StringBuffer();
        for(int x = 0; x < tempStringByte.length(); x++){
            newByteBrf.append(tempStringByte.charAt(x));
        }
        /*short a = Short.parseShort(newByteBrf.toString(), 2);
        ByteBuffer bytesads = ByteBuffer.allocate(2).putShort(a);
        newbytes[i] = bytesads.get();
        cause: java.nio.BufferUnderflowException
        */
        //cause: java.lang.NumberFormatException: Value out of range.
        newbytes[i] = Byte.parseByte(newByteBrf.toString(), 2);

    }
    return newbytes;
}
westberg
  • The Java `char` type holds Unicode characters. When moving between `char` and `byte` you need a suitable encoder so that the byte array ends up in one of the useful encodings (e.g. Big5, UTF-8, UTF-16). So store as byte but manipulate as char. – Lee Meador Sep 04 '13 at 21:32
  • How should I use Charset encoding = Charset.forName("UTF-16");? Of course I have to use it like message.getBytes(encoding), but it still doesn't solve the problem :( – westberg Sep 04 '13 at 21:39

1 Answer

message.getBytes() in your case converts the Chinese Unicode characters to bytes using the default character set on your computer. If it's a Western charset, the result is going to be wrong.

Notice that String.getBytes() has another form, String.getBytes(String), where the argument is the name of the character encoding used to convert the chars of the string to bytes.

The char type will hold Unicode. The byte type only holds raw bits in groups of 8.
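As a small illustration of the byte side of this: a Java byte is signed (-128 to 127), so the bytes of a non-ASCII character usually print as negative numbers. The sketch below is only an example, not part of the original code; the class name SignedByteDemo and the helper toUnsigned are made up, and it assumes Java 7 or later for StandardCharsets. The character is written as a \u escape so the result does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the name is an assumption for illustration.
class SignedByteDemo {

    // Masks a signed Java byte (-128..127) into its unsigned value (0..255).
    static int toUnsigned(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        // "\u6C49" is the first character of the string in the question.
        byte[] utf8 = "\u6C49".getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            int unsigned = toUnsigned(b);
            System.out.println(b + " -> " + unsigned + " -> " + Integer.toBinaryString(unsigned));
        }
        // prints:
        // -26 -> 230 -> 11100110
        // -79 -> 177 -> 10110001
        // -119 -> 137 -> 10001001
    }
}
```

All three bytes are above 127 when read as unsigned values, which is why the `& 0xFF` mask in the question's code matters.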

So, to convert a Unicode string to bytes encoded as UTF-16 you would use this code:

String message =" 汉语";
byte[] utf16Bytes = message.getBytes("utf-16");

Substitute the name of any encoding that you want to use.

Similarly, the new String(byte[], String) constructor takes an array of bytes encoded in some fashion and, given the name of that encoding, converts those bytes to Unicode characters.

For example, if you want to convert those bytes, which were encoded as utf-16 above, back to a String (which has Unicode chars in it):

String newMessage = new String(utf16Bytes, "utf-16");
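Putting the two directions together, here is a minimal round-trip sketch. It assumes Java 7 or later and uses the StandardCharsets constants instead of a charset name, which avoids the checked UnsupportedEncodingException; the class name CharsetRoundTrip and the methods encode and decode are hypothetical. UTF-8 is used here only because the question asks about it; any charset works as long as both directions use the same one.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are assumptions for illustration.
class CharsetRoundTrip {

    // Encodes the chars of a String into bytes with an explicit charset.
    static byte[] encode(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decodes bytes back into a String using the same charset.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Same text as in the question (a space plus two Chinese characters),
        // written with \u escapes so the source file encoding does not matter.
        String message = " \u6C49\u8BED";
        byte[] utf8Bytes = encode(message);
        System.out.println(utf8Bytes.length);                   // prints 7 (1 + 3 + 3 bytes)
        System.out.println(decode(utf8Bytes).equals(message));  // prints true
    }
}
```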

Since I don't know what you mean by "binary code" above, I can't go much further. As I see it, the Unicode chars have a binary code inside them that represents the characters one by one. The byte array also has a binary code in it, one that represents each character with one or more bytes. If you want to encrypt the byte array somehow, use a standard, proven encryption method and proven, time-tested procedures to secure the contents.
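If "binary code" means a string of '0' and '1' characters, one possible sketch of the conversion in both directions follows. It is an illustration under that assumption, not a definitive implementation; the class name BinaryStringRoundTrip and both method names are made up. The relevant detail is that Byte.parseByte only accepts values from -128 to 127, so an 8-bit group such as "11100110" (230) is rejected with the NumberFormatException "Value out of range" noted in the question's code. Parsing with Integer.parseInt and casting to byte keeps the same 8 bits.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; names are assumptions for illustration.
class BinaryStringRoundTrip {

    // Turns each byte into exactly 8 characters of '0' and '1'.
    static String toBinaryString(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 8);
        for (byte b : bytes) {
            String bits = Integer.toBinaryString(b & 0xFF); // unsigned value, 1 to 8 digits
            for (int pad = bits.length(); pad < 8; pad++) {
                sb.append('0');                             // left-pad to 8 digits
            }
            sb.append(bits);
        }
        return sb.toString();
    }

    // Parses groups of 8 binary digits back into bytes.
    static byte[] fromBinaryString(String bits) {
        if (bits.length() % 8 != 0) {
            throw new IllegalArgumentException("length must be a multiple of 8");
        }
        byte[] out = new byte[bits.length() / 8];
        for (int i = 0; i < out.length; i++) {
            String group = bits.substring(i * 8, i * 8 + 8);
            // Integer.parseInt accepts 0..255 here; the cast keeps the low 8 bits.
            out[i] = (byte) Integer.parseInt(group, 2);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] original = " \u6C49\u8BED".getBytes(StandardCharsets.UTF_8);
        String bits = toBinaryString(original);
        byte[] restored = fromBinaryString(bits);
        System.out.println(new String(restored, StandardCharsets.UTF_8).equals(" \u6C49\u8BED")); // prints true
    }
}
```

Because the conversion works on raw bytes, it does not depend on which charset produced them; the charset only matters when turning the bytes back into a String.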

Lee Meador
  • Note, my problem is why I can't convert a binary String that represents, e.g., 汉语 to a byte / byte[]. – westberg Sep 04 '13 at 21:51
  • I know a standard way to encrypt the content, so I only need to be able to transform the bytes in some way. If you have a better way of transforming the byte[] into a binary representation, please share it. Or if you know a way to transform the binary representation back into a byte[] :) – westberg Sep 04 '13 at 21:58
  • Notice how the answers to this question (http://stackoverflow.com/questions/1205135/how-to-encrypt-string-in-java) always start with a byte array (or a string they convert to a byte array). If you change the byte array to binary and then back to bytes, it's just going in a circle. You have a byte array with the data, just use it as is. – Lee Meador Sep 05 '13 at 14:44