100

I would like to convert a character array to a byte array in Java. What methods exist for making this conversion?

Joel
Arun Abraham

6 Answers

184

Convert without creating a String object:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

byte[] toBytes(char[] chars) {
  CharBuffer charBuffer = CharBuffer.wrap(chars);
  ByteBuffer byteBuffer = Charset.forName("UTF-8").encode(charBuffer);
  byte[] bytes = Arrays.copyOfRange(byteBuffer.array(),
            byteBuffer.position(), byteBuffer.limit());
  Arrays.fill(byteBuffer.array(), (byte) 0); // clear sensitive data
  return bytes;
}

Usage:

char[] chars = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
byte[] bytes = toBytes(chars);
/* do something with chars/bytes */
Arrays.fill(chars, '\u0000'); // clear sensitive data
Arrays.fill(bytes, (byte) 0); // clear sensitive data

This solution is inspired by the Swing recommendation to store passwords in char[]. (See Why is char[] preferred over String for passwords?)

Remember not to write sensitive data to logs, and ensure that the JVM does not hold any other references to it.

This method is needed only when security is a concern. If the data is not sensitive, it is better to use String.getBytes instead.


Here is pseudo-code (actually, Scala code) showing how to do the same thing manually for UTF-8:

val xs: Array[Char] = "A ß € 嗨  ".toArray
val len = xs.length
val ys: Array[Byte] = new Array(3 * len) // worst case
var i = 0; var j = 0 // i for chars; j for bytes
while (i < len) { // fill ys with bytes
  val c = xs(i)
  if (c < 0x80) {
    ys(j) = c.toByte
    i = i + 1
    j = j + 1
  } else if (c < 0x800) {
    ys(j) = (0xc0 | (c >> 6)).toByte
    ys(j + 1) = (0x80 | (c & 0x3f)).toByte
    i = i + 1
    j = j + 2
  } else if (Character.isHighSurrogate(c)) {
    if (len - i < 2) throw new Exception("overflow")
    val d = xs(i + 1)
    val uc: Int = 
      if (Character.isLowSurrogate(d)) {
        Character.toCodePoint(c, d)
      } else {
        throw new Exception("malformed")
      }
    ys(j) = (0xf0 | ((uc >> 18))).toByte
    ys(j + 1) = (0x80 | ((uc >> 12) & 0x3f)).toByte
    ys(j + 2) = (0x80 | ((uc >>  6) & 0x3f)).toByte
    ys(j + 3) = (0x80 | (uc & 0x3f)).toByte
    i = i + 2 // 2 chars
    j = j + 4
  } else if (Character.isLowSurrogate(c)) {
    throw new Exception("malformed")
  } else {
    ys(j) = (0xe0 | (c >> 12)).toByte
    ys(j + 1) = (0x80 | ((c >> 6) & 0x3f)).toByte
    ys(j + 2) = (0x80 | (c & 0x3f)).toByte
    i = i + 1
    j = j + 3
  }
}
// check
println(new String(ys, 0, j, "UTF-8"))

This code looks similar to what is in JDK[2] and Protobuf[3].
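Since the question is about Java, the Scala loop above could be ported roughly as follows. This is only a sketch: the class name `ManualUtf8` and method name `encode` are made up for illustration, and the wipe of the scratch buffer at the end is an addition in the spirit of the password-handling advice above, not part of the Scala version.

```java
import java.util.Arrays;

final class ManualUtf8 {
    /** Encodes UTF-16 chars as UTF-8 by hand; throws on unpaired surrogates. */
    static byte[] encode(char[] xs) {
        byte[] ys = new byte[3 * xs.length]; // worst case: 3 bytes per char
        int i = 0; // index into xs (chars)
        int j = 0; // index into ys (bytes)
        while (i < xs.length) {
            char c = xs[i];
            if (c < 0x80) { // 1 byte (ASCII)
                ys[j++] = (byte) c;
                i++;
            } else if (c < 0x800) { // 2 bytes
                ys[j++] = (byte) (0xc0 | (c >> 6));
                ys[j++] = (byte) (0x80 | (c & 0x3f));
                i++;
            } else if (Character.isHighSurrogate(c)) { // 4 bytes from a surrogate pair
                if (i + 1 >= xs.length || !Character.isLowSurrogate(xs[i + 1])) {
                    throw new IllegalArgumentException("malformed input at index " + i);
                }
                int uc = Character.toCodePoint(c, xs[i + 1]);
                ys[j++] = (byte) (0xf0 | (uc >> 18));
                ys[j++] = (byte) (0x80 | ((uc >> 12) & 0x3f));
                ys[j++] = (byte) (0x80 | ((uc >> 6) & 0x3f));
                ys[j++] = (byte) (0x80 | (uc & 0x3f));
                i += 2; // consumed two chars
            } else if (Character.isLowSurrogate(c)) { // low surrogate without a preceding high one
                throw new IllegalArgumentException("malformed input at index " + i);
            } else { // 3 bytes
                ys[j++] = (byte) (0xe0 | (c >> 12));
                ys[j++] = (byte) (0x80 | ((c >> 6) & 0x3f));
                ys[j++] = (byte) (0x80 | (c & 0x3f));
                i++;
            }
        }
        byte[] result = Arrays.copyOf(ys, j); // trim to the bytes actually written
        Arrays.fill(ys, (byte) 0);            // wipe the oversized scratch buffer
        return result;
    }
}
```

For well-formed input, `ManualUtf8.encode(chars)` should produce the same bytes as `new String(chars).getBytes(StandardCharsets.UTF_8)`, without creating a String.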

multitask landscape
  • Wouldn't this create a ByteBuffer? I guess that's less costly than a String object? – Andi Jay Jul 02 '12 at 19:41
  • @Andrii Nemchenko Yes, you get a trailing 0 in the last position if you use UTF-8 (originally I was using US-ASCII). I have refactored the code; it now works correctly with UTF-8 too. Thanks for noticing! – Cassian May 17 '17 at 15:42
  • @AndriiNemchenko Here 1 char takes 1 byte. Can I make it a half byte. I remember reading that 1 char occupies 4 bits. – Prabs Aug 20 '18 at 06:23
  • 1
    This 'toBytes()' method has an important side effect: it wipes the input chars. charBuffer.array() actually is the input chars, so Arrays.fill() on it would wipe out the input. In many cases that is OK, but sometimes it has an undesired effect. – Guangliang Oct 30 '18 at 21:23
86
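char[] ch = {'a', 'b', 'c'}; // example data; any char array works
byte[] bytes = new String(ch).getBytes(); // uses the platform default charset

Or, to use a specific charset instead of the platform default: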

new String(ch).getBytes("UTF-8");

Update: Since Java 7:

new String(ch).getBytes(StandardCharsets.UTF_8);
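To illustrate why the charset argument matters, here is a small sketch (the class `CharsetChoice` and its method names are hypothetical) that encodes the same char[] with two of the standard charsets. The number of bytes per char, and whether a char can be represented at all, depends on the charset chosen.

```java
import java.nio.charset.StandardCharsets;

final class CharsetChoice {
    // UTF-8: variable width, can represent every Unicode character
    static byte[] utf8(char[] ch) {
        return new String(ch).getBytes(StandardCharsets.UTF_8);
    }

    // ISO-8859-1: one byte per char; unmappable chars are replaced with '?'
    static byte[] latin1(char[] ch) {
        return new String(ch).getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

For example, the single char `'\u00E9'` (é) becomes two bytes in UTF-8 but one byte in ISO-8859-1, while `'\u20AC'` (€) cannot be represented in ISO-8859-1 and is replaced by `?`.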
sazzad
Tarlog
20

Edit: Andrey's answer has been updated so the following no longer applies.

Andrey's answer (the highest voted at the time of writing) is slightly incorrect. I would have added this as a comment, but I do not have enough reputation.

In Andrey's answer:

char[] chars = {'c', 'h', 'a', 'r', 's'};
byte[] bytes = Charset.forName("UTF-8").encode(CharBuffer.wrap(chars)).array();

the call to array() may not return the desired value, for example:

char[] c = "aaaaaaaaaa".toCharArray();
System.out.println(Arrays.toString(Charset.forName("UTF-8").encode(CharBuffer.wrap(c)).array()));

output:

[97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 0]

As can be seen, a zero byte has been added: array() returns the buffer's entire backing array, which can be longer than the encoded content. To avoid this, use the following:

char[] c = "aaaaaaaaaa".toCharArray();
ByteBuffer bb = Charset.forName("UTF-8").encode(CharBuffer.wrap(c));
byte[] b = new byte[bb.remaining()];
bb.get(b);
System.out.println(Arrays.toString(b));

output:

[97, 97, 97, 97, 97, 97, 97, 97, 97, 97]

As the answer also alluded to using passwords it might be worth blanking out the array that backs the ByteBuffer (accessed via the array() function):

ByteBuffer bb = Charset.forName("UTF-8").encode(CharBuffer.wrap(c));
byte[] b = new byte[bb.remaining()];
bb.get(b);
Arrays.fill(bb.array(), (byte) 0); // blank out the backing array
System.out.println(Arrays.toString(b));
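The steps above can be gathered into one helper, sketched below. The names `SafeEncode` and `toUtf8` are made up for illustration. Copying with `bb.get(...)` reads only the bytes between position and limit, so the result does not depend on the backing array's length or on `arrayOffset()`; the `hasArray()` check is a precaution in case the returned buffer is not array-backed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class SafeEncode {
    static byte[] toUtf8(char[] chars) {
        ByteBuffer bb = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] out = new byte[bb.remaining()];
        bb.get(out); // copies exactly the encoded bytes, no trailing padding
        if (bb.hasArray()) {
            Arrays.fill(bb.array(), (byte) 0); // wipe the encoder's backing array
        }
        return out;
    }
}
```

The caller remains responsible for clearing both the input char[] and the returned byte[] once they are no longer needed.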
Community
djsutho
  • Could the trailing \0 be implementation specific? I'm using 1.7_51 with netbeans 7.4 and not noticing any trailing \0. –  Jan 26 '14 at 04:46
  • @orthopteroid yes this example could be jvm specific. This was run with oracle 1.7.0_45 linux 64 bit (from memory). With the following implementation (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/nio/charset/CharsetEncoder.java?av=f#773) you will get errors if `averageBytesPerChar()` returns anything other than 1 (I get 1.1). Out of interest what OS/arch are you using as I double checked with oracle 1.7.0_51 and openjdk 1.7.0_51 and found it broken with 10 chars. – djsutho Jan 28 '14 at 08:36
  • @Andrey no worries. Note that `buffer.array()` in the `toBytes` function still needs to be overridden, currently only the copy is. – djsutho Jan 29 '14 at 07:59
  • @Andrey I have edited my answer to reflect the changes. – djsutho Jan 30 '14 at 09:21
  • @djsutho Today, my platform is windows7x64. Sorry, can't show the code - I'm using code like "System.arraycopy( str.getBytes( "UTF-8" ), 0, stor, 0, used );" now. –  Jan 31 '14 at 07:24
  • I guess this also assumes that arrayOffset() is 0. I do wonder whether it's okay to do so... our own code is doing the same, and I thought I'd roam around looking for a cleaner alternative. – Hakanai Apr 13 '18 at 03:54
2
private static byte[] charArrayToByteArray(char[] c_array) {
    byte[] b_array = new byte[c_array.length];
    for (int i = 0; i < c_array.length; i++) {
        // keeps only the low 8 bits of each char; lossless only for chars up to U+00FF
        b_array[i] = (byte) (0xFF & (int) c_array[i]);
    }
    return b_array;
}
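As a rough illustration of when this low-byte conversion is safe, the sketch below (with the hypothetical names `LowByteDemo`, `lowBytes`, and `fitsInOneByte`) repeats the same masking and adds a check for chars above U+00FF. For chars within that range the result matches ISO-8859-1 encoding; anything above it loses its high byte.

```java
final class LowByteDemo {
    // Same idea as the method above: keep only the low 8 bits of each char.
    static byte[] lowBytes(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) (chars[i] & 0xFF);
        }
        return out;
    }

    // True when every char is at most U+00FF, i.e. nothing is lost by lowBytes.
    static boolean fitsInOneByte(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }
}
```

A caller could check `fitsInOneByte(chars)` first and fall back to a real charset encoder when it returns false.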
Matt
0

If you just want to convert the data container (the array) type itself, considering only the data size and ignoring any encoding (shown here in the byte[] to char[] direction):

// original byte[] (example data; substitute your own)
byte[] pattern = {0x00, 0x41, 0x00, 0x42, 0x43};
// two bytes per char, rounded up to cover a trailing odd byte
char[] arr = new char[(pattern.length + 1) / 2];
ByteBuffer wrapper = ByteBuffer.wrap(pattern);
int i = 0;
while (wrapper.hasRemaining()) {
    // a lone trailing byte becomes the high byte of the last char
    char character = wrapper.remaining() < 2 ? ((char) (((int) wrapper.get()) << 8)) : wrapper.getChar();
    arr[i++] = character;
}
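For the char[] to byte[] direction that the question asks about, the same encoding-agnostic idea might look like the sketch below. The names `RawChars`, `toRawBytes`, and `fromRawBytes` are hypothetical. Each char is stored as two bytes in big-endian order, which is the default byte order of a ByteBuffer.

```java
import java.nio.ByteBuffer;

final class RawChars {
    // Stores every char as two bytes (high byte first), with no charset involved.
    static byte[] toRawBytes(char[] chars) {
        ByteBuffer buf = ByteBuffer.allocate(chars.length * 2);
        buf.asCharBuffer().put(chars); // the char view writes into buf's storage
        return buf.array();
    }

    // Reverse operation; a trailing odd byte, if any, is ignored.
    static char[] fromRawBytes(byte[] bytes) {
        char[] out = new char[bytes.length / 2];
        ByteBuffer.wrap(bytes).asCharBuffer().get(out);
        return out;
    }
}
```

The resulting bytes are not UTF-8 or any other text encoding that another program would expect by default, so this is only suitable when both sides agree on the raw two-bytes-per-char layout.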
henry700
-5

You could make a method:

public byte[] toBytes(char[] data) {
    byte[] toRet = new byte[data.length];
    for (int i = 0; i < toRet.length; i++) {
        toRet[i] = (byte) data[i];
    }
    return toRet;
}

Hope this helps

Java Is Cool
  • 8
    This answer is incorrect because char data is Unicode and as such there may be up to 4 bytes per character (more are possible, but in real life, I've only found up to 4). Simply taking one byte from each character will only work for a very limited character set. Please read 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)' at http://www.joelonsoftware.com/articles/Unicode.html. – Ilane Oct 28 '14 at 18:47