Converting char[] to byte[]

Question

I would like to convert a character array to a byte array in Java. What methods exist for making this conversion?

multitask landscape · Answer 1 · 2023-08-31T21:09:04.423

Convert without creating a String object:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

byte[] toBytes(char[] chars) {
  CharBuffer charBuffer = CharBuffer.wrap(chars);
  ByteBuffer byteBuffer = Charset.forName("UTF-8").encode(charBuffer);
  byte[] bytes = Arrays.copyOfRange(byteBuffer.array(),
            byteBuffer.position(), byteBuffer.limit());
  Arrays.fill(byteBuffer.array(), (byte) 0); // clear sensitive data
  return bytes;
}

Usage:

char[] chars = {'0', '1', '2', '3', '4', '5', '6', '7', '8', '9'};
byte[] bytes = toBytes(chars);
/* do something with chars/bytes */
Arrays.fill(chars, '\u0000'); // clear sensitive data
Arrays.fill(bytes, (byte) 0); // clear sensitive data

The solution is inspired by the Swing recommendation to store passwords in char[]. (See Why is char[] preferred over String for passwords?)

Remember not to write sensitive data to logs, and ensure that the JVM does not hold any references to it.

This method is needed only when security is a concern. If the data is not sensitive, it is better to use String.getBytes instead.
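
The clean-up calls in the usage above are skipped if the code between them throws. A minimal sketch of one way to guard against that, assuming a hypothetical helper named withBytes (not part of the JDK or of the original answer) that encodes, hands the bytes to a callback, and wipes them in a finally block:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.function.Function;

class SensitiveBytes {
    // Hypothetical helper: encodes chars as UTF-8 without creating a String,
    // passes the bytes to `action`, and wipes them even if `action` throws.
    // The caller remains responsible for wiping `chars` itself.
    static <T> T withBytes(char[] chars, Function<byte[], T> action) {
        ByteBuffer buffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        if (buffer.hasArray()) {
            Arrays.fill(buffer.array(), (byte) 0); // wipe the encoder's backing array
        }
        try {
            return action.apply(bytes);
        } finally {
            Arrays.fill(bytes, (byte) 0); // wipe the copy handed to the callback
        }
    }
}
```

The callback must not keep a reference to the array and expect it to stay intact, since the contents are zeroed as soon as withBytes returns.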

Here is pseudo-code (actually, Scala code) showing how to do the same thing manually for UTF-8:

val xs: Array[Char] = "A ß € 嗨  ".toArray
val len = xs.length
val ys: Array[Byte] = new Array(3 * len) // worst case
var i = 0; var j = 0 // i for chars; j for bytes
while (i < len) { // fill ys with bytes
  val c = xs(i)
  if (c < 0x80) {
    ys(j) = c.toByte
    i = i + 1
    j = j + 1
  } else if (c < 0x800) {
    ys(j) = (0xc0 | (c >> 6)).toByte
    ys(j + 1) = (0x80 | (c & 0x3f)).toByte
    i = i + 1
    j = j + 2
  } else if (Character.isHighSurrogate(c)) {
    if (len - i < 2) throw new Exception("overflow")
    val d = xs(i + 1)
    val uc: Int = 
      if (Character.isLowSurrogate(d)) {
        Character.toCodePoint(c, d)
      } else {
        throw new Exception("malformed")
      }
    ys(j) = (0xf0 | ((uc >> 18))).toByte
    ys(j + 1) = (0x80 | ((uc >> 12) & 0x3f)).toByte
    ys(j + 2) = (0x80 | ((uc >>  6) & 0x3f)).toByte
    ys(j + 3) = (0x80 | (uc & 0x3f)).toByte
    i = i + 2 // 2 chars
    j = j + 4
  } else if (Character.isLowSurrogate(c)) {
    throw new Exception("malformed")
  } else {
    ys(j) = (0xe0 | (c >> 12)).toByte
    ys(j + 1) = (0x80 | ((c >> 6) & 0x3f)).toByte
    ys(j + 2) = (0x80 | (c & 0x3f)).toByte
    i = i + 1
    j = j + 3
  }
}
// check
println(new String(ys, 0, j, "UTF-8"))

This code looks similar to what is in JDK[2] and Protobuf[3].
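
Since the question is about Java, here is a rough Java rendering of the same loop. This is only a sketch: the class and method names (ManualUtf8, encode) are made up for illustration, and it throws IllegalArgumentException where the Scala version throws a plain Exception.

```java
import java.util.Arrays;

class ManualUtf8 {
    // Encodes UTF-16 chars as UTF-8 by hand, without creating a String.
    static byte[] encode(char[] xs) {
        byte[] ys = new byte[3 * xs.length]; // worst case: 3 bytes per char
        int i = 0; // index into chars
        int j = 0; // index into bytes
        while (i < xs.length) {
            char c = xs[i];
            if (c < 0x80) { // 1 byte
                ys[j++] = (byte) c;
                i++;
            } else if (c < 0x800) { // 2 bytes
                ys[j++] = (byte) (0xc0 | (c >> 6));
                ys[j++] = (byte) (0x80 | (c & 0x3f));
                i++;
            } else if (Character.isHighSurrogate(c)) { // surrogate pair: 4 bytes
                if (xs.length - i < 2 || !Character.isLowSurrogate(xs[i + 1])) {
                    throw new IllegalArgumentException("malformed surrogate pair at index " + i);
                }
                int uc = Character.toCodePoint(c, xs[i + 1]);
                ys[j++] = (byte) (0xf0 | (uc >> 18));
                ys[j++] = (byte) (0x80 | ((uc >> 12) & 0x3f));
                ys[j++] = (byte) (0x80 | ((uc >> 6) & 0x3f));
                ys[j++] = (byte) (0x80 | (uc & 0x3f));
                i += 2; // consumed 2 chars
            } else if (Character.isLowSurrogate(c)) {
                throw new IllegalArgumentException("unpaired low surrogate at index " + i);
            } else { // 3 bytes
                ys[j++] = (byte) (0xe0 | (c >> 12));
                ys[j++] = (byte) (0x80 | ((c >> 6) & 0x3f));
                ys[j++] = (byte) (0x80 | (c & 0x3f));
                i++;
            }
        }
        byte[] result = Arrays.copyOf(ys, j); // trim to the bytes actually written
        Arrays.fill(ys, (byte) 0); // wipe the scratch array
        return result;
    }
}
```

For well-formed input the result should match new String(xs).getBytes(StandardCharsets.UTF_8), which makes for an easy cross-check.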

Wouldn't this create a ByteBuffer? I guess that's less costly than a String object? – Andi Jay Jul 02 '12 at 19:41
@Andrii Nemchenko Yes, you get a trailing 0 in the last position if you use UTF-8 (originally I was using US-ASCII). I have refactored the code, and it now works correctly with UTF-8 too. Thanks for noticing! – Cassian May 17 '17 at 15:42
@AndriiNemchenko Here 1 char takes 1 byte. Can I make it half a byte? I remember reading that 1 char occupies 4 bits. – Prabs Aug 20 '18 at 06:23
This 'toBytes()' method has an important side effect: it wipes the input chars. charBuffer.array() actually is the input chars, so Arrays.fill() would wipe out the input. In many cases that is OK, but sometimes it has an undesired effect. – Guangliang Oct 30 '18 at 21:23

score 86 · Accepted Answer · edited Dec 06 '22 at 15:06

char[] ch = ?
new String(ch).getBytes();

Or, to get non-default charset:

new String(ch).getBytes("UTF-8");

Update: Since Java 7:

new String(ch).getBytes(StandardCharsets.UTF_8);
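
To illustrate why the charset argument matters, here is a small sketch (class and method names are made up for the example) that encodes the same single char with two charsets. It relies on the documented behaviour that String.getBytes(Charset) replaces unmappable characters with the charset's replacement byte, which is '?' for US-ASCII.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class ExplicitCharset {
    static byte[] utf8(char[] ch) {
        return new String(ch).getBytes(StandardCharsets.UTF_8);
    }

    static byte[] ascii(char[] ch) {
        return new String(ch).getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        char[] ch = {'\u00e9'}; // e with acute accent
        System.out.println(Arrays.toString(utf8(ch)));  // prints [-61, -87]
        System.out.println(Arrays.toString(ascii(ch))); // prints [63], i.e. '?'
    }
}
```

The no-argument getBytes() picks whichever charset the platform defaults to, so the same char[] can produce different bytes on different machines.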

edited Dec 06 '22 at 15:06 by sazzad
answered Apr 01 '11 at 12:10 by Tarlog

Using the platform's default charset is wrong most of the time (web apps). – maaartinus Apr 01 '11 at 12:14
This is a trivial solution, but because it creates a new String, the space needed for the operation is doubled. It won't work very well for extremely large inputs. – Levent Divilioglu May 30 '18 at 12:26
Note this is not ideal if security is an issue due to how Java caches strings. – NBJack Apr 19 '21 at 22:08
This is unsafe if the char array is used in order to avoid a String. (Refer to String vs char[] in Java for passwords.) – frhack Oct 27 '22 at 07:25

score 20 · Answer 3 · edited Jun 20 '20 at 09:12

Edit: Andrey's answer has been updated so the following no longer applies.

Andrey's answer (the highest voted at the time of writing) is slightly incorrect. I would have added this as a comment but I am not reputable enough.

In Andrey's answer:

char[] chars = {'c', 'h', 'a', 'r', 's'};
byte[] bytes = Charset.forName("UTF-8").encode(CharBuffer.wrap(chars)).array();

the call to array() may not return the desired value, for example:

char[] c = "aaaaaaaaaa".toCharArray();
System.out.println(Arrays.toString(Charset.forName("UTF-8").encode(CharBuffer.wrap(c)).array()));

output:

[97, 97, 97, 97, 97, 97, 97, 97, 97, 97, 0]

As can be seen a zero byte has been added. To avoid this use the following:

char[] c = "aaaaaaaaaa".toCharArray();
ByteBuffer bb = Charset.forName("UTF-8").encode(CharBuffer.wrap(c));
byte[] b = new byte[bb.remaining()];
bb.get(b);
System.out.println(Arrays.toString(b));

output:

[97, 97, 97, 97, 97, 97, 97, 97, 97, 97]

As the answer also alluded to using passwords it might be worth blanking out the array that backs the ByteBuffer (accessed via the array() function):

ByteBuffer bb = Charset.forName("UTF-8").encode(CharBuffer.wrap(c));
byte[] b = new byte[bb.remaining()];
bb.get(b);
Arrays.fill(bb.array(), (byte) 0); // blank out the backing array
System.out.println(Arrays.toString(b));
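
The copy-then-blank steps can be folded into one helper. This is a sketch only: copyAndWipe is a made-up name, and the hasArray() guard is an addition for buffers that expose no backing array. Because it uses the relative get(), it does not depend on position() or arrayOffset() being 0.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

class BufferCopy {
    // Copies the remaining bytes out of the buffer, then zeroes the backing array.
    static byte[] copyAndWipe(ByteBuffer bb) {
        byte[] out = new byte[bb.remaining()];
        bb.get(out); // relative get: honours position, limit and arrayOffset
        if (bb.hasArray()) {
            // Wipes the WHOLE backing array, including bytes outside position..limit.
            // That is intended for an encoder's private buffer, not for shared arrays.
            Arrays.fill(bb.array(), (byte) 0);
        }
        return out;
    }
}
```

Usage would look like byte[] b = BufferCopy.copyAndWipe(Charset.forName("UTF-8").encode(CharBuffer.wrap(c)));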

edited Jun 20 '20 at 09:12 by Community
answered Dec 16 '13 at 06:42 by djsutho

Could the trailing \0 be implementation specific? I'm using 1.7_51 with netbeans 7.4 and not noticing any trailing \0. – Jan 26 '14 at 04:46
@orthopteroid yes this example could be jvm specific. This was run with oracle 1.7.0_45 linux 64 bit (from memory). With the following implementation (http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/7u40-b43/java/nio/charset/CharsetEncoder.java?av=f#773) you will get errors if `averageBytesPerChar()` returns anything other than 1 (I get 1.1). Out of interest what OS/arch are you using as I double checked with oracle 1.7.0_51 and openjdk 1.7.0_51 and found it broken with 10 chars. – djsutho Jan 28 '14 at 08:36
@Andrey no worries. Note that `buffer.array()` in the `toBytes` function still needs to be overridden, currently only the copy is. – djsutho Jan 29 '14 at 07:59
@Andrey I have edited my answer to reflect the changes. – djsutho Jan 30 '14 at 09:21
@djsutho Today, my platform is windows7x64. Sorry, can't show the code - I'm using code like "System.arraycopy( str.getBytes( "UTF-8" ), 0, stor, 0, used );" now. – Jan 31 '14 at 07:24
I guess this also assumes that arrayOffset() is 0. I do wonder whether it's okay to do so... our own code is doing the same, and I thought I'd roam around looking for a cleaner alternative. – Hakanai Apr 13 '18 at 03:54

score 2 · Answer 4 · answered Apr 16 '18 at 04:45

private static byte[] charArrayToByteArray(char[] c_array) {
        byte[] b_array = new byte[c_array.length];
        for(int i= 0; i < c_array.length; i++) {
            b_array[i] = (byte)(0xFF & (int)c_array[i]);
        }
        return b_array;
}
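
Note that the method above keeps only the low 8 bits of each char. A small sketch (names are made up for illustration) of what that means in practice: for chars up to U+00FF the result matches ISO-8859-1 encoding, and anything above that is silently truncated.

```java
import java.nio.charset.StandardCharsets;

class LowByteOnly {
    // Same conversion as above: keep the low 8 bits of every char.
    static byte[] lowBytes(char[] chars) {
        byte[] out = new byte[chars.length];
        for (int i = 0; i < chars.length; i++) {
            out[i] = (byte) (0xFF & chars[i]);
        }
        return out;
    }

    // True if no char would lose information in lowBytes().
    static boolean isLossless(char[] chars) {
        for (char c : chars) {
            if (c > 0xFF) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] euro = {'\u20ac'}; // euro sign, U+20AC
        byte[] b = lowBytes(euro); // only 0xAC survives
        // Decoding that byte gives U+00AC (the "not" sign), not the euro sign.
        System.out.println((int) new String(b, StandardCharsets.ISO_8859_1).charAt(0)); // prints 172
    }
}
```

Checking isLossless before converting is one way to fail loudly instead of corrupting the data.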

henry700 · Answer 5 · 2021-06-14T19:19:18.527

If you just want to convert the data container (the array) type itself, only regarding the data size and being agnostic to any encoding:

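// original byte[] (example input; substitute your own data)
byte[] pattern = {0x12, 0x34, 0x56};
// two bytes fit in one char; round up so an odd trailing byte still gets a slot
char[] arr = new char[(pattern.length + 1) / 2];
ByteBuffer wrapper = ByteBuffer.wrap(pattern);
int i = 0;
while (wrapper.hasRemaining()) {
    // a lone trailing byte becomes the high byte of the final char
    char character = wrapper.remaining() < 2 ? ((char) (((int) wrapper.get()) << 8)) : wrapper.getChar();
    arr[i++] = character;
}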
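
The snippet above goes from byte[] to char[]. For the direction the question asks about, a comparable encoding-agnostic sketch could look like the following. The names toRawBytes and fromRawBytes are made up, and big-endian order is assumed simply because it is ByteBuffer's default.

```java
import java.nio.ByteBuffer;

class RawChars {
    // Stores every char as 2 bytes, high byte first (ByteBuffer's default order).
    static byte[] toRawBytes(char[] chars) {
        ByteBuffer buffer = ByteBuffer.allocate(chars.length * 2);
        buffer.asCharBuffer().put(chars);
        return buffer.array(); // allocate() gives an exactly sized backing array
    }

    // Inverse of toRawBytes; assumes an even number of bytes.
    static char[] fromRawBytes(byte[] bytes) {
        char[] chars = new char[bytes.length / 2];
        ByteBuffer.wrap(bytes).asCharBuffer().get(chars);
        return chars;
    }
}
```

This always uses 2 bytes per char, so the output is not UTF-8 and is only meaningful to code that reads it back the same way.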

score -5 · Answer 6 · answered Sep 25 '14 at 00:46

You could make a method:

public byte[] toBytes(char[] data) {
    byte[] toRet = new byte[data.length];
    for (int i = 0; i < toRet.length; i++) {
        toRet[i] = (byte) data[i];
    }
    return toRet;
}

Hope this helps

answered Sep 25 '14 at 00:46 by Java Is Cool

This answer is incorrect because char data is Unicode and as such there may be up to 4 bytes per character (more are possible, but in real life, I've only found up to 4). Simply taking one byte from each character will only work for a very limited character set. Please read 'The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)' at http://www.joelonsoftware.com/articles/Unicode.html. – Ilane Oct 28 '14 at 18:47
