43

I'm looking to convert a Java char array to a byte array without creating an intermediate String, as the char array contains a password. I've looked up a couple of methods, but they all seem to fail:

char[] password = "password".toCharArray();

byte[] passwordBytes1 = new byte[password.length*2];
ByteBuffer.wrap(passwordBytes1).asCharBuffer().put(password);

byte[] passwordBytes2 = new byte[password.length*2];
for(int i=0; i<password.length; i++) {
    passwordBytes2[2*i] = (byte) ((password[i]&0xFF00)>>8); 
    passwordBytes2[2*i+1] = (byte) (password[i]&0x00FF); 
}

String passwordAsString = new String(password);
String passwordBytes1AsString = new String(passwordBytes1);
String passwordBytes2AsString = new String(passwordBytes2);

System.out.println(passwordAsString);
System.out.println(passwordBytes1AsString);
System.out.println(passwordBytes2AsString);
assertTrue(passwordAsString.equals(passwordBytes1) || passwordAsString.equals(passwordBytes2));

The assertion always fails (and, critically, when the code is used in production, the password is rejected), yet the print statements print out password three times. Why are passwordBytes1AsString and passwordBytes2AsString different from passwordAsString, yet appear identical? Am I missing out a null terminator or something? What can I do to make the conversion and unconversion work?

Scott
  • Why do You want to avoid creating an intermediate String? – KarlP Feb 08 '11 at 10:53
  • 14
    Sun recommends it as best practice: http://download.oracle.com/javase/1.5.0/docs/guide/security/jce/JCERefGuide.html#PBEEx Strings are immutable, and hence can't be zeroed out like char arrays - instead, your password hangs around in memory for an indeterminate amount of time. – Scott Feb 08 '11 at 11:09

8 Answers

17

Conversion between char and byte is character set encoding and decoding. I prefer to make that as clear as possible in code. It doesn't really mean extra code volume:

 Charset latin1Charset = Charset.forName("ISO-8859-1");
 CharBuffer charBuffer = latin1Charset.decode(ByteBuffer.wrap(byteArray)); // can also decode to a String
 ByteBuffer byteBuffer = latin1Charset.encode(charBuffer);                 // can also encode from a String

Aside:

The java.nio classes and the java.io Reader/Writer classes use ByteBuffer and CharBuffer (which use byte[] and char[] as backing arrays), so it is often preferable to work with these buffer classes directly. However, you can always convert between buffers and arrays:

 byteArray = byteBuffer.array();  byteBuffer = ByteBuffer.wrap(byteArray);
 byteBuffer.get(byteArray);       byteBuffer.put(byteArray);
 charArray = charBuffer.array();  charBuffer = CharBuffer.wrap(charArray);
 charBuffer.get(charArray);       charBuffer.put(charArray);
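
As a rough illustration of the aside above: wrapping an array gives a view onto it rather than a copy, so wiping the array also wipes what the buffer sees. The class and method names below are made up for this sketch.

```java
import java.nio.CharBuffer;
import java.util.Arrays;

final class WrapIsAView {
    // Returns true if zeroing the array is visible through a buffer that wraps it.
    static boolean wrapSharesStorage() {
        char[] secret = {'p', 'w'};
        CharBuffer view = CharBuffer.wrap(secret); // no copy is made here
        Arrays.fill(secret, '\0');                 // wipe the original array
        // The buffer reads the wiped value and exposes the very same array object.
        return view.get(0) == '\0' && view.array() == secret;
    }

    public static void main(String[] args) {
        System.out.println(wrapSharesStorage());
    }
}
```

This matters for the password use case: wrapping does not create another copy that would need clearing later.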
Paul Gregoire
Glen Best
14

Original Answer

    public byte[] charsToBytes(char[] chars){
        Charset charset = Charset.forName("UTF-8");
        ByteBuffer byteBuffer = charset.encode(CharBuffer.wrap(chars));
        return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
    }

    public char[] bytesToChars(byte[] bytes){
        Charset charset = Charset.forName("UTF-8");
        CharBuffer charBuffer = charset.decode(ByteBuffer.wrap(bytes));
        return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
    }

Edited to use StandardCharsets

public byte[] charsToBytes(char[] chars)
{
    final ByteBuffer byteBuffer = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
    return Arrays.copyOf(byteBuffer.array(), byteBuffer.limit());
}

public char[] bytesToChars(byte[] bytes)
{
    final CharBuffer charBuffer = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(bytes));
    return Arrays.copyOf(charBuffer.array(), charBuffer.limit());    
}

Here is a JavaDoc page for StandardCharsets. Note this on the JavaDoc page:

These charsets are guaranteed to be available on every implementation of the Java platform.
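
One thing the methods above leave behind is the encoder's temporary backing array, which still holds the encoded password after Arrays.copyOf. A possible variant that scrubs it is sketched below; the class and method names are hypothetical, and the sketch assumes (as the original code does) that the buffer returned by encode is array-backed.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class WipingConverter {
    // Encodes chars as UTF-8, then zeroes the encoder's temporary backing array.
    static byte[] charsToBytesAndWipe(char[] chars) {
        ByteBuffer encoded = StandardCharsets.UTF_8.encode(CharBuffer.wrap(chars));
        // Copy only the valid region of the backing array.
        byte[] result = Arrays.copyOfRange(encoded.array(),
                encoded.arrayOffset() + encoded.position(),
                encoded.arrayOffset() + encoded.limit());
        Arrays.fill(encoded.array(), (byte) 0); // scrub the temporary copy
        return result;
    }

    public static void main(String[] args) {
        char[] password = "password".toCharArray();
        byte[] bytes = charsToBytesAndWipe(password);
        try {
            System.out.println(bytes.length); // use the bytes here
        } finally {
            Arrays.fill(bytes, (byte) 0);     // scrub both arrays when done
            Arrays.fill(password, '\0');
        }
    }
}
```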

DwB
Cassian
  • 1
    Nice use of ByteBuffer. However, without it being stated otherwise, the password is Unicode, so StandardCharset.UTF_8 would be better than corrupting the data by reducing it to ASCII. – Tom Blodget May 16 '17 at 12:26
  • You can use any charset you need – Cassian May 16 '17 at 19:11
  • 1
    I have edited the post, changing from US-ASCII to UTF-8. You are right. The idea is to keep the same encoding. US-ASCII does not have as many chars as UTF-8 (for example, no letters with accents), and if you first use UTF-8 and then US-ASCII you lose some info. – Cassian May 16 '17 at 21:41
  • 3
    After storing sensitive data in char[] or byte[] you need to clear the sensitive data as Andrii explains in usage from here http://stackoverflow.com/a/9670279/1582089 – Cassian May 17 '17 at 15:46
  • Nice example. But in my case it works with Charset charset = Charset.forName("ISO-8859-1"); – RoutesMaps.com Jul 12 '18 at 15:03
12

The problem is your use of the String(byte[]) constructor, which uses the platform default encoding. That's almost never what you should be doing - if you pass in "UTF-16" as the character encoding, your tests will probably pass. Currently I suspect that passwordBytes1AsString and passwordBytes2AsString are each 16 characters long, with every other character being U+0000.
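
A small sketch of that effect follows. It uses ISO-8859-1 as a stand-in for a single-byte platform default (an assumption; the actual default varies by system), and the class and method names are invented for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DecodeWithExplicitCharset {
    // Packs each char into two bytes, high byte first, like the question's code.
    static byte[] toBigEndianBytes(char[] chars) {
        byte[] bytes = new byte[chars.length * 2];
        ByteBuffer.wrap(bytes).asCharBuffer().put(chars);
        return bytes;
    }

    public static void main(String[] args) {
        byte[] bytes = toBigEndianBytes("password".toCharArray());
        // A single-byte charset turns every byte, including the zeros, into a char.
        String wrong = new String(bytes, StandardCharsets.ISO_8859_1);
        // Decoding with the encoding that produced the bytes restores the text.
        String right = new String(bytes, StandardCharsets.UTF_16BE);
        System.out.println(wrong.length());           // 16
        System.out.println(right.equals("password")); // true
    }
}
```

Since U+0000 is usually invisible on a console, both strings can look the same when printed while still comparing as different.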

Jon Skeet
  • I just tried that (i.e. `String passwordBytes1AsString = new String(passwordBytes1, "UTF-16");`) and there's no change. I also tried checking the length of the strings - `String.length()` returns 8. Would it count U+0000 characters? – Scott Feb 08 '11 at 10:33
  • @Scott: Try printing out the lengths of the strings, and the individual characters (as int values). That'll show you where the differences are. – Jon Skeet Feb 08 '11 at 10:36
  • 112,97,115,115,119,111,114,100 for both the original and the converted ones. – Scott Feb 08 '11 at 10:41
  • Have just noticed that I was using the wrong parameters to `equals()` in the assertion. \*facepalm\* Your original supposition was indeed the correct one. Many thanks. – Scott Feb 08 '11 at 10:47
5

What I would do is use one loop to convert to bytes and another to convert back to chars.

char[] chars = "password".toCharArray();
byte[] bytes = new byte[chars.length*2];
for(int i=0;i<chars.length;i++) {
   bytes[i*2] = (byte) (chars[i] >> 8);
   bytes[i*2+1] = (byte) chars[i];
}
char[] chars2 = new char[bytes.length/2];
for(int i=0;i<chars2.length;i++) 
   chars2[i] = (char) ((bytes[i*2] << 8) + (bytes[i*2+1] & 0xFF));
String password = new String(chars2);
Peter Lawrey
4

If you want to use a ByteBuffer and CharBuffer, don't use the simple .asCharBuffer(), which just does a UTF-16 conversion (big-endian by default; you can set the byte order with the order method), since Java Strings, and thus your char[], use this encoding internally.

Use Charset.forName(charsetName), and then its encode or decode method, or newEncoder/newDecoder.

When converting your byte[] to String, you also should indicate the encoding (and it should be the same one).
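
As one possible sketch of the newEncoder route: an encoder configured with CodingErrorAction.REPORT fails on characters the charset cannot represent, whereas Charset.encode substitutes a replacement byte. The class and method names here are made up, and wrapping the failure in IllegalArgumentException is just a choice for this example.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictEncoding {
    // Encodes chars with the given charset, failing instead of substituting bytes.
    static byte[] encodeStrict(char[] chars, Charset charset) {
        try {
            ByteBuffer out = charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(chars));
            byte[] result = new byte[out.remaining()];
            out.get(result); // copy exactly the encoded bytes
            return result;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException("not representable in " + charset, e);
        }
    }

    public static void main(String[] args) {
        char[] password = {'p', 'a', 's', 's', '\u00e9'}; // ends with 'é'
        System.out.println(encodeStrict(password, StandardCharsets.UTF_8).length); // 6
        try {
            encodeStrict(password, StandardCharsets.US_ASCII);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing loudly is arguably safer for passwords, since silent substitution would make two different passwords encode to the same bytes.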

Paŭlo Ebermann
3

This is an extension to Peter Lawrey's answer. For the backward (bytes-to-chars) conversion to work correctly for the whole range of chars, the code should be as follows:

char[] chars = new char[bytes.length/2];
for (int i = 0; i < chars.length; i++) {
   chars[i] = (char) (((bytes[i*2] & 0xff) << 8) + (bytes[i*2+1] & 0xff));
}

We need to "unsign" the bytes before using them (& 0xff). Otherwise half of all possible char values will not be restored correctly. For instance, chars within the [0x80..0xff] range will be affected.
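
To make the sign-extension problem concrete, here is a minimal sketch comparing a join that leaves the low byte unmasked with one that masks both bytes. The class and method names are hypothetical.

```java
final class ByteMasking {
    // Rebuilds a char from two bytes without masking; a negative low byte
    // is sign-extended and corrupts the upper half of the result.
    static char joinUnmasked(byte high, byte low) {
        return (char) ((high << 8) + low);
    }

    // Rebuilds a char from two bytes, masking both so they are treated as unsigned.
    static char joinMasked(byte high, byte low) {
        return (char) (((high & 0xff) << 8) + (low & 0xff));
    }

    public static void main(String[] args) {
        char original = '\u00e9';            // 'é', whose low byte is 0xE9
        byte high = (byte) (original >> 8);
        byte low = (byte) original;
        System.out.println(joinMasked(high, low) == original);   // true
        System.out.println(joinUnmasked(high, low) == original); // false
    }
}
```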

Vit Khudenko
2

You should make use of getBytes() instead of toCharArray()

Replace the line

char[] password = "password".toCharArray();

with

byte[] password = "password".getBytes();
Baz
yoda
  • 5
    dont use `String#getBytes()` without specifying an encoding, that gets you into all kinds of portability trouble. – eckes Nov 22 '12 at 17:14
  • not appropriate to the use case : this line was just an easy way to get char[] in this example. – Cerber Apr 16 '14 at 11:54
1

When you call getBytes() on a String in Java without an argument, the result depends on your platform's default encoding (e.g. StandardCharsets.UTF_8 or StandardCharsets.ISO_8859_1).

So whenever you call getBytes() on a String, make sure to specify an encoding, like this:

String sample = "abc";
byte[] a_byte = sample.getBytes(StandardCharsets.UTF_8);

Let's check what happens in this code. In Java, the String named sample is a sequence of Unicode (UTF-16) chars, and every char in the String is 2 bytes wide.

sample :  value: "abc"   in Memory(Hex):  00 61 00 62 00 63
        a -> 00 61
        b -> 00 62
        c -> 00 63

But when we call getBytes() on the String, we get:

byte[] utf8_bytes = sample.getBytes(StandardCharsets.UTF_8);
//result is : 61 62 63
//length: 3 bytes

byte[] utf16_bytes = sample.getBytes(StandardCharsets.UTF_16BE);
//result is : 00 61 00 62 00 63
//length: 6 bytes

To get the original bytes of the String, we can just walk over its chars and extract the two bytes of each one. Below is the sample code:

public static byte[] charArray2ByteArray(char[] chars){
    int length = chars.length;
    // Two bytes per char; extra padding bytes would decode as a trailing U+0000.
    byte[] result = new byte[length*2];
    int i = 0;
    for(int j = 0 ;j<chars.length;j++){
        result[i++] = (byte)( (chars[j] & 0xFF00) >> 8 );
        result[i++] = (byte)((chars[j] & 0x00FF)) ;
    }
    return result;
}

Usages:

String sample = "abc";
//First get the chars of the String; each char has two bytes (Java).
char[] sample_chars = sample.toCharArray();
//Get the bytes
byte[] result = charArray2ByteArray(sample_chars);

//Back to String.
//Make sure we use UTF_16BE, because we wrote the high byte of each
//char first and the low byte second. That is the same byte
//sequence as UTF-16BE.
String sample_back = new String(result, StandardCharsets.UTF_16BE);
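
As a quick check of the byte counts discussed above, the sketch below compares a few encodings of the same String. The class and method names are invented for the example. Note that plain UTF_16 (without BE or LE) also writes a 2-byte byte-order mark when encoding, which is one reason to name the byte order explicitly.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class EncodedLengths {
    // Returns how many bytes the text occupies in the given charset.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String sample = "abc";
        System.out.println(encodedLength(sample, StandardCharsets.UTF_8));    // 3
        System.out.println(encodedLength(sample, StandardCharsets.UTF_16BE)); // 6
        System.out.println(encodedLength(sample, StandardCharsets.UTF_16));   // 8, includes the BOM
    }
}
```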
  • The question doesn't mention `getBytes`, so this isn't really relevant. Are you trying to comment on one of the other answers? – Simon MᶜKenzie Jan 15 '16 at 03:56
  • Just want to declare that the usages of String 's getBytes Function. And what should be taking care of when using new String(Byte[]) . Hope it helps. – junqiang chen Jan 15 '16 at 06:33