47

We are trying to convert a String to a byte[] using the following Java code:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16");

We get a byte array of 22 bytes, and we are not sure where the padding comes from. How do we get an array of length 20?

apaderno
mayaalpe

4 Answers

71

Alexander's answer explains why it's there, but not how to get rid of it. You simply need to specify the endianness you want in the encoding name:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16LE"); // Or UTF-16BE
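
As a minimal sketch of the difference (the class and method names here are illustrative, not from the original answer), the following compares the three standard UTF-16 charsets. It uses the `StandardCharsets` constants available since Java 7, which also avoid the checked `UnsupportedEncodingException` thrown by the string-name overload of `getBytes`.

```java
import java.nio.charset.StandardCharsets;

public class Utf16Lengths {
    // Plain "UTF-16": Java's encoder writes a big-endian BOM (FE FF) first.
    public static byte[] withBom(String s) {
        return s.getBytes(StandardCharsets.UTF_16);
    }

    // Explicit little-endian byte order: no BOM is written.
    public static byte[] littleEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16LE);
    }

    // Explicit big-endian byte order: no BOM is written.
    public static byte[] bigEndian(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        String source = "0123456789";
        System.out.println(withBom(source).length);      // 22 (2-byte BOM + 20)
        System.out.println(littleEndian(source).length); // 20
        System.out.println(bigEndian(source).length);    // 20
    }
}
```

Whichever byte order you pick, the code that decodes the bytes has to use the same one, since there is no longer a BOM to detect it from.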
Jon Skeet
25

The first two bytes may be the byte order mark (BOM). It specifies the order of the bytes within each 16-bit code unit used in the encoding.

Paŭlo Ebermann
Alexander
7

Try printing out the bytes in hex to see where the extra 2 bytes are added - are they at the start or end?

I expect you'll find a byte order mark at the start (0xFEFF). This allows anyone consuming (receiving) the byte array to recognise whether the encoding is little-endian or big-endian.
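
One way to do that check is sketched below. `HexDump` and `toHex` are hypothetical helper names introduced for illustration; the sketch assumes the plain "UTF-16" charset from the question.

```java
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Formats each byte as two uppercase hex digits, separated by spaces.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask to 0..255 so negative bytes such as (byte) 0xFE print as FE.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] bytes = "0123456789".getBytes(StandardCharsets.UTF_16);
        // The dump starts with FE FF, followed by 00 30 for '0', 00 31 for '1', and so on.
        System.out.println(toHex(bytes));
    }
}
```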

Bevan
7

UTF-16 has a byte order mark at the beginning that tells the reader which byte order the stream is encoded in. As the other users have pointed out, the 1st byte is 0xFE and the 2nd byte is 0xFF. The remaining bytes (in decimal) are:
0 48 0 49 0 50 0 51 0 52 0 53 0 54 0 55 0 56 0 57

anjanb