Why does byteArray have a length of 22 instead of 20?

Question

We are trying to convert a String to a byte[] using the following Java code:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16");

We get a byte array of 22 bytes, and we are not sure where the extra two bytes come from. How do I get an array of length 20?

score 71 · Accepted Answer · edited May 23 '17 at 10:31

Alexander's answer explains why it's there, but not how to get rid of it. You simply need to specify the endianness you want in the encoding name:

String source = "0123456789";
byte[] byteArray = source.getBytes("UTF-16LE"); // Or UTF-16BE
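
A minimal sketch comparing the three encodings side by side. It assumes Java 7 or later and uses the `StandardCharsets` constants instead of charset names, which is a substitution on my part (it avoids the checked `UnsupportedEncodingException`); the class and method names are illustrative only.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative, not from the answer.
final class Utf16LengthDemo {

    // Returns how many bytes `text` occupies when encoded with `charset`.
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String source = "0123456789";
        // No byte order given: Java's encoder writes a 2-byte BOM first.
        System.out.println(encodedLength(source, StandardCharsets.UTF_16));   // 22
        // Explicit byte order: no BOM is written.
        System.out.println(encodedLength(source, StandardCharsets.UTF_16LE)); // 20
        System.out.println(encodedLength(source, StandardCharsets.UTF_16BE)); // 20
    }
}
```

Whether to pick LE or BE depends on what the consumer of the bytes expects, since without the BOM nothing in the array records the byte order.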

answered Oct 23 '08 at 08:53 by Jon Skeet · edited May 23 '17 at 10:31 by Community

score 25 · Answer 2 · edited May 18 '17 at 18:48

Maybe the first two bytes are the byte order mark (BOM). It specifies the order of the bytes in each 16-bit word used in the encoding.
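
To illustrate how the mark carries the byte order, here is a small sketch, assuming Java 7 or later. It decodes the same character stored in both byte orders with the plain UTF-16 charset, which reads the leading mark to decide how to interpret the rest. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class BomDecodeDemo {

    // Decodes with plain UTF-16, which consumes a leading BOM if one is present.
    static String decodeUtf16(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_16);
    }

    public static void main(String[] args) {
        // '0' is U+0030. Big-endian: BOM FE FF, then 00 30.
        byte[] bigEndian = {(byte) 0xFE, (byte) 0xFF, 0x00, 0x30};
        // Little-endian: BOM FF FE, then 30 00.
        byte[] littleEndian = {(byte) 0xFF, (byte) 0xFE, 0x30, 0x00};

        System.out.println(decodeUtf16(bigEndian));    // 0
        System.out.println(decodeUtf16(littleEndian)); // 0
    }
}
```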

answered Oct 23 '08 at 08:50 by Alexander · edited May 18 '17 at 18:48 by Paŭlo Ebermann

score 7 · Answer 3 · answered Oct 23 '08 at 08:52

Try printing out the bytes in hex to see where the extra 2 bytes are added - are they at the start or end?

I expect you'll find a byte order mark (0xFEFF) at the start. It lets anyone consuming the byte array recognise whether the encoding is little-endian or big-endian.
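
A minimal sketch of such a hex dump, assuming Java 7 or later; the class and the `toHex` helper are hypothetical names written for this illustration, not a library API.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names are illustrative only.
final class HexDumpDemo {

    // Formats each byte as two uppercase hex digits, separated by spaces.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // Mask with 0xFF so negative bytes print as FE, not FFFFFFFE.
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] byteArray = "0123456789".getBytes(StandardCharsets.UTF_16);
        // Prints: FE FF 00 30 00 31 00 32 00 33 00 34 00 35 00 36 00 37 00 38 00 39
        System.out.println(toHex(byteArray));
    }
}
```

The two extra bytes show up at the start, followed by the ten digits as big-endian 16-bit code units.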

score 7 · Answer 4 · answered Oct 23 '08 at 08:59

UTF-16 output starts with a byte order mark that tells the reader which byte order the stream uses. As the other users have pointed out, the first two bytes are 0xFE and 0xFF. The remaining 20 bytes are the ten digits, each encoded as a zero byte followed by its ASCII code:
0 48, 0 49, 0 50, 0 51, 0 52, 0 53, 0 54, 0 55, 0 56, 0 57
