
Java stores characters in UCS-2 format.

    byte[] bytes = {0x00, 0x48, 0x00, 0x69, 0x00, 0x2c,
                    0x60, (byte)0xA8, 0x59, 0x7D, 0x00, 0x21};
    // Print UCS-2 in hex codes
    System.out.printf("%10s", "UCS-2");
    for(int i=0; i<bytes.length; i++) {
        System.out.printf("%02x", bytes[i]);
    }

1) In the code below,

    Charset charset = Charset.forName("UTF-8");
    // Encode from UCS-2 to UTF-8
    // Create a ByteBuffer by wrapping a byte array
    ByteBuffer bb = ByteBuffer.wrap(bytes);

What byte order does `bb` use for the wrapped bytes after `wrap()`? Big-endian or little-endian?

2) In the code below,

    // Create a CharBuffer from a view of this ByteBuffer
    CharBuffer cb = bb.asCharBuffer();
    ByteBuffer bbOut = charset.encode(cb);

What encoding is used to interpret the bytes of `bb` as characters in `cb` when calling `asCharBuffer()`?

overexchange
  • I don't understand the first question. There are no characters or strings, so where does encoding come into the picture? – shmosel Oct 31 '17 at 05:37
  • @shmosel Does `bb[0]` hold `0x21`, or does `bb[11]` hold `0x21`? – overexchange Oct 31 '17 at 05:40
  • `bb[11]` does, assuming we're talking about the same byte array. – shmosel Oct 31 '17 at 05:42
  • @shmosel Sorry I was asking about byte order – overexchange Oct 31 '17 at 05:42
  • You're talking about little vs. big endian? – shmosel Oct 31 '17 at 05:43
  • @shmosel Yes, MSB position and LSB position of bits stored in each byte of `ByteBuffer` compared to `byte` datatype – overexchange Oct 31 '17 at 05:45
  • You're wrapping an existing byte array. The buffer's endianness won't have any effect until you start writing to it. – shmosel Oct 31 '17 at 05:48
  • *Java stores characters in UCS-2 format.* This is false. Until Java 9, strings were stored as simple `char[]` arrays. Since Java 9, they're encoded in one of two byte encodings, neither of which is UCS-2. – shmosel Oct 31 '17 at 05:48
  • @shmosel This [answer](https://stackoverflow.com/a/36236799/3317808) says Java originally used UCS-2, which runs out after code point 65535. Java now uses UTF-16, which covers the extra planes while staying backward compatible with UCS-2. – overexchange Oct 31 '17 at 07:34
  • Ok, I understand what you're saying now. But I still don't see your point. – shmosel Oct 31 '17 at 08:08
  • Your second question is answered [here](https://stackoverflow.com/questions/6750123/what-charset-does-bytebuffer-ascharbuffer-use). – shmosel Oct 31 '17 at 08:09
  • @shmosel So, for the second question, can I say that `CharBuffer cb = bb.asCharBuffer()` is equivalent to `Charset cset = Charset.forName("UTF-16"); CharBuffer cb = cset.decode(bb)`? – overexchange Oct 31 '17 at 16:43
  • It would seem so. – shmosel Oct 31 '17 at 17:15
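
The points raised in the comments can be checked with a small experiment. The sketch below is not from the original post: the class name `BufferOrderDemo` and its helper methods are made up for illustration. It assumes the same 12-byte array as the question and shows that `wrap()` leaves the bytes in array order, that a freshly wrapped buffer reports `BIG_ENDIAN`, and that `asCharBuffer()` pairs bytes according to the buffer's byte order, which for the default order gives the same characters as decoding with `UTF_16BE`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the names here are illustrative only.
class BufferOrderDemo {
    // Same bytes as in the question: "Hi," + U+60A8 + U+597D + "!" as big-endian 16-bit units.
    static final byte[] BYTES = {0x00, 0x48, 0x00, 0x69, 0x00, 0x2c,
                                 0x60, (byte) 0xA8, 0x59, 0x7D, 0x00, 0x21};

    // View the bytes as chars, pairing them according to the given byte order.
    static String viewAsChars(byte[] bytes, ByteOrder order) {
        return ByteBuffer.wrap(bytes).order(order).asCharBuffer().toString();
    }

    // Decode the same bytes explicitly as big-endian UTF-16.
    static String decodeUtf16Be(byte[] bytes) {
        return StandardCharsets.UTF_16BE.decode(ByteBuffer.wrap(bytes)).toString();
    }

    // Hex dump of the bytes between position and limit, without moving the position.
    static String toHex(ByteBuffer buf) {
        StringBuilder sb = new StringBuilder();
        for (int i = buf.position(); i < buf.limit(); i++) {
            sb.append(String.format("%02x", buf.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ByteBuffer bb = ByteBuffer.wrap(BYTES);

        // wrap() does not reorder anything: index 0 is still 0x00, index 11 is still 0x21.
        System.out.printf("bb[0]=%02x bb[11]=%02x%n", bb.get(0), bb.get(11));

        // The order only matters for multi-byte reads, writes, and views.
        System.out.println("default order: " + bb.order());

        // Char view under the default order, then re-encoded as UTF-8.
        CharBuffer cb = bb.asCharBuffer();
        System.out.println("UTF-8: " + toHex(StandardCharsets.UTF_8.encode(cb)));

        // Same characters as an explicit UTF-16BE decode; a little-endian view differs.
        System.out.println(viewAsChars(BYTES, ByteOrder.BIG_ENDIAN).equals(decodeUtf16Be(BYTES)));
        System.out.println(viewAsChars(BYTES, ByteOrder.LITTLE_ENDIAN).equals(decodeUtf16Be(BYTES)));
    }
}
```

Note that the `"UTF-16"` charset mentioned in the last comment also looks for a byte-order mark when decoding, so the comparison here uses `UTF_16BE`, which matches the view only while the buffer is left at its default order.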
