4

What Charset does ByteBuffer.asCharBuffer() use? It seems to convert 3 bytes to one character on my system.

On a related note, how does CharsetDecoder relate to ByteBuffer.asCharBuffer()?

UPDATE: With respect to what implementation of ByteBuffer I am using, I am invoking ByteBuffer.allocate(1024).asCharBuffer(). I can't comment on what implementation gets used under the hood.

Gili
  • `ByteBuffer` is an abstract class and `asCharBuffer()` is abstract as well. What subclass of `ByteBuffer` are you using? – Paul Jul 19 '11 at 15:58
  • @Paul Where did you find another implementation of ByteBuffer but HeapByteBuffer in the SDK? There's only that one as far as I can see. – Voo Jul 19 '11 at 16:07
  • I suppose, @Voo, since I also happen to know that he's not using any external libraries that should be obvious to me. Oh, wait, I have no idea what he's using in his code, which is why I asked. – Paul Jul 19 '11 at 16:18
  • @Voo, `DirectByteBuffer` also extends `ByteBuffer` through `MappedByteBuffer`. – Paul Jul 19 '11 at 16:36
  • Do this: `CharBuffer buff = ByteBuffer.allocate(1024).asCharBuffer(); System.out.println("buff class is = " + buff.getClass().getName());` – Paul Jul 19 '11 at 16:41
  • @Paul Oh you're right I missed the DirectByteBuffer. For allocate() there's no problem - it uses HeapByteBuffer (and as long as it is aligned DirectByteBuffer uses the same stuff under the hood, it only differs for unaligned accesses) – Voo Jul 19 '11 at 17:40

3 Answers

3

For the first question: I believe it uses Java's native character encoding (UTF-16).
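A minimal sketch of what that looks like through the public API, assuming the default (big-endian) byte order of a freshly allocated buffer. The class and method names are made up for illustration; only `ByteBuffer` and `CharBuffer` come from the JDK.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;

// Hypothetical demo class, not part of the JDK.
class AsCharBufferDemo {
    // Reads the first char from a char view over the two given bytes.
    static char firstChar(byte hi, byte lo) {
        ByteBuffer bytes = ByteBuffer.allocate(2); // big-endian by default
        bytes.put(hi).put(lo);
        bytes.flip();
        CharBuffer chars = bytes.asCharBuffer();   // a view, no charset involved
        return chars.get(0);
    }

    // Number of chars visible through a char view over byteCount bytes.
    static int charCapacity(int byteCount) {
        return ByteBuffer.allocate(byteCount).asCharBuffer().capacity();
    }

    public static void main(String[] args) {
        System.out.println(firstChar((byte) 0x00, (byte) 0x41)); // A (U+0041)
        System.out.println(charCapacity(1024));                  // 512
    }
}
```

Each char in the view covers exactly two bytes, so a 1024-byte buffer exposes 512 chars.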

Petteri H
2

As I understand it, it doesn't use any charset. It just assumes the bytes are already in the form Java uses for strings, which means UTF-16. This can be shown by looking at the source for HeapByteBuffer, where the returned CharBuffer finally calls (little endian version):

static private char makeChar(byte b1, byte b0) {
    return (char)((b1 << 8) | (b0 & 0xff));
}

So the only thing handled here is the endianness; everything else is your responsibility. That also means it's usually much more useful to use the CharsetDecoder class, where you can specify the encoding.
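To make the contrast concrete, here is a rough sketch that feeds the same bytes through a `CharsetDecoder` and through `asCharBuffer()`. The class and method names are hypothetical, and the choice of UTF-8 and of the euro sign (three bytes in UTF-8) is only an illustration; it is not a claim about what the asker's data actually contains.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
class DecoderVersusView {
    // Decodes bytes with an explicitly chosen charset (UTF-8 here).
    static String decodeUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
        try {
            CharBuffer decoded = decoder.decode(ByteBuffer.wrap(data));
            return decoded.toString();
        } catch (CharacterCodingException e) {
            // Malformed input is reported rather than silently replaced.
            throw new IllegalArgumentException("not valid UTF-8", e);
        }
    }

    // Reinterprets each pair of bytes as one char; no decoding happens.
    static String viewAsChars(byte[] data) {
        return ByteBuffer.wrap(data).asCharBuffer().toString();
    }

    public static void main(String[] args) {
        byte[] euro = {(byte) 0xE2, (byte) 0x82, (byte) 0xAC}; // UTF-8 for U+20AC
        // The decoder turns all three bytes into the single char U+20AC.
        System.out.println(decodeUtf8(euro).equals("\u20AC"));
        // The view pairs the first two bytes into 0xE282 and ignores the odd byte.
        System.out.println(viewAsChars(euro).charAt(0) == (char) 0xE282);
    }
}
```

Only the decoder path knows about multi-byte sequences; the view always consumes exactly two bytes per char.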

Voo
  • I am accepting your answer because you quoted the actual source-code. – Gili Jul 19 '11 at 16:20
  • What package is `HeapByteBuffer` in? I don't see it in the JDK 6 API. I see `HeadlessException` followed by `HexBinaryAdapter`. – Paul Jul 19 '11 at 16:21
  • I've unaccepted the answer because the source-code has changed in JDK 7. Paul, you will find the relevant files in openjdk\jdk\src\share\classes\java\nio. See `Heap-X-Buffer.java.template` and `ByteBufferAs-X-Buffer.java.template` – Gili Jul 19 '11 at 16:28
  • @Gili, how does this explain the 3 bytes -> 1 character conversion you're seeing? – Paul Jul 19 '11 at 16:29
  • @Gili, the reason I didn't see it is it's package-private. It's in package `java.nio`. Same with `DirectByteBuffer`, which also extends `ByteBuffer`. – Paul Jul 19 '11 at 16:38
  • Yep, it goes something like this: `ByteBuffer->HeapByteBuffer(real implementation returned from allocate())->ByteBufferAsCharBufferB/L(returned from asCharBuffer)`. And that then implements the gets as `Bits.getCharB/L`, which in turn calls `makeChar()`. It's possible that this changed in Java 7 (but I'd assume it still has the same functionality; maybe some different wrapper classes in between). But I don't see any way this would interpret 3 bytes as one char: Java uses UTF-16 internally and there's no decoding anywhere in sight. – Voo Jul 19 '11 at 17:36
0

Looking at JDK 7, under jdk/src/share/classes/java/nio:

  1. X-Buffer.java.template maps ByteBuffer.allocate() to Heap-X-Buffer.java.template
  2. Heap-X-Buffer.java.template maps ByteBuffer.asCharBuffer() to ByteBufferAs-X-Buffer.java.template
  3. ByteBuffer.asCharBuffer().toString() invokes CharBuffer.put(CharBuffer) but I can't figure out where this leads

Eventually this probably leads to Bits.makeChar() which is defined as:

static private char makeChar(byte b1, byte b0) {
    return (char)((b1 << 8) | (b0 & 0xff));
}

but I can't figure out how.
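One way to test that guess without tracing the templates is to compare the public API against a local copy of the arithmetic. The sketch below is only a behavioral check: `MakeCharCheck` and `viaBuffer` are hypothetical names, and the local `makeChar` is a copy of the quoted method, not a call into `java.nio.Bits`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo class, not part of the JDK.
class MakeCharCheck {
    // Same arithmetic as the quoted makeChar(): b1 is the high byte, b0 the low byte.
    static char makeChar(byte b1, byte b0) {
        return (char) ((b1 << 8) | (b0 & 0xff));
    }

    // Reads one char through the public char view, using the given byte order.
    static char viaBuffer(byte first, byte second, ByteOrder order) {
        ByteBuffer bytes = ByteBuffer.wrap(new byte[] {first, second});
        return bytes.order(order).asCharBuffer().get(0);
    }

    public static void main(String[] args) {
        byte first = 0x12, second = 0x34;
        // Big-endian: the first byte in the buffer is the high byte.
        System.out.println(viaBuffer(first, second, ByteOrder.BIG_ENDIAN) == makeChar(first, second));
        // Little-endian: the second byte in the buffer is the high byte.
        System.out.println(viaBuffer(first, second, ByteOrder.LITTLE_ENDIAN) == makeChar(second, first));
    }
}
```

If both comparisons hold, the char view behaves as if it combines two bytes with this shift-and-or, whatever the intermediate classes are.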

Gili