I have an arbitrary chunk of bytes that represent chars, encoded in an arbitrary scheme (may be ASCII, UTF-8, UTF-16). I know the encoding.

What I'm trying to do is find the location of the last new line (\n) in the array of bytes. I want to know how many bytes are left over after reading the last encoded \n.

I can't find anything in the JDK or any other library that will let me convert a byte array to chars one by one. InputStreamReader reads the stream in chunks, not giving me any indication how many bytes are getting read to produce a char.

Am I going to have to do something as horrible as re-encoding each char to figure out its byte length?

Andrew Thompson
Kong
  • how about this http://stackoverflow.com/questions/4931854/converting-char-array-into-byte-array-and-back-again – DevZer0 Jun 21 '13 at 03:45
  • I saw that one, but it doesn't seem to answer my question (they are simply using 2 bytes per char) – Kong Jun 21 '13 at 03:54
  • You can use a loop and a map then: a map of all possible chars in the supported encoding formats – DevZer0 Jun 21 '13 at 03:55

1 Answer

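You can try something like this: decode into a one-char buffer with a CharsetDecoder and watch how far the input position advances for each char.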

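    import java.nio.ByteBuffer;
    import java.nio.CharBuffer;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;

    // bytes is the encoded input; substitute the charset you actually have.
    CharsetDecoder cd = Charset.forName("UTF-8").newDecoder();
    ByteBuffer in = ByteBuffer.wrap(bytes);
    CharBuffer out = CharBuffer.allocate(1); // room for one char, so decode stops after each one
    int p = 0;                               // input position before the current char
    while (in.hasRemaining()) {
        cd.decode(in, out, true);
        if (in.position() == p) break;       // nothing consumed: malformed input or a surrogate pair
        char c = out.array()[0];             // the char just decoded
        int nBytes = in.position() - p;      // bytes consumed to produce c
        p = in.position();
        out.position(0);                     // make room for the next char
    }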
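Building on the loop above, here is a minimal sketch of the original question, counting the bytes left over after the last encoded \n. The class and method names (`TrailingBytes`, `bytesAfterLastNewline`) are made up for illustration. It assumes the chunk may end in the middle of a character, so it passes `endOfInput = false` and treats an incomplete trailing sequence as leftover bytes rather than an error. It also widens the output to two chars when a character needs a surrogate pair.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the JDK.
class TrailingBytes {

    /** Returns how many bytes follow the last encoded '\n' (all of them if there is none). */
    static int bytesAfterLastNewline(byte[] bytes, Charset charset) {
        CharsetDecoder decoder = charset.newDecoder(); // default action: report errors
        ByteBuffer in = ByteBuffer.wrap(bytes);
        CharBuffer out = CharBuffer.allocate(2);       // 2 so a surrogate pair fits
        int endOfLastNewline = 0;                      // byte offset just past the last '\n'

        while (in.hasRemaining()) {
            int before = in.position();
            out.clear();
            out.limit(1);                              // ask for one char at a time
            CoderResult result = decoder.decode(in, out, false);
            if (result.isError()) {
                throw new IllegalArgumentException("Undecodable input near byte " + in.position());
            }
            if (in.position() == before) {
                if (!result.isOverflow()) {
                    break;                             // incomplete trailing sequence
                }
                out.limit(2);                          // the next character needs two chars
                result = decoder.decode(in, out, false);
                if (result.isError() || in.position() == before) {
                    throw new IllegalArgumentException("Undecodable input near byte " + before);
                }
            }
            out.flip();
            if (out.length() == 1 && out.get(0) == '\n') {
                endOfLastNewline = in.position();
            }
        }
        return bytes.length - endOfLastNewline;
    }

    public static void main(String[] args) {
        byte[] utf8 = "first\nsecond".getBytes(StandardCharsets.UTF_8);
        System.out.println(bytesAfterLastNewline(utf8, StandardCharsets.UTF_8));   // prints 6

        byte[] utf16 = "first\nsecond".getBytes(StandardCharsets.UTF_16);
        System.out.println(bytesAfterLastNewline(utf16, StandardCharsets.UTF_16)); // prints 12
    }
}
```

For UTF-16 input that starts with a byte order mark, the decoder consumes the mark together with the first char, which does not affect the count after the last newline.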
Evgeniy Dorofeev
  • This works perfectly thanks! Setting the limit on the CharBuffer to 1 also works to achieve "char at a time" and lets you keep the converted chars. – Kong Jun 21 '13 at 04:24
  • A significant bug related to this was fixed in Java 8. If there are decode errors in the above situation, Java 7 returned OVERFLOW from the decode call even when the decoder was set to REPLACE errors. Java 8 fixes this. – Mike Beckerle May 22 '15 at 19:35
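The two comments above can be combined into one rough sketch: raise the limit of a larger CharBuffer by one char per step, so the decoded chars are kept, and set the decoder to REPLACE malformed input. The names `CharByCharDecoder` and `decodeWithLengths` are invented for this example. It assumes Java 8 or later, given the Java 7 behaviour described in the last comment.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the JDK.
class CharByCharDecoder {

    /**
     * Appends the decoded text to sink and returns the number of bytes consumed
     * by each decoding step (normally one character per step).
     */
    static List<Integer> decodeWithLengths(byte[] bytes, Charset charset, StringBuilder sink) {
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)       // bad bytes become U+FFFD
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        int capacity = (int) Math.ceil(bytes.length * decoder.maxCharsPerByte()) + 2;
        CharBuffer out = CharBuffer.allocate(capacity);
        List<Integer> lengths = new ArrayList<>();

        while (in.hasRemaining()) {
            int before = in.position();
            int charsBefore = out.position();
            out.limit(charsBefore + 1);            // room for exactly one more char
            decoder.decode(in, out, true);
            if (in.position() == before) {
                out.limit(charsBefore + 2);        // a surrogate pair needs two chars
                decoder.decode(in, out, true);
                if (in.position() == before) {
                    break;                         // defensive: no progress at all
                }
            }
            lengths.add(in.position() - before);
        }
        out.flip();
        sink.append(out);                          // everything decoded so far
        return lengths;
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        byte[] input = {0x61, (byte) 0xFF, 0x62};  // 'a', an invalid byte, 'b'
        System.out.println(decodeWithLengths(input, Charset.forName("UTF-8"), text));
        System.out.println(text);
    }
}
```

A surrogate pair counts as one step that yields two chars, and a UTF-16 byte order mark is counted together with the first character, so the list lines up with code points rather than with Java chars.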