
How can I read a NUL-terminated UTF-8 string from a Java ByteBuffer, starting at ByteBuffer#position()?

ByteBuffer b = /* 61 62 63 64 00 31 32 34 00 (hex) */;
String s0 = /* read first string */;
String s1 = /* read second string */;

// `s0` will now contain “abcd” and `s1` will contain “124”.

I have already tried using Charsets.UTF_8.decode(b), but it does not stop at the NUL: it decodes everything up to the end of the buffer (its limit).

Is there a more idiomatic way to read such a string from a byte buffer than searching for the byte containing 0 and then limiting the buffer to it (or copying the string's bytes into a separate buffer)?
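To make the `decode` behaviour concrete, here is a small sketch assuming `StandardCharsets.UTF_8`: decoding starts at `position()`, consumes everything up to `limit()`, and turns each 0 byte into a U+0000 char instead of stopping there. `DecodeDemo` and `decodeRemaining` are illustrative names only.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class DecodeDemo {
    // Decodes whatever lies between position() and limit(); NUL bytes are
    // decoded as U+0000 chars rather than treated as terminators.
    static String decodeRemaining(ByteBuffer buf) {
        return StandardCharsets.UTF_8.decode(buf).toString();
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        b.position(5);                                  // skip "abcd" and its NUL
        String rest = decodeRemaining(b);
        System.out.println(rest.length());              // 4: "124" plus the trailing U+0000
        System.out.println(b.position() == b.limit());  // true: decode consumed all remaining bytes
    }
}
```

So `decode` does respect the current position; the problem is only that nothing tells it where the string ends.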

jiwopene
  • This is not a code-writing service. Post your own efforts and tell us what problems you're having. – Andrew Henle Aug 25 '20 at 11:02
  • For lower-level functionality, I'd look into `CharsetDecoder`... – Maarten Bodewes Aug 25 '20 at 11:04
  • Err, read characters until you get the NUL? Unclear what the problem is here. – user207421 Aug 25 '20 at 11:13
  • Have you tried reading byte by byte in a loop, populating a `byte[]`, and then instantiating a `new String(bytes, StandardCharsets.UTF_8)`? – Jim Aug 25 '20 at 12:24
  • @Jim, yes, but I think it is unnecessarily complicated, since it is possible (at least in theory) to use the original buffer. – jiwopene Aug 25 '20 at 13:09
  • @jiwopene: I don't really get your point. You do have multiple strings to decode right? So at some point the code needs to loop – Jim Aug 25 '20 at 13:13
  • I have input containing integers (in binary format), individual characters, and NUL-terminated strings in a known order. I want to deserialize the data. – jiwopene Aug 25 '20 at 16:42
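Jim's byte-by-byte suggestion fits the mixed binary layout described in the last comment, because each read leaves the buffer positioned for the next field. A minimal sketch, assuming UTF-8 strings and a hypothetical record made of a big-endian int followed by two NUL-terminated strings; `CStringReader` and `readCString` are illustrative names, not an existing API:

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class CStringReader {
    // Reads bytes from buf's current position up to the next NUL (which is consumed
    // but not copied), then decodes them as UTF-8. Without a NUL, all remaining bytes are used.
    static String readCString(ByteBuffer buf) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        while (buf.hasRemaining()) {
            byte next = buf.get();
            if (next == 0) {
                break;
            }
            bytes.write(next);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical record: int 42, then "abcd\0", then "124\0".
        ByteBuffer b = ByteBuffer.allocate(13);
        b.putInt(42).put(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        b.flip();
        int id = b.getInt();
        String s0 = readCString(b);
        String s1 = readCString(b);
        System.out.println(id + " " + s0 + " " + s1); // 42 abcd 124
    }
}
```

The copy into a temporary array is the price for simplicity; UTF-8 never uses a 0 byte inside a multi-byte sequence, so stopping at the first 0 byte is safe.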

3 Answers


If "idiomatic" means a one-liner, then not that I know of (unsurprising, since NUL-terminated strings are not part of the Java spec).

The first thing I came up with is using b.slice().limit(x) to create a lightweight view onto just the desired bytes. This is better than copying them anywhere, as you may be able to work directly with the buffer.

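import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.StandardCharsets;

public class NulStrings {
  public static void main(String[] args) {
    ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00 });
    int i;
    while (b.hasRemaining()) {
      ByteBuffer nextString = b.slice(); // View on b with same start position
      for (i = 0; b.hasRemaining() && b.get() != 0x00; i++) {
        // Count to next NUL (the NUL itself is consumed from b)
      }
      nextString.limit(i); // view now stops before NUL
      CharBuffer s = StandardCharsets.UTF_8.decode(nextString);
      System.out.println(s); // prints "abcd", then "124"
    }
  }
}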
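The same slice-and-limit idea can be wrapped in a method so it can be interleaved with other relative reads such as `getInt()`. A sketch under that assumption; `SliceReader` and `readNulTerminated` are hypothetical names. Only the view is decoded, so no intermediate byte array is created:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SliceReader {
    // Returns the UTF-8 string starting at buf.position(). Afterwards buf is
    // positioned just past the NUL, or at its limit if no NUL was found.
    static String readNulTerminated(ByteBuffer buf) {
        ByteBuffer view = buf.slice();   // shares content, starts at buf.position()
        int len = 0;
        while (buf.hasRemaining() && buf.get() != 0) {
            len++;                        // count bytes before the terminator
        }
        view.limit(len);                  // view now ends just before the NUL
        return StandardCharsets.UTF_8.decode(view).toString();
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        System.out.println(readNulTerminated(b)); // abcd
        System.out.println(readNulTerminated(b)); // 124
    }
}
```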
drekbour

In Java, the char \u0000 (the UTF-8 byte 0, Unicode code point U+0000) is a normal char. So read everything (maybe into an overlarge byte array) and do:

String s = new String(bytes, StandardCharsets.UTF_8);

String[] s0s1 = s.split("\u0000");
String s0 = s0s1[0];
String s1 = s0s1[1];
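A self-contained version of this approach, starting from a ByteBuffer instead of a ready-made `bytes` array. It assumes everything remaining in the buffer is NUL-separated UTF-8 text; binary fields such as ints can themselves contain 0 bytes, so it does not suit mixed records. `SplitReader` and `splitAll` are illustrative names:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class SplitReader {
    // Copies all remaining bytes, decodes them once, and splits on U+0000.
    // String.split drops trailing empty strings, so the final NUL (and any
    // zero padding after it) produces no extra elements.
    static String[] splitAll(ByteBuffer buf) {
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        String s = new String(bytes, StandardCharsets.UTF_8);
        return s.split("\0");
    }

    public static void main(String[] args) {
        ByteBuffer b = ByteBuffer.wrap(new byte[] {0x61, 0x62, 0x63, 0x64, 0x00, 0x31, 0x32, 0x34, 0x00});
        String[] words = splitAll(b);
        String s0 = words[0]; // "abcd"
        String s1 = words[1]; // "124"
        System.out.println(s0 + " " + s1);
    }
}
```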

If you do not have fixed positions and must read every byte sequentially, the code gets ugly. One of the C founders indeed called the NUL-terminated string a historic mistake.

For the reverse direction, writing a Java String without producing a UTF-8 byte 0 (usually so it can be processed further as a C/C++ NUL-terminated string), there is modified UTF-8, which encodes the NUL char as the two bytes 0xC0 0x80 instead of a 0 byte.
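The JDK writes modified UTF-8 in `DataOutputStream.writeUTF` (and reads it back with `DataInputStream.readUTF`). This sketch shows the NUL char becoming 0xC0 0x80. Note that `writeUTF` also prepends a 2-byte big-endian length, which can itself contain a 0 byte, so its output is not directly usable as a C string. `ModifiedUtf8Demo` is an illustrative name:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

final class ModifiedUtf8Demo {
    // Returns writeUTF's output: 2-byte length prefix + modified UTF-8 bytes.
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected for an in-memory stream
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        // 'a', NUL, 'b' -> length 4, then 0x61, 0xC0 0x80, 0x62
        System.out.println(Arrays.toString(encode("a\0b"))); // [0, 4, 97, -64, -128, 98]
    }
}
```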

Joop Eggen

You can do it with the replace and split functions: decode your bytes to a String, replace each NUL (char 0) with a custom character, then split the string on that character.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Jtest {
    public static void main(String[] args) {
        //ByteBuffer b = /* 61 62 63 64 00 31 32 34 00 (hex) */;
        ByteBuffer b = ByteBuffer.allocate(10);

        b.put((byte)0x61);
        b.put((byte)0x62);
        b.put((byte)0x63);
        b.put((byte)0x64);
        b.put((byte)0x00);
        b.put((byte)0x31);
        b.put((byte)0x32);
        b.put((byte)0x34);
        b.put((byte)0x00);
        b.flip(); // limit = 9 bytes written, position = 0 (rewind() would also decode the unused 10th byte)

        String s0;
        String s1;

        // print the ByteBuffer
        System.out.println("Original ByteBuffer:  "
                + Arrays.toString(b.array()));

        // words[0] will contain "abcd" and words[1] will contain "124".
        String s = StandardCharsets.UTF_8.decode(b).toString();
        String ss = s.replace((char)0,';');
        String[] words = ss.split(";");
        for(int i=0; i < words.length; i++) {
            System.out.println(" Word " + i + " = " +words[i]);
        }

    }
}

I believe you can do this more efficiently by removing the replace call and splitting on the NUL character directly.

Majid Hajibaba