
Java 16, as part of the incubating package jdk.incubator.foreign, provided a convenient way to convert Java Strings to C strings in an arbitrary Charset: MemorySegment CLinker.toCString(String str, Charset charset, NativeScope scope). That method was removed in Java 17. Is there currently a convenient way to convert a Java String to a C string in a selected Charset?

Java 18 has void MemorySegment.setUtf8String(long offset, String str). However, that only supports UTF-8.

czerny
  • Encode the String manually and put the `byte[]` at the address? From the docs of `setUtf8String`: _The CharsetDecoder class should be used when more control over the decoding process is required._ – dan1st Apr 03 '22 at 20:54
  • The overload with the `Charset` was removed since it's hard to correctly support any arbitrary charset. The old implementation in Java 16 was buggy. Also, encoding is only part of the story, the other part is decoding, for which the length of the string must be determined, which is again hard to do for any arbitrary charset. See: https://github.com/openjdk/panama-foreign/pull/554#issuecomment-861596490 and related discussion. – Jorn Vernee Apr 04 '22 at 00:08
  • @JornVernee Yeah, I get that it is hard and why. OTOH, calling the Win32 API is best done by using the *W functions, as they always take a null-terminated UTF-16LE string. – Johannes Kuhn Apr 04 '22 at 00:45

2 Answers

I use this snippet to convert strings to UTF-16:

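private static MemoryAddress string(String s, ResourceScope scope) {
    if (s == null) {
        return MemoryAddress.NULL;
    }
    byte[] data = s.getBytes(StandardCharsets.UTF_16LE);
    // Reserve 2 extra bytes for the UTF-16 null terminator.
    // Native segments are zero-initialized, so those bytes are already 0.
    MemorySegment seg = MemorySegment.allocateNative(data.length + 2, scope);
    // Copy the encoded characters to the start of the segment.
    seg.copyFrom(MemorySegment.ofArray(data));
    return seg.address();
}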

Note that the trailing null character takes 2 bytes in UTF-16. If you use a different encoding, the terminator size may differ, so it can be simpler to append the null character to the string before encoding it (s + '\000').

UTF-16 is good enough for my purposes - calling the Windows API.
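For other charsets, `String.getBytes` silently replaces characters that cannot be encoded. If that matters, the encoding step can be made strict with a `CharsetEncoder` before the bytes are copied into native memory. Below is a minimal sketch using only `java.nio.charset`; the class `StrictCStringEncoder` and the method `encodeWithTerminator` are hypothetical names for illustration, and it assumes that encoding `'\0'` in the chosen charset yields the terminator the native side expects.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class StrictCStringEncoder {

    // Hypothetical helper (not a JDK API): encodes s plus a trailing '\0'
    // and fails instead of substituting a replacement character.
    static byte[] encodeWithTerminator(String s, Charset charset) {
        try {
            ByteBuffer encoded = charset.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s + "\0"));
            byte[] result = new byte[encoded.remaining()];
            encoded.get(result);
            return result;
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(
                    "String cannot be encoded as " + charset, e);
        }
    }

    public static void main(String[] args) {
        byte[] wide = encodeWithTerminator("Hi", StandardCharsets.UTF_16LE);
        // Two 2-byte characters plus a 2-byte terminator.
        System.out.println(wide.length); // prints 6

        try {
            encodeWithTerminator("\u00e9", StandardCharsets.US_ASCII);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

The returned array already contains the terminator, so a native segment of exactly its length would be enough.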

Johannes Kuhn

On JDK 18 I convert (s+"\0"), which typically adds 1, 2 or 4 bytes of null termination to the end of the MemorySegment for the C string, depending on the character set used:

static MemorySegment toCString(SegmentAllocator allocator, String s, Charset charset) {
    // "==" is OK here as StandardCharsets.UTF_8 == Charset.forName("UTF8")
    if (StandardCharsets.UTF_8 == charset)
        return allocator.allocateUtf8String(s);

    return allocator.allocateArray(ValueLayout.JAVA_BYTE, (s+"\0").getBytes(charset));
}

Converting a Java String to a Windows wide string is then: toCString(allocator, s, StandardCharsets.UTF_16LE)

Hopefully someone can offer a more efficient / robust way to convert. The above works for round-trip tests I've done on a small group of character sets (Windows + WSL), but I'm not confident it is reliable in all situations.
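A round-trip check of this kind can be sketched without any native memory, working on plain byte arrays. The class `CStringRoundTrip` and its methods are hypothetical names, and the decoder rests on an assumption that does not hold for every charset: the terminator is as wide as one encoded `'\0'` and sits on a multiple of that width. That should be true for UTF-8, UTF-16LE/BE and UTF-32LE/BE, but not for `StandardCharsets.UTF_16`, whose encoder emits a byte order mark.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CStringRoundTrip {

    // Same encoding idea as toCString above, but into a byte[].
    static byte[] encode(String s, Charset charset) {
        return (s + "\0").getBytes(charset);
    }

    // Width of the terminator: 1 for UTF-8, 2 for UTF-16LE, 4 for UTF-32LE.
    // Assumption: the charset writes no byte order mark.
    static int terminatorWidth(Charset charset) {
        return "\0".getBytes(charset).length;
    }

    // Scans for the first all-zero unit on a width-aligned boundary and
    // decodes everything before it.
    static String decode(byte[] buf, Charset charset) {
        int width = terminatorWidth(charset);
        for (int i = 0; i + width <= buf.length; i += width) {
            boolean allZero = true;
            for (int j = 0; j < width; j++) {
                if (buf[i + j] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return new String(buf, 0, i, charset);
            }
        }
        throw new IllegalArgumentException("no terminator found");
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16LE,
            Charset.forName("UTF-32LE")
        };
        String original = "a\u0100"; // U+0100 encodes as 00 01 in UTF-16LE
        for (Charset cs : charsets) {
            String back = decode(encode(original, cs), cs);
            System.out.println(cs + " width=" + terminatorWidth(cs)
                    + " roundTrip=" + original.equals(back));
        }
    }
}
```

The aligned scan matters for wide encodings: in UTF-16LE the string "a\u0100" encodes to 61 00 00 01, so a byte-by-byte search for two zero bytes would stop one byte too early.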

DuncG