My question is what is a standard reliable way to convert the “Java
modified UTF-8” to the regular UTF-8 and back?
First, consider whether you really need or want to do that. The only reason I can think of for doing so in the context of wrapping a C library is to use the JNI functions that work with Java Strings in terms of byte arrays encoded in modified UTF-8, but that's neither the only nor the best way to proceed except in rather specific circumstances.
For most cases, I would recommend going directly from UTF-8 to String objects, and getting Java to do most of that work. Simple tools Java provides for that include the constructor String(byte[], String), which initializes a String with data whose encoding you specify, and String.getBytes(String), which gives you the string's character data in the encoding of your choice. Both of these are limited to encodings known to the JVM, but UTF-8 is guaranteed to be among those. You can use those directly from your JNI code, or provide suitable for-purpose wrapper methods for your JNI code to invoke.
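As a rough illustration of the wrapper-method idea, here is a minimal sketch of two Java helpers that native code could call. The class and method names (Utf8RoundTrip, fromUtf8, toUtf8) are made up for this example. It uses the Charset-based overloads String(byte[], Charset) and String.getBytes(Charset) rather than the String-named ones mentioned above; for UTF-8 they do the same job, but they avoid the checked UnsupportedEncodingException.

```java
import java.nio.charset.StandardCharsets;

public class Utf8RoundTrip {
    // Hypothetical wrapper: decode standard UTF-8 bytes (e.g. produced by
    // the C library) into a Java String.
    static String fromUtf8(byte[] utf8) {
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Hypothetical wrapper: encode a Java String as standard UTF-8 bytes
    // that can be handed to the C library.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "h\u00e9llo"; // contains one non-ASCII character
        byte[] encoded = toUtf8(original);
        String decoded = fromUtf8(encoded);
        System.out.println(encoded.length + " bytes, round trip ok: "
                + original.equals(decoded));
    }
}
```

On the native side, the byte[] would be passed across JNI as a jbyteArray, so the C code never has to deal with modified UTF-8 at all.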
If you really do want the modified UTF-8 form for its own sake, then your JNI code can obtain it from the corresponding Java string (obtained as summarized above) via the GetStringUTFChars JNI function, and you can go the other way with NewStringUTF. Of course, this makes Java Strings the intermediate form, which is entirely apropos in this case.
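To make the "String as intermediate form" idea concrete without writing any native code, here is a Java-only sketch. It is not the JNI route described above: it relies on DataOutputStream.writeUTF and DataInputStream.readUTF, which also use modified UTF-8 but frame the data with a two-byte length prefix and therefore only handle strings whose encoded form fits in 65535 bytes. The class and method names are invented for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Standard UTF-8 -> String -> modified UTF-8 (length prefix stripped).
    static byte[] toModifiedUtf8(byte[] standardUtf8) {
        String s = new String(standardUtf8, StandardCharsets.UTF_8);
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s); // writes a 2-byte length, then modified UTF-8
        } catch (IOException e) {
            // Thrown e.g. when the encoded form exceeds 65535 bytes.
            throw new UncheckedIOException(e);
        }
        byte[] framed = buf.toByteArray();
        return Arrays.copyOfRange(framed, 2, framed.length);
    }

    // Modified UTF-8 -> String -> standard UTF-8.
    static byte[] toStandardUtf8(byte[] modifiedUtf8) {
        if (modifiedUtf8.length > 0xFFFF) {
            throw new IllegalArgumentException("too long for readUTF framing");
        }
        // Re-create the 2-byte big-endian length prefix that readUTF expects.
        ByteArrayOutputStream framed = new ByteArrayOutputStream();
        framed.write(modifiedUtf8.length >>> 8);
        framed.write(modifiedUtf8.length);
        framed.write(modifiedUtf8, 0, modifiedUtf8.length);
        try (DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(framed.toByteArray()))) {
            return in.readUTF().getBytes(StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // U+0000 is one zero byte in standard UTF-8 but the two bytes
        // 0xC0 0x80 in modified UTF-8.
        byte[] standard = "a\0b".getBytes(StandardCharsets.UTF_8);
        byte[] modified = toModifiedUtf8(standard);
        System.out.println("standard: " + Arrays.toString(standard));
        System.out.println("modified: " + Arrays.toString(modified));
        System.out.println("round trip ok: "
                + Arrays.equals(standard, toStandardUtf8(modified)));
    }
}
```

The two encodings differ only for U+0000 and for characters outside the Basic Multilingual Plane, so for most text the byte sequences are identical.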