
I have a Java string literal containing Unicode characters, and I need to express it as a C string literal that can be loaded with the JNI function NewString.

Unfortunately, NewString takes a pointer to an array of jchar (unsigned short). I have tried using code like the following:

unsigned short str[] = {65, 66, 67};
jstring java_str = (*env)->NewString(env, str, 3);

However, this takes a lot of room, is not human readable, and is difficult to maintain.

Is there a way to convert a string literal into an unsigned short[] in C, whilst still being able to use Java's UTF-16 characters?

Can this escaping be done programmatically? That is, can a java.lang.String be converted into a string literal that would work in C source code?

konsolas

2 Answers

2

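If you can use C11 with GCC, you can use the char16_t type from <uchar.h>. GCC encodes u"..." literals as UTF-16, and the __STDC_UTF_16__ check below makes the build fail on a compiler that does not: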

#include <uchar.h>

#ifndef __STDC_UTF_16__
#error "char16_t not UTF-16"
#endif

...
    char16_t my_string[] = u"abc";
    jstring java_str = (*env)->NewString(env, my_string, 3);

Compile with gcc -std=c11.

That said, most of the time one only needs ASCII strings, and for those one can simply use NewStringUTF:
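The hard-coded length 3 in the snippet above has to be kept in sync with the literal by hand. A minimal sketch of one way to avoid that, assuming the same C11 and GCC setup: U16_LITERAL_LEN, u16_len and greeting are hypothetical names introduced here for illustration, not part of JNI or the C library.

```c
#include <assert.h>
#include <stddef.h>
#include <uchar.h>

#ifndef __STDC_UTF_16__
#error "char16_t not UTF-16"
#endif

/* Hypothetical helper (not part of JNI): number of UTF-16 code units in a
   char16_t ARRAY initialized from a u"..." literal, excluding the
   terminating zero. It gives a wrong answer if applied to a pointer. */
#define U16_LITERAL_LEN(arr) (sizeof(arr) / sizeof((arr)[0]) - 1)

/* Hypothetical helper for pointers: counts code units up to the first zero. */
static size_t u16_len(const char16_t *s)
{
    assert(s != NULL);
    size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

/* Sample text: 'h', e-acute, "llo ", then U+1F600, which lies outside the
   Basic Multilingual Plane and is stored as a surrogate pair. */
static const char16_t greeting[] = u"h\u00e9llo \U0001F600";
```

With <jni.h> available, the call would then look something like (*env)->NewString(env, (const jchar *)greeting, (jsize)U16_LITERAL_LEN(greeting)). The length is counted in UTF-16 code units, which is the unit NewString expects, so the character outside the BMP contributes two.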

jstring java_str = (*env)->NewStringUTF(env, "abc");

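NewStringUTF assumes that the string is null-terminated and in the modified UTF-8 encoding. Modified UTF-8 differs from standard UTF-8 in that each half of a UTF-16 surrogate pair is encoded separately as three bytes, and U+0000 is encoded as the two bytes 0xC0 0x80 so that the data never contains an embedded zero byte. ASCII text without NUL characters is encoded identically in both, so NewStringUTF works well for ASCII strings.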
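To make the difference from standard UTF-8 concrete, here is a rough sketch of an encoder that turns UTF-16 code units into the modified UTF-8 bytes that NewStringUTF expects. The function name utf16_to_mutf8 and its error convention are assumptions made for this illustration; it is not a JNI function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encodes `len` UTF-16 code units as modified UTF-8
   and appends a terminating zero byte. Returns the number of bytes written
   (excluding the terminator), or (size_t)-1 if `cap` is too small. */
static size_t utf16_to_mutf8(const uint16_t *src, size_t len,
                             unsigned char *dst, size_t cap)
{
    assert(src != NULL && dst != NULL);
    if (cap == 0) {
        return (size_t)-1;
    }
    size_t n = 0;
    for (size_t i = 0; i < len; ++i) {
        uint16_t u = src[i];
        if (u >= 0x0001 && u <= 0x007F) {
            /* One byte, same as ASCII. */
            if (n + 1 >= cap) return (size_t)-1;
            dst[n++] = (unsigned char)u;
        } else if (u <= 0x07FF) {
            /* Two bytes. This branch also handles U+0000 -> C0 80. */
            if (n + 2 >= cap) return (size_t)-1;
            dst[n++] = (unsigned char)(0xC0 | (u >> 6));
            dst[n++] = (unsigned char)(0x80 | (u & 0x3F));
        } else {
            /* Three bytes. Surrogate halves land here too, so a pair
               becomes six bytes instead of one four-byte sequence. */
            if (n + 3 >= cap) return (size_t)-1;
            dst[n++] = (unsigned char)(0xE0 | (u >> 12));
            dst[n++] = (unsigned char)(0x80 | ((u >> 6) & 0x3F));
            dst[n++] = (unsigned char)(0x80 | (u & 0x3F));
        }
    }
    dst[n] = 0;
    return n;
}
```

For example, the surrogate pair D83D DE00 (U+1F600) comes out as ED A0 BD ED B8 80, whereas standard UTF-8 would use the four bytes F0 9F 98 80. This is why a plain UTF-8 source literal is only safe to pass to NewStringUTF when it stays within the ranges where the two encodings agree.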

  • This is an appropriate and expedient use of `NewStringUTF` because the strings are literal strings in source code and it can be known that the compiler is told the correct source character set and the execution character set can be selected to be compatible with modified UTF-8 for certain ranges of codepoints (including U+0000 to U+D7FF). A source code comment to that effect is advisable. The set of applicable character sets is even larger if your data is limited to the C0 Controls and Basic Latin (U+0000 to U+007F). – Tom Blodget Feb 26 '17 at 16:37
  • This C11 string literal format seems to be what I was looking for. Thanks! – konsolas Feb 26 '17 at 19:19
1

What you are looking for is not called escaping.

It appears that what you want to do is to specify a character string in C, using a human-readable string literal, and to be able to pass this to JNI NewString().

You are going to have to read up on wchar_t.

See What is a "wide character string" in C language? and https://en.wikibooks.org/wiki/C_Programming/C_Reference/wchar.h

What you will need to do is define your string literals as wchar_t (using the "L" notation explained in the above posts) and then write a conversion function which converts these arrays of wchar_t to arrays of jchar.

Unfortunately, the C standard does not define the precise implementation of wchar_t, and instead leaves it up to C compiler vendors to do as they please, so there is a chance that your C compiler does not treat wchar_t as a 16-bit quantity. In this case, your conversion function will not be able to simply cast an array of wchar_t to an array of jchar, and it will have to convert them one by one instead. It is a bit of a hassle, but doable. Good luck!
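The element-by-element conversion described above could be sketched roughly as follows. This assumes wchar_t holds either UTF-16 code units (16-bit wchar_t) or UTF-32 code points (32-bit wchar_t), which the C standard does not guarantee. jchar_like and wide_to_utf16 are hypothetical names; jchar_like stands in for JNI's jchar so the sketch compiles without <jni.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Stand-in for JNI's jchar (an unsigned 16-bit type). */
typedef uint16_t jchar_like;

/* Hypothetical helper: converts a zero-terminated wchar_t string into
   UTF-16 code units. Returns the number of code units written, or
   (size_t)-1 if dst is too small or a value is not a valid code point. */
static size_t wide_to_utf16(const wchar_t *src, jchar_like *dst, size_t dst_cap)
{
    assert(src != NULL && dst != NULL);
    size_t n = 0;
    for (; *src != L'\0'; ++src) {
        uint32_t cp = (uint32_t)*src;
        if (cp < 0x10000u) {
            /* Fits in one code unit. With a 16-bit wchar_t every element
               takes this path and is copied unchanged. */
            if (n + 1 > dst_cap) return (size_t)-1;
            dst[n++] = (jchar_like)cp;
        } else if (cp <= 0x10FFFFu) {
            /* Outside the BMP: split into a high and low surrogate. */
            if (n + 2 > dst_cap) return (size_t)-1;
            cp -= 0x10000u;
            dst[n++] = (jchar_like)(0xD800u + (cp >> 10));
            dst[n++] = (jchar_like)(0xDC00u + (cp & 0x3FFu));
        } else {
            return (size_t)-1;
        }
    }
    return n;
}
```

The returned count is the length to pass on, for example (*env)->NewString(env, buf, (jsize)n) where buf is the destination array. No terminating zero is written because NewString takes an explicit length.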

Mike Nakis