13

Let's say that on the C++ side my function takes a variable of type jstring named myString. I can convert it to a (modified) UTF-8 string as follows:

const char* ansiString = env->GetStringUTFChars(myString, 0);

Is there a way of getting

const wchar_t* unicodeString = ...

piyushj

10 Answers

13

If this helps someone... I've used this function for an Android project:

std::wstring Java_To_WStr(JNIEnv *env, jstring string)
{
    std::wstring value;

    const jchar *raw = env->GetStringChars(string, 0);
    jsize len = env->GetStringLength(string);
    const jchar *temp = raw;
    while (len > 0)
    {
        value += *(temp++);
        len--;
    }
    env->ReleaseStringChars(string, raw);

    return value;
}

An improved solution could be (thanks for the feedback):

std::wstring Java_To_WStr(JNIEnv *env, jstring string)
{
    std::wstring value;

    const jchar *raw = env->GetStringChars(string, 0);
    jsize len = env->GetStringLength(string);

    // Widen each UTF-16 code unit (jchar) to wchar_t in one assignment.
    value.assign(raw, raw + len);

    env->ReleaseStringChars(string, raw);

    return value;
}
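
A minimal usage sketch (the JNI entry point and Java class name here are hypothetical, just to show the call):

#include <jni.h>
#include <string>
#include <cwchar>

// Hypothetical entry point for a Java method: native void show(String text);
extern "C" JNIEXPORT void JNICALL
Java_com_example_Demo_show(JNIEnv *env, jobject, jstring text)
{
    std::wstring ws = Java_To_WStr(env, text);
    wprintf(L"%ls\n", ws.c_str());  // illustrative only
}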
gergonzalez
  • Neat, though I suspect loading the wstring with a buffer in one go would be more efficient than one character at a time. – Rup Feb 02 '12 at 09:21
  • Does the C++ compiler notice that you are returning an automatic, and allocate it on the heap and not the stack? – Stevens Miller Jul 19 '16 at 20:50
4

And who frees wsz? I would recommend STL!

std::wstring JavaToWSZ(JNIEnv* env, jstring string)
{
    std::wstring value;
    if (string == NULL) {
        return value; // empty string
    }
    const jchar* raw = env->GetStringChars(string, NULL);
    if (raw != NULL) {
        jsize len = env->GetStringLength(string);
        value.assign(raw, raw + len); // widen each UTF-16 code unit to wchar_t
        env->ReleaseStringChars(string, raw);
    }
    return value;
}
  • Not a great solution unless using C++11 since the wstring will be returned by value. (Obviously post C++11 it'll be move constructed which would be efficient) – Benj Jan 10 '12 at 09:13
  • value.assign(raw, len); is not valid. I think it should be value.assign(raw, raw + len); but I haven't tested yet. – mjaggard Apr 17 '12 at 09:17
  • Great - worked for me perfectly in a C# -> C++/CLI -> JNI -> Java application! – bbqchickenrobot Aug 29 '12 at 21:29
  • Don't you have to call ReleaseStringChars regardless of the success of GetStringChars? Otherwise the jstring may be pinned and 'leak'. – Greg Domjan May 29 '15 at 01:51
4

JNI has a GetStringChars() function as well. The return type is const jchar*; jchar is always 16 bits (a UTF-16 code unit, since Java strings are UTF-16), which matches wchar_t on Win32. On platforms where wchar_t is wider, the two types are not interchangeable.
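
A minimal sketch of that approach, guarding the size assumption at compile time (assuming C++11 for static_assert; the helper name is mine):

#include <jni.h>
#include <string>

// Only valid where wchar_t has the same width as jchar (e.g. Win32).
static_assert(sizeof(wchar_t) == sizeof(jchar),
              "this cast-based approach requires a 16-bit wchar_t");

std::wstring fromJString(JNIEnv *env, jstring s)  // hypothetical helper name
{
    const jchar *raw = env->GetStringChars(s, nullptr);
    jsize len = env->GetStringLength(s);
    std::wstring result(reinterpret_cast<const wchar_t *>(raw), len);
    env->ReleaseStringChars(s, raw);
    return result;
}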

Adam Mitz
3

I know this was asked a year ago, but I don't like the other answers so I'm going to answer anyway. Here's how we do it in our source:

wchar_t * JavaToWSZ(JNIEnv* env, jstring string)
{
    if (string == NULL)
        return NULL;
    int len = env->GetStringLength(string);
    const jchar* raw = env->GetStringChars(string, NULL);
    if (raw == NULL)
        return NULL;

    // Assumes sizeof(wchar_t) == 2 (e.g. Windows); see the edit below.
    wchar_t* wsz = new wchar_t[len+1];
    memcpy(wsz, raw, len * sizeof(jchar));
    wsz[len] = 0;

    env->ReleaseStringChars(string, raw);

    return wsz;
}

EDIT: This solution works well on platforms where wchar_t is 2 bytes; some platforms have a 4-byte wchar_t, in which case it will not work. A more portable variant is sketched below.
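
A hedged, more portable variant (my sketch, not from the original answer): widening each jchar individually compiles for any wchar_t width, though where wchar_t is 4 bytes, surrogate pairs are copied through as two separate code units rather than combined into one code point:

#include <jni.h>
#include <algorithm>

wchar_t *JavaToWSZPortable(JNIEnv *env, jstring string)  // hypothetical name
{
    if (string == NULL)
        return NULL;
    jsize len = env->GetStringLength(string);
    const jchar *raw = env->GetStringChars(string, NULL);
    if (raw == NULL)
        return NULL;

    wchar_t *wsz = new wchar_t[len + 1];
    std::copy(raw, raw + len, wsz);  // widen each UTF-16 code unit
    wsz[len] = 0;

    env->ReleaseStringChars(string, raw);
    return wsz;  // caller must delete[] this
}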

Benj
  • This solution is wrong. I lost 12 hours because of that. wchar_t and jchar are not necessarily the same. The proof for that is the output of my test program: `01-26 20:28:43.675: E/[LMI-NATIVE](9280): len: 7, jchar: 2, wchar: 4` – Kobor42 Jan 26 '12 at 19:32
  • @Kobor42 - What does your test program do? Are you saying that you found an instance where wchar_t was 4 bytes? I didn't actually realise it, but this function was designed to run (primarily) on Windows, where wchar_t is always 2. I now realise wchar_t is compiler-specific and may be different on your platform. – Benj Jan 26 '12 at 21:54
  • Exactly. On Android prior to 2.1, wchar_t is 1 byte; from 2.1 on it is 4 bytes. – Kobor42 May 18 '12 at 09:16
  • You're mixing potentially incompatible types. A Java `jchar` is always UTF-16. But `wchar_t` is not always UTF-16; sometimes it is UTF-32. In such cases you need to convert UTF-16 to UTF-32 (it's NOT just a matter of padding jchar to 4 bytes, see http://en.wikipedia.org/wiki/UTF-16 for details). – rustyx May 26 '12 at 20:27
  • I'm not mixing it. The NDK is mixing it. I would like to convert Java strings to C strings without information loss. – Kobor42 Jul 12 '12 at 08:54
  • @Benj - Why does it not work on platforms where wchar_t is 4 bytes? – Ayush Pant May 26 '22 at 20:44
3

A portable and robust solution is to use iconv, with the understanding that you have to know what encoding your system's wchar_t uses (UTF-16 on Windows, UTF-32 on many Unix systems, for example).

If you want to minimise your dependency on third-party code, you can also hand-roll your own UTF-8 converter. This is easy if converting to UTF-32, somewhat harder with UTF-16 because you have to handle surrogate pairs too. :-P Also, you must be careful to reject non-shortest forms, or it can open up security bugs in some cases.
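
As an illustration, a minimal sketch of a hand-rolled converter for the UTF-32 case (my code, assuming a 4-byte wchar_t; the helper name is mine). It rejects non-shortest forms as the answer warns; a full implementation would also reject surrogate code points and values above 0x10FFFF, and handle the quirks of JNI's modified UTF-8 (the 0xC0 0x80 encoding of NUL and six-byte surrogate pairs):

#include <string>
#include <stdexcept>

std::wstring utf8ToWide(const char *s)  // hypothetical helper name
{
    std::wstring out;
    const unsigned char *p = reinterpret_cast<const unsigned char *>(s);
    while (*p) {
        unsigned long cp;
        int extra;
        if (*p < 0x80)              { cp = *p;        extra = 0; }  // 0xxxxxxx
        else if ((*p >> 5) == 0x6)  { cp = *p & 0x1F; extra = 1; }  // 110xxxxx
        else if ((*p >> 4) == 0xE)  { cp = *p & 0x0F; extra = 2; }  // 1110xxxx
        else if ((*p >> 3) == 0x1E) { cp = *p & 0x07; extra = 3; }  // 11110xxx
        else throw std::runtime_error("invalid UTF-8 lead byte");
        ++p;
        for (int i = 0; i < extra; ++i, ++p) {
            if ((*p & 0xC0) != 0x80)
                throw std::runtime_error("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (*p & 0x3F);
        }
        // Reject non-shortest (overlong) forms, as the answer warns.
        static const unsigned long mins[] = { 0, 0x80, 0x800, 0x10000 };
        if (cp < mins[extra])
            throw std::runtime_error("overlong UTF-8 sequence");
        out += static_cast<wchar_t>(cp);
    }
    return out;
}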

C. K. Young
  • You're suggesting converting the jstring to UTF-8 then back to UTF-16? Is that really necessary? – Rup Jan 05 '12 at 13:01
  • @Rup jstrings already are UTF-8: "The JNI uses modified UTF-8 strings to represent various string types. Modified UTF-8 strings are the same as those used by the Java VM. Modified UTF-8 strings are encoded so that character sequences that contain only non-null ASCII characters can be represented using only one byte per character, but all Unicode characters can be represented.....The Java VM does not recognize the four-byte format of standard UTF-8; it uses its own two-times-three-byte format instead." – arkon May 23 '12 at 17:36
  • @b1naryatr0phy Really? jni.h on my system (both 1.6 and 1.7) has `typedef unsigned short jchar;` which looks more like UTF-16 to me. – Rup May 24 '12 at 00:08
  • I must be misunderstanding something, then; that quote was pulled directly from Oracle's documentation: [http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html](http://docs.oracle.com/javase/1.5.0/docs/guide/jni/spec/types.html) Feel free to explain if you can; I'm still trying to wrap my head around this. – arkon May 24 '12 at 00:29
1

If we are not interested in cross-platform portability, on Windows you can use the MultiByteToWideChar function, or the helpful A2W conversion macro; a sketch of the former follows.
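
A minimal Windows-only sketch (my illustration, not from the answer; the helper name is mine), converting the modified UTF-8 from GetStringUTFChars with CP_UTF8. Caveat: JNI's modified UTF-8 differs from standard UTF-8 for embedded NULs and supplementary characters, so this is only exact for BMP text:

#include <windows.h>
#include <jni.h>
#include <string>

std::wstring jstringToWide(JNIEnv *env, jstring s)  // hypothetical helper name
{
    const char *utf8 = env->GetStringUTFChars(s, NULL);
    jsize utf8Len = env->GetStringUTFLength(s);  // length in bytes
    std::wstring result;
    // First call computes the required length in wide characters.
    int wideLen = MultiByteToWideChar(CP_UTF8, 0, utf8, utf8Len, NULL, 0);
    if (wideLen > 0) {
        result.resize(wideLen);
        MultiByteToWideChar(CP_UTF8, 0, utf8, utf8Len, &result[0], wideLen);
    }
    env->ReleaseStringUTFChars(s, utf8);
    return result;
}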

1800 INFORMATION
0

Rather simple, but do not forget to release the memory with ReleaseStringChars:

JNIEXPORT jboolean JNICALL Java_TestClass_test(JNIEnv * env, jobject, jstring string)
{
    // The cast is valid only where wchar_t is 16 bits wide (e.g. Windows).
    const wchar_t * utf16 = (wchar_t *)env->GetStringChars(string, NULL);
    ...
    env->ReleaseStringChars(string, (const jchar *)utf16);
}
Vladimir Ivanov
0

My approach is to go jstring -> char* -> wchar_t*:

// Convert a jstring to a heap-allocated UTF-8 C string via String.getBytes().
// The caller must free() the result; returns NULL for an empty string.
char* js2c(JNIEnv* env, jstring jstr)
{
    char* rtn = NULL;
    jclass clsstring = env->FindClass("java/lang/String");
    jstring strencode = env->NewStringUTF("utf-8");
    jmethodID mid = env->GetMethodID(clsstring, "getBytes", "(Ljava/lang/String;)[B");
    jbyteArray barr = (jbyteArray)env->CallObjectMethod(jstr, mid, strencode);
    jsize alen = env->GetArrayLength(barr);
    jbyte* ba = env->GetByteArrayElements(barr, NULL);
    if (alen > 0)
    {
        rtn = (char*)malloc(alen + 1);
        memcpy(rtn, ba, alen);
        rtn[alen] = 0;
    }
    env->ReleaseByteArrayElements(barr, ba, 0);
    return rtn;
}

// Windows-only: convert an ANSI (CP_ACP) C string to a jstring.
jstring c2js(JNIEnv* env, const char* str) {
    jstring rtn = 0;
    int slen = strlen(str);
    unsigned short * buffer = 0;
    if (slen == 0)
        rtn = (env)->NewStringUTF(str);
    else {
        int length = MultiByteToWideChar(CP_ACP, 0, (LPCSTR)str, slen, NULL, 0);
        buffer = (unsigned short *)malloc(length * sizeof(jchar));
        if (MultiByteToWideChar(CP_ACP, 0, (LPCSTR)str, slen, (LPWSTR)buffer, length) > 0)
            rtn = (env)->NewString((jchar*)buffer, length);
        free(buffer);
    }
    return rtn;
}



// Windows/MSVC-only: convert a wide string to a jstring via the locale's
// multibyte encoding. The buffer allows MB_CUR_MAX bytes per character so
// non-ASCII text is not truncated.
jstring w2js(JNIEnv *env, wchar_t *src)
{
    size_t len = wcslen(src) + 1;
    size_t converted = 0;
    char *dest = (char*)malloc(len * MB_CUR_MAX);
    wcstombs_s(&converted, dest, len * MB_CUR_MAX, src, _TRUNCATE);

    jstring dst = c2js(env, dest);
    free(dest);  // js2c/c2js copy the data, so this buffer can be released
    return dst;
}

// Convert a jstring to a heap-allocated wchar_t string; the caller must free().
// Note: mbstowcs_s uses the current locale, so this round-trips only ASCII reliably.
wchar_t *js2w(JNIEnv *env, jstring src) {

    char *dest = js2c(env, src);
    if (dest == NULL)  // js2c returns NULL for an empty string
        return NULL;
    size_t len = strlen(dest) + 1;
    size_t converted = 0;
    wchar_t *dst = (wchar_t*)malloc(len * sizeof(wchar_t));
    mbstowcs_s(&converted, dst, len, dest, _TRUNCATE);
    free(dest);
    return dst;
}
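
A brief usage sketch of the helpers above (my illustration; the entry point name is hypothetical), showing the ownership convention, since the converted strings are heap-allocated and must be freed by the caller:

// Hypothetical JNI entry point exercising js2w.
JNIEXPORT void JNICALL Java_TestClass_show(JNIEnv *env, jobject, jstring text)
{
    wchar_t *w = js2w(env, text);
    if (w != NULL) {
        // ... use w ...
        free(w);  // the helpers allocate with malloc, so release with free()
    }
}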
0

Here is how I converted jstring to LPWSTR.

const char* nativeString = env->GetStringUTFChars(javaString, 0);
size_t size = strlen(nativeString) + 1;
LPWSTR lpwstr = new wchar_t[size];
size_t outSize;
// Note: mbstowcs_s converts according to the current C locale, so this is
// only correct for non-ASCII text if the locale's encoding is UTF-8.
mbstowcs_s(&outSize, lpwstr, size, nativeString, size - 1);
env->ReleaseStringUTFChars(javaString, nativeString); // don't leak the UTF-8 buffer
Eng.Fouad
0

Just use env->GetStringChars(myString, 0); Java strings are Unicode (UTF-16) by nature.