
I'm working on a project that uses JNI to integrate C++ backend libraries into an Android app.

I'm using the SpannableString class and calling its setSpan() method directly from C++, so the styling is applied without re-running code in Java on the front end.

I've got a string containing HTML-like tags, for example:

"This is a test {is} italics {ie} that we are using to demonstrate app functionality..." (the real string is much longer).

I loop through every char of the string in C++, save the position of each opening tag, and then add the italic span to my SpannableString. Here's my loop:

    int its = 0, fns = 0, crfs = 0;
    for (int i = 0; i < static_cast<int>(buff.size()); ++i) {
        if (buff[i] != '{')
            continue;

        // Collect the tag name: up to 4 chars after '{', stopping at '}'.
        string tmp;
        for (int a = i + 1; a < i + 5 && a < static_cast<int>(buff.size()); ++a) {
            if (buff[a] == '}')
                break;
            tmp += buff[a];
        }

        if (tmp == "is") {
            its = i;  // start of italics
        } else if (tmp == "ie") {
            // end of italic tag
            if (its == 0)
                continue;

            // its + 4 skips past the "{is}" tag itself
            set_spannable(spannable, new_style_obj(italic), its + 4, i);
            its = 0;
        } else if (tmp == "fn") {
            // new footnote tag
        } else if (tmp == "cf") {
            // new cross-reference tag
        }
    }

The code compiles and runs, but the positions of the italics don't match the positions in the Java string. The offset keeps growing the further into the string I go, until the italics are nowhere near where they're supposed to be.

I've run the same loop in Java and it works perfectly:

    String buff = sp.toString();
    int its = 0;
    for (int i = 0; i < buff.length(); ++i) {
        if (buff.charAt(i) != '{')
            continue;

        // Collect the tag name: up to 4 chars after '{', stopping at '}'.
        String tmp = "";
        for (int a = i + 1; a < i + 5 && a < buff.length(); ++a) {
            if (buff.charAt(a) == '}')
                break;
            tmp += buff.charAt(a);
        }

        System.out.println(tmp);

        if (tmp.equals("is")) {
            its = i;
        } else if (tmp.equals("ie")) {
            if (its == 0)
                continue;
            System.out.println("Span from " + its + " to " + i);
            // its + 4 skips past the "{is}" tag itself
            sp.setSpan(new StyleSpan(Typeface.ITALIC), its + 4, i, Spanned.SPAN_EXCLUSIVE_EXCLUSIVE);
            its = 0;
        }
    }
    tv.setText(sp);

The interesting thing is that the string length in Java is always greater than the string length in C++.

I've tested it with both strlen(string.c_str()) and string.size(); neither returns the same length as Java's String.length().
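For reference, a minimal sketch of what the two C++ measurements actually count. Both count bytes rather than characters, and strlen() additionally stops at the first embedded '\0'. The names CppLengths and measure are hypothetical, for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: the two lengths compared in the question.
struct CppLengths {
    std::size_t size_bytes;    // std::string::size(): every byte, including embedded '\0'
    std::size_t strlen_bytes;  // std::strlen(c_str()): bytes up to the first '\0'
};

CppLengths measure(const std::string& s) {
    return {s.size(), std::strlen(s.c_str())};
}
```

For "café" stored as UTF-8 ("caf\xC3\xA9"), both values are 5, while Java's "café".length() is 4. For std::string("a\0b", 3), size() is 3 but strlen() is 1.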

Does anyone know what's causing this discrepancy and how to fix it? Are there characters that Java reads but C++ doesn't?

Thanks for your help in advance!

Update 1: Here's the tag location data:

C++

    Span from 26 to 36
    Span from 146 to 152
    Span from 466 to 473
    Span from 1438 to 1445
    Span from 1726 to 1733
    Span from 1913 to 1920
    Span from 2157 to 2167
    Span from 2228 to 2239
    Span from 2289 to 2299
    Span from 2544 to 2555
    Span from 2827 to 2834
    Span from 2965 to 2972
    Span from 3293 to 3300
    Span from 3913 to 3920
    Span from 4016 to 4023
    Span from 4378 to 4385
    Span from 4467 to 4474
    Span from 5179 to 5195
    Span from 5337 to 5344

Java

    Span from 26 to 36
    Span from 146 to 152
    Span from 462 to 469
    Span from 1426 to 1433
    Span from 1710 to 1717
    Span from 1897 to 1904
    Span from 2139 to 2149
    Span from 2208 to 2219
    Span from 2269 to 2279
    Span from 2520 to 2531
    Span from 2803 to 2810
    Span from 2939 to 2946
    Span from 3265 to 3272
    Span from 3877 to 3884
    Span from 3980 to 3987
    Span from 4340 to 4347
    Span from 4427 to 4434
    Span from 5129 to 5145
    Span from 5285 to 5292

Seth
  • strlen(s.c_str()) treats the string as a C string, so it stops at the first \0; size() returns the proper C++ size. Are the C++ and JNI strings consistent (do they display the same)? – Matthieu Brucher Nov 12 '18 at 14:48
  • The strings are identical, since I create the Java string from the C++ string when I construct the SpannableString object in C++ (which is then returned to Java through JNI). – Seth Nov 12 '18 at 14:51
  • Can you give some examples? And if the strings are identical, which one is wrong, C++ or JNI? – Matthieu Brucher Nov 12 '18 at 14:57
  • Could you check the encoding of the string in C++? Java strings are stored as UTF-16, so maybe the issue is that your C++ string takes a single byte per char while in Java it takes two. – Adham Zahran Nov 12 '18 at 15:00
  • As a quick test, try using std::wstring in C++ and see if that fixes your issue. – Adham Zahran Nov 12 '18 at 15:00
  • @AdamZahran I tried using wstring, with the same results. Plus, JNI has no function that takes wchar_t* data the way it does for char* data, so I end up converting the wchars back to chars before calling env->NewStringUTF(chars). Matthieu, I'll post more examples, but for now I've updated the question with the tag location data from both C++ and Java. Thanks, everyone! – Seth Nov 12 '18 at 16:13
  • @AdamZahran On further testing, u16string fixed my issues! Thanks for your answer. If you post it, I'll mark it as accepted. – Seth Nov 12 '18 at 20:36

1 Answer


The updated tables are consistent with the hypothesis that your string occasionally contains non-ASCII characters, which in C++ (UTF-8) take more than one char each.
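A minimal illustration of that hypothesis, using a typographic apostrophe (U+2019) as an assumed example of such a character: it is three bytes in a UTF-8 std::string but a single char in Java's UTF-16 String, so every later C++ index is shifted by two. In the question's tables every C++/Java difference is even, which would fit characters of this kind, although the data alone can't tell which characters they are.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// "It’s" with a typographic apostrophe (U+2019), stored two ways.
// UTF-8, as in the question's std::string: U+2019 takes 3 bytes.
const std::string kUtf8 = "It\xE2\x80\x99s";
// UTF-16, as in a Java String: U+2019 takes 1 code unit.
const std::u16string kUtf16 = u"It\u2019s";

// Extra C++ positions introduced by this one character (6 - 4 = 2).
const std::size_t kShift = kUtf8.size() - kUtf16.size();
```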

There is no easy fix for this. The minimal approach would be, when counting, to skip every UTF-8 continuation byte (top two bits equal to 10), i.e. bytes from 0x80 to 0xBF.
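A minimal sketch of that counting approach, assuming buff holds valid UTF-8 and the offsets passed in fall on character boundaries (true for the position of an ASCII '{'). The function name utf16_index is hypothetical. It goes one step beyond skipping continuation bytes: a character encoded in four bytes (outside the Basic Multilingual Plane) occupies two chars in Java (a surrogate pair), so its lead byte counts as two.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: map a byte offset in a UTF-8 string to the
// matching Java (UTF-16) index. Assumes valid UTF-8.
std::size_t utf16_index(const std::string& utf8, std::size_t byteOffset) {
    std::size_t units = 0;
    for (std::size_t i = 0; i < byteOffset && i < utf8.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(utf8[i]);
        if ((b & 0xC0) == 0x80)
            continue;                 // continuation byte (10xxxxxx): same character
        units += (b >= 0xF0) ? 2 : 1; // 4-byte lead -> surrogate pair in Java
    }
    return units;
}
```

In the question's loop, the call could then become set_spannable(spannable, new_style_obj(italic), utf16_index(buff, its + 4), utf16_index(buff, i)), with set_spannable and new_style_obj being the asker's own helpers. Each call rescans from the start of the string, so for long texts with many tags a running count kept inside the main loop would be cheaper.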

Alternatively, you can work with two-byte UTF-16 strings, which you get from the JNI function GetStringChars(). This exactly reproduces the Java representation (i.e. the lengths will be identical in Java and C++), but has limited library support: std::u16string can hold the data, but most string utilities expect char.
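As a sketch, here is the question's tag scan rewritten over std::u16string, the type the asker reports switching to in the comments. It assumes the UTF-16 text has already been obtained (for example, copied from the buffer GetStringChars() returns; that JNI step is not shown). The function name find_italic_spans is hypothetical. It uses npos instead of 0 as the "no open tag" marker, so a {is} at position 0 is also handled.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: the {is}...{ie} scan over UTF-16 text. Because a
// std::u16string holds the same code units as a Java String, the returned
// [start, end) pairs line up with Java indices.
std::vector<std::pair<std::size_t, std::size_t>>
find_italic_spans(const std::u16string& text) {
    std::vector<std::pair<std::size_t, std::size_t>> spans;
    std::size_t open = std::u16string::npos;  // position of the pending "{is}"
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != u'{')
            continue;
        const std::size_t close = text.find(u'}', i + 1);
        if (close == std::u16string::npos || close - i > 4)
            continue;  // not a short tag such as {is} or {ie}
        const std::u16string tag = text.substr(i + 1, close - i - 1);
        if (tag == u"is") {
            open = i;
        } else if (tag == u"ie" && open != std::u16string::npos) {
            spans.emplace_back(open + 4, i);  // +4 skips the "{is}" tag itself
            open = std::u16string::npos;
        }
    }
    return spans;
}
```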

Or you can convert your UTF-8 string to wchar_t with std::codecvt_utf8 (used through std::wstring_convert).
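A minimal sketch of such a conversion, with one swapped-in detail: on Android wchar_t is 32 bits wide, so converting to wchar_t with codecvt_utf8 yields one element per code point, which still differs from Java's count for characters outside the BMP. This sketch therefore uses std::codecvt_utf8_utf16<char16_t>, which produces UTF-16 with surrogate pairs, so the result's size() should match String.length(). Both std::wstring_convert and the <codecvt> facets are deprecated since C++17, though still available in the C++17/C++20 modes of common toolchains. The function name to_utf16 is hypothetical.

```cpp
#include <cassert>
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: convert UTF-8 bytes to UTF-16 code units, the same
// representation a Java String uses (deprecated facilities, see above).
std::u16string to_utf16(const std::string& utf8) {
    std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
    return conv.from_bytes(utf8);  // throws std::range_error on invalid UTF-8
}
```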

Alex Cohn