
When I copy the content of my HTML file directly into a string and then show it in a WebView using:

mWebView.loadDataWithBaseURL("file:///android_asset/", myString, "text/html", "UTF-8", null); 

everything is OK! I want to modify the content of my HTML file programmatically before loading it into the WebView, but when I read the HTML file from the assets folder using the code below

private String loadAssetTextAsString(Context context, String name) {
    BufferedReader in = null;
    try {
        StringBuilder buf = new StringBuilder();
        InputStream is = context.getAssets().open(name);
        in = new BufferedReader(new InputStreamReader(is, "UTF-8"));

        String str;
        // Append every line, including the first; line separators are dropped.
        while ((str = in.readLine()) != null) {
            buf.append(str);
        }
        return buf.toString();
    } catch (IOException e) {
        Log.e("TAG", "Error opening asset " + name);
    } finally {
        if (in != null) {
            try {
                in.close();
            } catch (IOException e) {
                Log.e("TAG", "Error closing asset " + name);
            }
        }
    }

    return null;
}

and then load it in the WebView, the WebView unexpectedly shows the � character (I think its name is soft hyphen). I have used UTF-8 as the charset in my HTML file. I also tried the code below to remove �, but it failed.

myString = myString.replace("�", "");

How can I remove �? Thanks for any help.


Mr. Nobody
  • You can directly open an HTML file in a WebView. Why do you need to read it into a String? – OneCricketeer Apr 18 '16 at 20:26
  • Possible duplicate of [Loading existing .html file with android WebView](http://stackoverflow.com/questions/4027701/loading-existing-html-file-with-android-webview) – OneCricketeer Apr 18 '16 at 20:28
  • That seems to be UTF-16BE encoding, needed for the InputStreamReader. – Joop Eggen Apr 18 '16 at 20:28
  • Thank you @cricket_007 for your comment. I need to modify content of html before loading in webview. – Mr. Nobody Apr 18 '16 at 20:29
  • Try `new InputStreamReader(is, "UTF-16")` as JoopEggen pointed out – Floern Apr 18 '16 at 20:31
  • Your problem is one of character encoding. Somewhere along the way (perhaps in the WebView) the bytes are decoded using a single-byte encoding, but the characters were actually encoded using a two-byte encoding. – dsh Apr 18 '16 at 20:58
  • Thank you @Floern. It worked. I have another question: why does myString.replace("�", "") not remove �? – Mr. Nobody Apr 18 '16 at 21:09
  • @iliailiaey ah, I just posted that as an answer ;) – Floern Apr 18 '16 at 21:10
  • @iliailiaey It doesn't remove the character, because your string does not actually contain that character. The encoding is simply displaying that character as "unknown" – OneCricketeer Apr 18 '16 at 21:12
  • @iliailiaey I expanded my answer about that, but it's just a guess – Floern Apr 18 '16 at 21:13

1 Answer


Your content looks like it's encoded as UTF-16, where each character uses at least two bytes, instead of one or more as in UTF-8. In UTF-16, simple ASCII characters are paired with a null byte \0 (before the character in big-endian order, after it in little-endian order), which ends up displayed as � when the bytes are decoded as UTF-8.

Thus reading it as UTF-16 from the InputStream might solve the problem:

in = new BufferedReader(new InputStreamReader(is, "UTF-16"));
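
To see the effect outside of Android, here is a minimal sketch that decodes the same UTF-16 bytes once as UTF-8 and once as UTF-16. The class name `Utf16DecodeDemo`, the helper `readAll`, and the sample markup are made up for illustration, and a `ByteArrayInputStream` stands in for the asset stream; it is not your actual file.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16DecodeDemo {
    // Hypothetical helper: reads all characters from the bytes using the given charset,
    // the same way the asset stream is wrapped in an InputStreamReader.
    static String readAll(byte[] data, Charset charset) {
        StringBuilder buf = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new ByteArrayInputStream(data), charset))) {
            int c;
            while ((c = in.read()) != -1) {
                buf.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        String html = "<p>Hi</p>"; // sample content, assumed for the demo
        // Java's UTF-16 encoder writes a byte order mark followed by big-endian data.
        byte[] utf16 = html.getBytes(StandardCharsets.UTF_16);

        String wrong = readAll(utf16, StandardCharsets.UTF_8);
        String right = readAll(utf16, StandardCharsets.UTF_16);

        // The UTF-8 decode keeps the extra null characters in the string.
        System.out.println("UTF-8 decode contains \\0: " + (wrong.indexOf('\0') >= 0));
        // The UTF-16 decode gives back the original markup.
        System.out.println("UTF-16 decode matches: " + right.equals(html));
    }
}
```

Note that Java's "UTF-16" charset uses the byte order mark to pick the byte order when decoding and assumes big-endian if there is none, so a little-endian file without a mark would need "UTF-16LE" instead.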

The String.replace("�", "") call does not work because the � symbol you see on screen is not the character that is actually stored in the String. Directly replacing the null character \0 might work, if it is preserved when the bytes are decoded as UTF-8: String.replace("\0", "").
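
A rough sketch of that guess, assuming the file is big-endian UTF-16 without a byte order mark and contains only ASCII characters (the class name `NulStripDemo`, the helper `stripNuls`, and the sample markup are hypothetical):

```java
import java.nio.charset.StandardCharsets;

class NulStripDemo {
    // Hypothetical helper: removes the null characters left over from the wrong decode.
    static String stripNuls(String s) {
        return s.replace("\0", "");
    }

    public static void main(String[] args) {
        // UTF_16BE encodes without a byte order mark: each ASCII char becomes 00 xx.
        byte[] utf16be = "<p>Hi</p>".getBytes(StandardCharsets.UTF_16BE);
        // Decoding those bytes as UTF-8 turns every 00 byte into a U+0000 character.
        String misdecoded = new String(utf16be, StandardCharsets.UTF_8);

        // Replacing U+FFFD (the character that � stands for) leaves the string unchanged here.
        String afterFffd = misdecoded.replace("\uFFFD", "");
        System.out.println("replace(U+FFFD) changed anything: " + !afterFffd.equals(misdecoded));

        // Replacing the null characters recovers the ASCII markup.
        System.out.println("after stripping \\0: " + stripNuls(misdecoded));
    }
}
```

This only holds for pure ASCII content; any non-ASCII character would already be damaged by the UTF-8 decode, so reading the stream with the right charset is the safer fix.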

Floern