I have garbled text, è¼å¥, returned by a PHP web service that fetches it from MySQL.

Now I am trying to decode it to UTF-8 on Android, but it's not working.

I have tried:

String s = "è¼å¥"; // text returned by the web service, hardcoded here for testing

1. not working:

String str = new String(s.getBytes(), "utf-8");

2. not working:

String normalized = Normalizer.normalize(str, Normalizer.Form.NFD);
// also tried NFC, NFKC, NFKD
// also checked with Normalizer.isNormalized; it returns true

3. not working:

String str = URLDecoder.decode(s, "utf-8");

All of the above give the same output: è¼å¥

Can anyone help me understand what I am doing wrong, or suggest an alternative?

Any help will be much appreciated. Thanks

Tiffany
Tarsem Singh

3 Answers

As Stephen C explained very well, I followed all of those steps, but a few additional changes were required:

1. As Stephen C explained, my server was sending data in Latin-1 encoding, so I have to use the ISO8859_1 charset.

2. I was trying String str = new String(s.getBytes(), "utf-8");

This will not work for Latin-1 encoded data! With no argument, getBytes() uses the platform default charset (UTF-8 on Android), so the string is encoded and decoded with the same charset and comes back unchanged.

So I have to pass the charset of the data (ISO8859_1 in my case) to getBytes("ISO8859_1").

This is working fine now:

String str = new String(s.getBytes("ISO-8859-1"), "utf-8");

Note that the second parameter is the charset used to decode the bytes into the new string, so it must be utf-8 to recover the original text.
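For anyone who wants to see the whole round trip in one place, here is a minimal sketch. The class name MojibakeRoundTrip and the sample string are my own assumptions (the sample is just an arbitrary Chinese string, not necessarily the text from the question). It first reproduces the garbling by decoding UTF-8 bytes as ISO-8859-1, then undoes it with the same call as above.

```java
import java.nio.charset.StandardCharsets;

class MojibakeRoundTrip {
    // Simulates the bug: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String garble(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes it: ISO-8859-1 maps each char back to the original byte,
    // and those bytes are then decoded as UTF-8.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u8F38\u5165"; // assumed sample text, two Chinese characters
        String garbled = garble(original);
        System.out.println(garbled.length());                 // 6 (one char per UTF-8 byte)
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

This only works because ISO-8859-1 maps every byte value to exactly one character, so no bytes are lost on the way. StandardCharsets needs API level 19 on Android; on older versions the string names "ISO-8859-1" and "UTF-8" do the same job.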

Tarsem Singh

The first thing to do is to check the response's Content-Type header to see what encoding the remote server says it is using. If it says nothing, then the chances are that it is using ISO-8859-1 (aka Latin-1) and not UTF-8.
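As a rough sketch of that check, the helper below pulls the charset parameter out of a Content-Type header value. The class and method names are made up for illustration, and the ISO-8859-1 fallback is simply the guess described above, not something the server guarantees. On Android the header value would typically come from something like URLConnection.getContentType().

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class ContentTypeCharset {
    // Returns the charset named in a Content-Type value such as
    // "text/html; charset=UTF-8", or ISO-8859-1 if none is usable.
    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            for (String part : contentType.split(";")) {
                String p = part.trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length())
                                   .replace("\"", "").trim();
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // illegal or unsupported name: use the fallback
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1; // assumed default when nothing is declared
    }

    public static void main(String[] args) {
        System.out.println(charsetOf("text/html; charset=UTF-8")); // UTF-8
        System.out.println(charsetOf("application/json"));         // ISO-8859-1
    }
}
```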

Another possibility is that the server is sending binary data ... and you shouldn't be trying to display it as text at all.

It would help if you told us what you were expecting the text to look like.


Assuming that it is Latin-1 text, you need to decode it like this:

String str = new String(s.getBytes(), "ISO8859_1");

Note that what you are actually trying to do here is to convert from the byte encoding to Java's native String representation in which the characters are effectively represented in UTF-16.
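To illustrate that point, here is a small sketch (the class name DecodeToString is hypothetical). The same character arrives as different bytes depending on the wire encoding, but once the bytes are decoded with the matching charset the resulting Java String is identical.

```java
import java.nio.charset.StandardCharsets;

class DecodeToString {
    // Decodes raw Latin-1 bytes into a Java String.
    static String fromLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // Decodes raw UTF-8 bytes into a Java String.
    static String fromUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] latin1 = {(byte) 0xE9};            // e-acute (U+00E9) as one Latin-1 byte
        byte[] utf8 = {(byte) 0xC3, (byte) 0xA9}; // the same character as two UTF-8 bytes
        System.out.println(fromLatin1(latin1).equals(fromUtf8(utf8))); // true
        System.out.println(fromUtf8(utf8).length());                   // 1
    }
}
```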


I also note that you say that the original text is supposed to be Chinese characters. If that is the case, then I'm afraid that the real problem is on the server end. Latin-1 is not a valid encoding for Chinese characters.

So what appears to be going on is that the server is storing the text incorrectly, and garbling it in the process ... then serving it up with an incorrect / inappropriate encoding type.

What a mess!
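The claim that Latin-1 cannot represent Chinese characters is easy to confirm from Java. The sketch below uses a hypothetical class name and an arbitrary Chinese sample string (both my assumptions): it asks the charset's encoder whether it can encode the text, and shows that a forced conversion replaces the character with '?'.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodableCheck {
    // True if every character of text can be represented in the given charset.
    static boolean canEncode(String text, Charset cs) {
        return cs.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String chinese = "\u8F38\u5165"; // assumed sample text
        System.out.println(canEncode(chinese, StandardCharsets.ISO_8859_1)); // false
        System.out.println(canEncode(chinese, StandardCharsets.UTF_8));      // true

        // Forcing the conversion silently substitutes '?' for each character.
        byte[] lossy = chinese.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(lossy[0]); // 63, the byte value of '?'
    }
}
```

If the text was ever truly converted to Latin-1 like this, the original characters are gone and no client-side decoding can bring them back.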

Stephen C

I am not going to comment on how to decode the UTF-8 characters properly in Java code, because you have already tried several approaches and I believe one of them should work for you. Instead, I want to help you validate your code changes correctly.

OK, as per your comment:

I am printing it with sysout, and also displaying it in Log and in a Toast; my console can show the text, which I have already tested!

The problem with any of these methods is that you need to make sure the output destination is UTF-8 encoded. The console is not UTF-8 encoded by default, so while you try different approaches in code, the console will not be able to print the UTF-8 data correctly, and you will never be able to validate the code properly.
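One way to take the console out of the equation entirely is to print the code points of the string as plain ASCII instead of the characters themselves. This is only a sketch, and the class and method names are invented for illustration: correctly decoded Chinese text shows values in the CJK range, while garbled text shows values below U+0100.

```java
class CodePointDump {
    // Renders each UTF-16 unit of the string as "U+XXXX", separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed sample: two correctly decoded Chinese characters.
        System.out.println(dump("\u8F38\u5165")); // U+8F38 U+5165
        // Two Latin-1 range characters, as seen in garbled text.
        System.out.println(dump("\u00E8\u00BC")); // U+00E8 U+00BC
    }
}
```

Because the output is pure ASCII, it looks the same in sysout, Log, or a Toast regardless of how each one is configured.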

If you are using Eclipse with the Android SDK, there is a way to change the encoding of your console. Here is how you do it:

Run Configuration -> Common -> Encoding (select UTF-8)

Juned Ahsan
  • Thanks again. I have already set `Run Configuration -> Common -> Encoding (select UTF-8)`, but I am not sure about the encoding at the database end; let me check that. – Tarsem Singh Aug 17 '13 at 06:24