
I tried a lot of solutions from Stack Overflow but cannot solve the problem. I have this JSON object with a value, title, in UTF-8 and I need to convert it to a Java String:

{"id":"118","title":"\u00c7\u00c0\u00c7"}

I ended up with this approach but it doesn't work:

String title = new String(JsonObj.getString("title").getBytes(), "US-ASCII"); 

String title = new String(JsonObj.getString("title").getBytes());

English titles are shown correctly, e.g. Wartburg, Wiesmann, Xin Kai. Russian titles are shown like ÂÀÇ, Âåëòà, ÃÀÇ.

What is wrong and how can I convert it into normal characters?

EDIT:

Here is how I am receiving the JSON:

 JSONObject jsonObject = new JSONObject();

            try {

                //                sending empty JSON in this request
                String jsonRequest = jsonObject.toString();
                Log.v(LOG_TAG, "JSON: " + jsonRequest);

                URL url = new URL(STRING_URL);

                urlConnection = (HttpURLConnection) url.openConnection();
                urlConnection.setRequestMethod("POST");

                //  hashing the signature
                String md5Signature = MD5Utils.md5Apache(KEY + jsonRequest);

                //                setting header property
                urlConnection.setRequestProperty(AA_SIGNATURE, md5Signature);

                urlConnection.setDoOutput(true);
                DataOutputStream wr = new DataOutputStream(urlConnection.getOutputStream());
                wr.writeBytes(jsonRequest);
                wr.flush();
                wr.close();

                //            read the input stream into the String
                InputStream inputStream = urlConnection.getInputStream();

                if (inputStream == null) {
                    //                nothing to do
                    return null;
                }

                reader = new BufferedReader(
                        new InputStreamReader(inputStream));

                String inputLine;
                StringBuffer buffer = new StringBuffer();

                while ((inputLine = reader.readLine()) != null) {
                    buffer.append(inputLine);
                }

                if (buffer.length() == 0) {
                    // Stream was empty
                    return null;
                }

                // String buffer
                String responseJsonStr = buffer.toString();
                Log.v(LOG_TAG, "Final String buffer: " + responseJsonStr);


                //                trying to parse json string and return result
                try {
                    return getCarBrandsOrModelsFromJson(responseJsonStr);
                } catch (JSONException e) {
                    e.printStackTrace();
                }


            } catch (MalformedURLException e) {
                e.printStackTrace();
            } catch (IOException e) {
                e.printStackTrace();
            } finally {

                if (urlConnection != null) {
                    urlConnection.disconnect();
                }

                if (reader != null) {
                    try {
                        reader.close();
                    } catch (IOException e) {
                        Log.e(LOG_TAG, "Error closing stream");
                    }
                }
            }
            return null;
        }

Here is how I am parsing it:

 private HashMap<String, Integer> getCarBrandsOrModelsFromJson(String carBrandsOrModelsJsonStr) throws JSONException {

        //        these are the names of JSON objects needed to be extracted
        final String AA_DATA = "data";
        final String AA_TITLE = "title";
        final String AA_ID = "id";

        JSONObject carBrandsJson = new JSONObject(carBrandsOrModelsJsonStr);
        JSONArray brandsArray = carBrandsJson.getJSONArray(AA_DATA);

        HashMap<String, Integer> carBrandsMap = new HashMap<String, Integer>();

        for (int i = 0; i < brandsArray.length(); i++) {

            String brand = null;
            Integer id;

            //            Get the JSON object representing the one brand
            JSONObject oneBrandJson = brandsArray.getJSONObject(i);

            //            getting brand and id

            // ===================>>> ?
            brand = new String(oneBrandJson.getString(AA_TITLE).getBytes(), "UTF8");
            //            brand = oneBrandJson.getString(AA_TITLE);
            brand = oneBrandJson.getString(AA_TITLE);

            id = oneBrandJson.getInt(AA_ID);

            //            adding brand and id into hashmap
            carBrandsMap.put(brand, id);
        }

        //        Logging result
        for (Map.Entry<String, Integer> entry : carBrandsMap.entrySet()) {
            Log.v(LOG_TAG, ("\n" + entry.getKey() + " / " + entry.getValue()));
        }

        return carBrandsMap;
    }
SilverlightFox
Androider
  • Your JSON parser should take care of that for you. `\unnnn` is part of the JSON specification (and it is *not* UTF-8, BTW). – Biffen Jan 29 '16 at 16:15
  • http://stackoverflow.com/questions/88838/how-to-convert-strings-to-and-from-utf8-byte-arrays-in-java – ruyili Jan 29 '16 at 16:15
  • I have tried this solution but still cannot succeed – Androider Jan 29 '16 at 16:18
  • Biffen, can you write in more detail please – Androider Jan 29 '16 at 16:22
  • This has nothing to do with UTF-8. These are Unicode code points. What parsing library are you using? (And anyway, if it does not take care of that sort of thing, change it.) – njzk2 Jan 29 '16 at 16:24
  • @Androider `\unnnn` is part of JSON's syntax. A JSON parser will convert it into whatever encoding it uses. In other words, `JsonObj.getString("title")` should give you the right string. If it doesn't, then your JSON parser is broken. – Biffen Jan 29 '16 at 16:25
  • What are you using to read JSON from Java? Which result are you getting now? If you just use `JsonObj.getString("title")`, then how long is the resulting string and which characters does it consist of? – Goblin Alchemist Jan 29 '16 at 16:31
  • If I use JsonObj.getString("title") I receive correct values for English letters, but Russian ones are shown this way: {"id":"113","title":"Xin Kai"},{"id":"114","title":"ZX"},{"id":"115","title":"\u00c2\u00c0\u00c7"} – Androider Jan 29 '16 at 16:43
  • @Androider That's data encoded as JSON. What does *the parsed* string look like? – Biffen Jan 29 '16 at 16:47
  • English titles are shown correctly as Wartburg, Wiesmann, Xin Kai. Russian are shown like ÂÀÇ, Âåëòà, ÃÀÇ – Androider Jan 29 '16 at 16:55
  • @Androider The *correct* parsing of the JSON string `"\u00c7\u00c0\u00c7"` gives the string `ÇÀÇ` (`\u00c7` means U+00C7, LATIN CAPITAL LETTER C WITH CEDILLA, and so on). If that is not what you expect, then it's the JSON *encoding* that's faulty. – Biffen Jan 29 '16 at 17:31
  • Thank you! This could be the reason. I will try to figure it out again on the other side – Androider Jan 29 '16 at 17:37
  • @Androider Tip: There are plenty of websites (e.g. [this one](http://www.jsoneditoronline.org)) that let you view your JSON decoded, before you move on to parsing it on your code. – Biffen Jan 29 '16 at 17:40
  • If you need Russian words then they are probably "ВАЗ", "Велта", "ГАЗ". The correct encoding is "\u0412\u0410\u0417", "\u0412\u0435\u043B\u0442\u0430", "\u0413\u0410\u0417". As @Biffen correctly suggested, the strings are corrupted on the other side. See what happens: the Russian character Г is stored using the Russian "Windows-1251" 8-bit encoding, where it has the value 195, or \xC3; then this value is incorrectly used as a Unicode value, \u00C3, while \u00C3 is actually the Extended Latin character Ã. The correct way is to use the actual Unicode value of Г, which is \u0413. – Goblin Alchemist Jan 29 '16 at 19:13
  • Yes, exactly. Thank you. So it doesn't make sense to convert it myself? – Androider Jan 30 '16 at 11:51
  • Strictly speaking, your input JSON data is broken. If the JSON is generated by request and you have control over the module which is generating it, then it's best to fix that module so that it emits correct Unicode in the JSON. Otherwise, if you don't have control over that part of the system, or if you have a database with millions of corrupt JSON files and you can't regenerate them all, then you will have to try to "decode" the corrupt data on your side... You will have to know that the text is actually Russian (this information is now lost), and it will not work for French or Chinese. – Goblin Alchemist Feb 01 '16 at 10:39
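
The corruption described in the comments above can be reproduced in a few lines. This is only a sketch of the likely failure, not the server's actual code: the class `MojibakeDemo` and its methods `corrupt` and `toJsonEscapes` are hypothetical names, and it assumes the runtime provides the `windows-1251` charset (most JVMs do, but it is not one of the guaranteed standard charsets).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {

    // Assumption: windows-1251 is available on this runtime.
    private static final Charset CP1251 = Charset.forName("windows-1251");

    /** Reproduces the bug: encode as windows-1251, then treat each byte as a code point. */
    static String corrupt(String text) {
        byte[] cp1251Bytes = text.getBytes(CP1251);
        // ISO-8859-1 maps every byte value 0xNN to the code point U+00NN,
        // which is exactly the faulty step described in the comments.
        return new String(cp1251Bytes, StandardCharsets.ISO_8859_1);
    }

    /** Renders every char as a JSON-style four-digit hex escape. */
    static String toJsonEscapes(String text) {
        StringBuilder sb = new StringBuilder();
        for (char c : text.toCharArray()) {
            sb.append(String.format("\\u%04x", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String original = "\u0413\u0410\u0417"; // the Cyrillic word from the comments
        System.out.println(toJsonEscapes(original));          // correct escapes
        System.out.println(toJsonEscapes(corrupt(original))); // corrupted escapes
    }
}
```

For ГАЗ the second line comes out as the escapes `\u00c3\u00c0\u00c7`, which a correct JSON parser turns into ÃÀÇ, one of the strings reported in the question.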

1 Answer

The code below encodes the Java String (Unicode) as UTF-8 bytes and then decodes them back into a String.

String original = JsonObj.getString("title");
try {
   byte[] utf8Bytes = original.getBytes("UTF-8");
   String roundTrip = new String(utf8Bytes, "UTF-8");
} 
catch (UnsupportedEncodingException e) {
    e.printStackTrace();
}

EDIT:

It seems the original text was encoded as cp1251 and then wrongly decoded as cp1252 before it was written to the JSON. To reverse that, you should use:

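// utf8Bytes comes from the snippet above; decode with an explicit charset
String roundTrip = new String(utf8Bytes, "UTF-8");
// recover the original byte values
byte[] bytes = roundTrip.getBytes("cp1252");
// decode those bytes with the charset they were really written in
String roundTrip2 = new String(bytes, "cp1251");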
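
The same repair can be wrapped in a small self-contained helper. This is a sketch under assumptions, not a general fix: `TitleRepair` and `repairCp1251` are hypothetical names, the `windows-1251` charset is assumed to be available on the runtime, and ISO-8859-1 is swapped in for cp1252 because it maps every char U+0000 to U+00FF to exactly one byte (for the Cyrillic letter range 0xC0 to 0xFF both charsets give the same bytes).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class TitleRepair {

    // Assumption: windows-1251 is available on this JVM / Android runtime.
    private static final Charset CP1251 = Charset.forName("windows-1251");

    /**
     * Hypothetical helper: reinterprets a parsed string whose chars are
     * really windows-1251 byte values. A string containing any char above
     * U+00FF cannot be this kind of corruption, so it is returned unchanged.
     */
    static String repairCp1251(String parsed) {
        for (int i = 0; i < parsed.length(); i++) {
            if (parsed.charAt(i) > 0xFF) {
                return parsed;
            }
        }
        // Turn each char back into the single byte it was created from.
        byte[] rawBytes = parsed.getBytes(StandardCharsets.ISO_8859_1);
        // Decode those bytes with the charset they were written in.
        return new String(rawBytes, CP1251);
    }

    public static void main(String[] args) {
        System.out.println(repairCp1251("\u00c2\u00c0\u00c7")); // corrupted title from the question
        System.out.println(repairCp1251("Xin Kai"));            // ASCII passes through unchanged
    }
}
```

ASCII titles such as Xin Kai survive because ASCII bytes mean the same thing in both charsets. A genuinely Latin title containing accented letters (for example ë) would be turned into Cyrillic by this helper, so it is only safe when the data is known to be Russian.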
Roman C
  • Thank you. Unfortunately I am still getting these characters with your code. Is it OK that you are using UTF-8 twice? – Androider Jan 29 '16 at 17:27
  • @Androider What are you really expecting to see? – catch23 Jan 29 '16 at 17:28
  • I am expecting to see Russian words instead of ÂÀÇ, Âåëòà, ÃÀÇ – Androider Jan 29 '16 at 17:29
  • @Androider How exactly should `\u00c7\u00c0\u00c7` look after processing? For now I have it like `ÇÀÇ`. – catch23 Jan 29 '16 at 17:45
  • @Androider yeah, it's ok, it's printing ÇÀÇ – Roman C Jan 29 '16 at 17:54
  • I was advised on another forum to try a decoder: https://www.artlebedev.ru/tools/decoder/ It shows the text ÂÀÇ, Âåëòà, ÃÀÇ as it should be: ВАЗ, Велта, ГАЗ. It says that it decoded the text this way: CP1252 → CP1251. So the text is probably decoded in CP1252? I tried changing "UTF-8" to "windows-1252" in your code, but I still cannot see the proper characters – Androider Jan 30 '16 at 11:30
  • Thank you! Your code works perfectly. P.S. Thanks, fellow countryman :) – Androider Feb 01 '16 at 12:02