1

I want to convert a UTF-8 string to the escaped \uXXXX format in the values of a JSON object.

I tried both JSONObject and Gson, but neither worked for me in this case:

JSONObject js = new JSONObject();
js.put("lastReason","nguyễn");
System.out.println(js.toString());

and

Gson gson = new Gson();
String new_js = gson.toJson(js.toString());
System.out.println(new_js);

Output: {"lastReason":"nguyễn"}

But I expect the result to be:

Expected Output: {"lastReason":"nguy\u1EC5n"}

Is there a solution for this case? Please help me resolve it.

Nguyen
  • It depends on which library your `JSONObject` comes from; the problem is in the `js.toString()` method. Could you add the full package name for `JSONObject`? – Rustam Dec 17 '22 at 18:18
  • You can modify the string before adding it to the json map. see: https://stackoverflow.com/questions/6230190/convert-international-string-to-u-codes-in-java – SimGel Dec 17 '22 at 19:02
  • Surely then you'd want to do `js.put("lastReason","nguy\\u1EC5n");`? – g00se Dec 17 '22 at 23:26
  • @g00se Sure, I need all strings to use Unicode escapes consistently. – Nguyen Dec 18 '22 at 02:30
  • @SimGel If there is no other solution, I will have to loop over every key in the JSON, including keys of nested JSON objects, to escape them. – Nguyen Dec 18 '22 at 02:33
  • @Rustam I added the JSONObject package (org.json.JSONObject), but it doesn't escape the string. – Nguyen Dec 18 '22 at 02:35
  • If you want to encode the string during serialization, another option would be to use `Jackson` with a custom serializer: https://www.baeldung.com/jackson-custom-serialization – SimGel Dec 18 '22 at 11:19
  • @SimGel Thanks. I want a way to dump JSON to a string similar to json.dumps() in Python. In Python, when I use `json.dumps(json_var)`, the JSON string is escaped automatically. Is there a solution for that in Java? – Nguyen Dec 19 '22 at 02:49
  • Why, btw, do you need this escaped form? – g00se Dec 19 '22 at 12:17

1 Answer

2

You can use the Apache commons-text library to convert a string to Unicode escape sequences. Use org.apache.commons.text.StringEscapeUtils to translate the text before adding it to the JSONObject.

StringEscapeUtils.escapeJava("nguyễn")

will produce

nguy\u1EC5n

One possible problem with StringEscapeUtils is that it escapes control characters as well. If there is a tab character at the end of the string, it will be translated to \t. For example:

StringEscapeUtils.escapeJava("nguyễn\t")

will produce a string in which the tab is escaped too, which may not be what you want:

nguy\u1EC5n\t

You can use org.apache.commons.text.translate.UnicodeEscaper to get around this, but it will translate every character in the string to a Unicode escape sequence. For example:

UnicodeEscaper ue = new UnicodeEscaper();
String escaped = ue.translate(rawString); // rawString is "nguyễn" or "nguyễn\t"

will produce

\u006E\u0067\u0075\u0079\u1EC5\u006E

or, when the string ends with a tab:

\u006E\u0067\u0075\u0079\u1EC5\u006E\u0009

Whether it is a problem or not is up to you to decide.
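
If neither behavior fits, a third option is to escape only the characters above the ASCII range and leave everything else alone. The sketch below uses plain Java with no extra library; the class name NonAsciiEscaper and the method escapeNonAscii are hypothetical names chosen for illustration, not part of org.json, Gson, or commons-text. It assumes you apply it to the already serialized JSON text rather than to each value before put(), because a serializer would typically escape the backslash of a pre-escaped value a second time.

```java
// Hypothetical helper for illustration; not part of any JSON library.
final class NonAsciiEscaper {

    private NonAsciiEscaper() {
        // static utility, no instances
    }

    /**
     * Replaces every UTF-16 code unit above 0x7F with a six-character
     * backslash-u escape (four uppercase hex digits). ASCII characters,
     * including tabs and other control characters, are copied unchanged.
     * Characters outside the Basic Multilingual Plane come out as two
     * escapes, one per surrogate, which is how JSON represents them.
     */
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c > 0x7F) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With the JSONObject from the question, NonAsciiEscaper.escapeNonAscii(js.toString()) should give {"lastReason":"nguy\u1EC5n"}. Running it over the whole serialized text is safe in the sense that JSON's structural characters are all ASCII, so any non-ASCII character can only appear inside a string literal.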

Sergei
  • Thank you so much. So the only way is to translate the text before putting it into the `JSONObject`. – Nguyen Dec 18 '22 at 09:03
  • 1
    Note that there is also the method [StringEscapeUtils.escapeJson()](https://commons.apache.org/proper/commons-text/javadocs/api-release/org/apache/commons/text/StringEscapeUtils.html#escapeJson(java.lang.String)) where _"The only difference between Java strings and Json strings is that in Json, forward-slash (/) is escaped"_. A minor point, but using `escapeJson()` instead of `escapeJava()` might make a bit more sense for the OP's use case, if only for clarity. – skomisa Dec 18 '22 at 09:39