-1

My question may already have been answered on Stack Overflow, but I can't find it. My problem is simple: I request data via an API, and the returned data contains Unicode escape sequences, for example:

"SpecialOffer":[{"title":"Offre Vente Priv\u00e9e 1 jour 2019 2020"}]

I need to convert the "\u00e9" to "é". I can't use "replaceAll", because I can't know in advance all the characters that will appear.

I tried this:

byte[] utf8 = reponse.getBytes("UTF-8");
String string = new String(utf8, "UTF-8");

But the string still contains "\u00e9".

I also tried this:

byte[] utf8 = reponse.getBytes(StandardCharsets.UTF_8);
String string = new String(utf8, StandardCharsets.UTF_8);

And this:

    string = string.replace("\\\\", "\\");
    // Also tried StandardCharsets.UTF_8, "UTF-8", and "UTF_8" in place of "UTF8"
    byte[] utf8Bytes = string.getBytes("UTF8");
    String convertedString = new String(utf8Bytes, "UTF8");
    System.out.println(convertedString);
    return convertedString;

But it doesn't work either.

I tested other methods, but I deleted everything that didn't work, so I can't show them here.

I am sure there is a very simple method, but I am probably not searching with the right vocabulary. Can you help me, please?

I wish you a very good day, and thank you very much in advance.

Mangue Sutcliff
  • `string.getBytes("UTF-8");` instead of `string.getBytes("UTF8");` should do the trick. – Joel Feb 24 '20 at 10:40
  • @Joel Thank you for your comment. I just tested it, but it doesn't work either... :/ – Mangue Sutcliff Feb 24 '20 at 10:42
  • Does this answer your question? [How to convert a string with Unicode encoding to a string of letters](https://stackoverflow.com/questions/11145681/how-to-convert-a-string-with-unicode-encoding-to-a-string-of-letters) – jhamon Feb 24 '20 at 10:45
  • Okay, try to convert it to Unicode first, then convert it to UTF-8 – Joel Feb 24 '20 at 10:56
  • @jhamon Thank you for your comment. I had already tested this method, and I find it difficult to adapt to my case. My String can also contain numbers and the letter "u", and there can be many Unicode escapes distributed randomly. – Mangue Sutcliff Feb 24 '20 at 10:56
  • There are multiple answers in the linked question – jhamon Feb 24 '20 at 10:57

3 Answers

0

The String.getBytes method requires a valid charset [1].

According to the Javadoc [2], the charsets guaranteed to be supported on every Java platform are:

  • US-ASCII
  • ISO-8859-1
  • UTF-8
  • UTF-16BE
  • UTF-16LE
  • UTF-16

So you need to use "UTF-8" in the getBytes method.

[1] https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#getBytes-java.nio.charset.Charset-
[2] https://docs.oracle.com/javase/8/docs/api/java/nio/charset/Charset.html
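
As a quick check of what a UTF-8 round trip actually does, here is a minimal sketch (the class and method names are hypothetical, not from the question). It encodes a string with StandardCharsets.UTF_8 and decodes it again. The result equals the input, so text that contains a literal backslash followed by "u00e9" comes back unchanged; decoding the escape itself is a separate step.

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTrip {
    // Encodes the string to UTF-8 bytes and decodes those bytes again.
    static String roundTrip(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Six literal characters: backslash, 'u', and four hex digits.
        String literal = "Priv\\u00e9e";
        // The round trip returns an equal string, escape text included.
        System.out.println(roundTrip(literal).equals(literal)); // prints "true"
        // A real U+00E9 character takes two bytes in UTF-8.
        System.out.println("\u00e9".getBytes(StandardCharsets.UTF_8).length); // prints "2"
    }
}
```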

Oscerd
0

You can use a small JSON library:

String jsonstring = "{\"SpecialOffer\":[{\"title\":\"Offre Vente Priv\\u00e9e 1 jour 2019 2020\"}]}";
JsonValue json = JsonParser.parse(jsonstring);
String value = json.asObject()
    .first("SpecialOffer").asArray().get(0)
    .asObject().first("title").asStringLiteral().stringValue();
System.out.println(" result: " + value);

or

String text = "Offre Vente Priv\\u00e9e 1 jour 2019 2020";
System.out.println(" result: " + JsonEscaper.unescape(text));
Anton Straka
0

The problem, which I had not noticed, is that the API did not return the character "\u00e9" but the literal text "\\u00e9": a sequence of six characters, not a single Unicode character. So I have to rebuild each Unicode character from its escape sequence, and then everything works fine!

// Wrapper added so the snippet compiles; the method name is illustrative.
static String unescapeUnicode(String s) {
    int i = 0, len = s.length();
    StringBuilder sb = new StringBuilder(len);
    while (i < len) {
        char c = s.charAt(i++);
        if (c == '\\' && i < len) {
            c = s.charAt(i++);
            if (c == 'u') {
                // TODO: check that 4 more chars exist and are all hex digits
                c = (char) Integer.parseInt(s.substring(i, i + 4), 16);
                i += 4;
            } // add other cases here as desired...
        } // fall through: \ escapes itself, quotes any character but u
        sb.append(c);
    }
    return sb.toString();
}

I found this solution here: Java: How to create unicode from string "\u00C3" etc
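
The TODO in the loop above (checking that four hex digits follow) can also be handled with a regular expression. The following is only a sketch under that assumption, not the method from the linked answer; the class name UnicodeUnescaper and the method name unescape are hypothetical. The pattern matches a literal backslash, a "u", and exactly four hex digits, so malformed sequences are left untouched instead of throwing an exception.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeUnescaper {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every well-formed escape with the character it denotes.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer(input.length());
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded '\' or '$' from being
            // interpreted as replacement syntax.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "Offre Vente Priv\\u00e9e 1 jour 2019 2020";
        System.out.println(unescape(raw));
    }
}
```

Plain text such as "2019" or a lone "u" is not affected, because only a backslash followed by "u" and four hex digits matches. Other escapes (for example a backslash followed by "n") are left as they are in this sketch.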

TylerH
Mangue Sutcliff