
I have an application that gets some strings via JSON.

The problem is that I think they are being sent as ASCII, and the text really should be in Unicode.

For example, parts of the string contain "\u00f6", which is the Swedish letter "ö".

The Swedish word for "buy" is "köpa", and the string I get is "k\u00f6pa".

Is there an easy way, after I have received this String in Java, to convert it to the correct representation?

That is, I want to convert strings like "k\u00f6pa" to "köpa".

Thanks for all the help!

Folke
  • Please refer to [ask] to help you formulate the question slightly and obtain better answers – blurfus Mar 06 '15 at 17:45
  • This is answered in http://stackoverflow.com/a/14368185/1100158 – ccarton Mar 06 '15 at 17:52
  • Are you writing your own JSON parser or is the library you are using faulty? Why do you think the text should contain Unicode characters? Unicode escape sequences are valid in JSON strings. – xehpuk Mar 06 '15 at 18:02
  • Before you do anything be *very* sure that, after the JSON is received and placed into a Java String object, the *individual* characters "\u00f6" are present in the String. Do not trust debuggers or diagnostic dumps to show you the actual Unicode glyphs, since often they translate stuff into escape sequences to display it on non-multilingual displays. Lots of energy is wasted in this area, fixing stuff that ain't broke. – Hot Licks Mar 06 '15 at 19:12
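One way to carry out the check suggested in the last comment is to dump the UTF-16 code units of the received String, so that no debugger or console rendering can get in the way. This is only a sketch: the class and method names (`EscapeCheck`, `dump`, `hasLiteralEscape`) are made up for illustration and are not part of any library.

```java
final class EscapeCheck {
    // Renders every UTF-16 code unit as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    // True if the String really contains a backslash followed by 'u'.
    static boolean hasLiteralEscape(String s) {
        return s.contains("\\u");
    }

    public static void main(String[] args) {
        String decoded = "k\u00f6pa";   // four chars, already decoded
        String literal = "k\\u00f6pa";  // nine chars, escape still present
        System.out.println(dump(decoded) + " -> " + hasLiteralEscape(decoded));
        System.out.println(dump(literal) + " -> " + hasLiteralEscape(literal));
    }
}
```

If the dump shows U+00F6 as a single unit, the String is already correct and nothing needs converting; if it shows U+005C U+0075 (a backslash and a 'u'), the escape survived parsing.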

2 Answers


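Well, that is easy enough: just use a JSON library. With Jackson, for instance, you would write:

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

final ObjectMapper mapper = new ObjectMapper();

// yourSource is a placeholder for the JSON input (a String, Reader, InputStream, ...)
final JsonNode node = mapper.readTree(yourSource);

If the input is a JSON string value, the JsonNode will in fact be a TextNode; you can retrieve the text with: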

node.textValue()

Note that this IS NOT an "ASCII representation" of a String; it just happens that JSON strings can contain UTF-16 code unit character escapes like this one.

(You will lose the quotes around the value, but that is probably what you expect anyway.)

fge

The hex code is just a 2-byte integer, which an int can handle just fine, so you can use Integer.parseInt(s, 16), where s is the string without the "\u" prefix. Then you narrow that int to a char; a four-digit hex value is guaranteed to fit.

Throw in some regex (to validate the string and also extract the hex code), and you're all done.

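import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A literal backslash, 'u', then exactly four hex digits (captured).
Pattern p = Pattern.compile("\\\\u([0-9a-fA-F]{4})");
Matcher m = p.matcher(arg);
// matches() succeeds only if the whole of arg is a single escape.
if (m.matches()) {
  String code = m.group(1);
  int i = Integer.parseInt(code, 16);
  char c = (char) i; // at most 0xFFFF, so it fits in a char
  System.out.println(c);
}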
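
The snippet above only accepts an input that is exactly one escape. To convert a whole string such as "k\u00f6pa", the same pattern can be applied with find() in a loop. The following is a minimal sketch of that idea, not a full JSON unescaper: the class name `UnicodeUnescaper` and the method `unescape` are made up for illustration, and other JSON escapes (such as an escaped backslash in front of the 'u') are deliberately not handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class UnicodeUnescaper {
    // A literal backslash, 'u', then exactly four hex digits (captured).
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every four-digit escape in the input with the char it encodes.
    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement stops '\' and '$' from being read as
            // replacement syntax if the decoded char happens to be one.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // The Java literal below holds a real backslash followed by u00f6.
        System.out.println(unescape("k\\u00f6pa"));
    }
}
```

Each escape is decoded as one UTF-16 code unit, so a surrogate pair written as two consecutive escapes comes out as two chars that together form the intended character. For input that is actual JSON, a real parser remains the safer choice.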
yshavit