
I have a problem decoding and encoding a String.

My program reads the String value Hungr\u00EDa from a web service response, and I need to translate this value to Hungría. I can't understand how this works. And when I send a string in a web service request, I need to encode the value Hungría to Hungr\u00EDa.

String input = "Hungr\u00EDa";
logger.info("UTF8test.decodeUTF8: "+new String(input.getBytes(),Charset.forName("UTF-8"))); //output is Hungr?a, updated to UTF-8
jrey
  • 1
    That's not character-encoded in a different encoding. That's just a Unicode code point in string format. – BalusC Apr 27 '13 at 19:44
  • `UTF8-8` is that correct? – Musa Apr 27 '13 at 19:45
  • 4
    shouldn't `UTF8-8` be `UTF-8`? – Pshemo Apr 27 '13 at 19:47
  • When I run that code, I get the desired output, but I'm not using `logger`. Maybe `logger` doesn't know how to handle the unicode character? – Fls'Zen Apr 27 '13 at 19:47
  • 1
    "but when i'm send string into webservice request I need encode the value Hungría to Hungr\u00EDa" - is it a JSON request then? If so, use a JSON library. It's quite unclear exactly what you mean at the moment - but you should *not* be re-encoding the string in the way you currently are. – Jon Skeet Apr 27 '13 at 19:50
  • 1
    Your question is similar to this: http://stackoverflow.com/questions/11145681/how-to-convert-a-string-with-unicode-encoding-to-a-string-of-letters – Chriss Apr 27 '13 at 19:51
  • The server side reads Hungr\u00EDa as a String. – jrey Apr 27 '13 at 19:54
  • `"Hungr\u00EDa".length()` is 7, and not 12. Are you misunderstanding something? – jlordo Apr 27 '13 at 19:55
  • Chriss, thanks, the similar post is exactly my problem and contains my solution. Thanks a lot. Sorry to all for my misinterpretation of the problem. – jrey Apr 29 '13 at 12:42

1 Answer

5

I have the impression you are not yet clear on what UTF-8 is, and what it isn't.

Most likely, the output actually is in UTF-8 (at least if you fix your typo. Consider using shorter lines, too!). But Hungr\u00EDa is not UTF-8. Hungría is, assuming that you access Stack Overflow in UTF-8. \u00ED is not UTF-8. It is a different, ASCII-only way of writing a Unicode character; I would call it "backslash-escaped Unicode". See: 00ED is the hexadecimal Unicode code point of the character you want (U+00ED, í). The UTF-8 encoding of this character is the two bytes 0xC3 0xAD, while in HTML it could be encoded as &iacute;.
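
To make the difference concrete, here is a minimal Java sketch of converting between the backslash-escaped form and the real character. The class name `UnicodeEscapes` and the helpers `decode` and `encode` are made up for illustration and are not part of any library. The sketch assumes every escape is a backslash, `u`, and exactly four hex digits, and it does not treat an escaped backslash specially. If the web service actually speaks JSON, a JSON library would likely be the better tool.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeEscapes {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    /** Replaces each backslash-u escape in the text with the character it names. */
    public static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(c)));
        }
        m.appendTail(out);
        return out.toString();
    }

    /** Writes every non-ASCII UTF-16 code unit as backslash-u plus four hex digits. */
    public static String encode(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (c < 0x80) {
                out.append(c);
            } else {
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Twelve ASCII characters, as they would arrive from the web service.
        String escaped = "Hungr\\u00EDa";
        String decoded = decode(escaped);

        System.out.println(escaped.length());                // 12
        System.out.println(decoded.length());                // 7
        System.out.println(encode(decoded).equals(escaped)); // true

        // The UTF-8 encoding of the single character U+00ED is two bytes.
        byte[] utf8 = "\u00ED".getBytes(StandardCharsets.UTF_8);
        System.out.printf("%02X %02X%n", utf8[0] & 0xFF, utf8[1] & 0xFF); // C3 AD
    }
}
```

Note that no `getBytes` call or `Charset` is involved in the conversion itself: the escaped text and the decoded text are both ordinary Java strings, and UTF-8 only comes into play when the string is turned into bytes.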

Has QUIT--Anony-Mousse