
I need some explanation about Java String and its encoding. For a task, I got an object that I converted to a String with the method .toString(). The underlying data was actually encoded in Cp1250, so my string contains some wrong characters, and I decided to decode it. From what I have read on many pages on the Internet, a String is usually encoded with UTF-8. So I decided to get the bytes from the string encoded as UTF-8 and decode them into a String as Cp1250.

The chunk of code:

    byte[] b = response.getBytes(StandardCharsets.UTF_8);
    String res = new String(b, "windows-1250");

It worked partially: the correct character is shown, but with an additional strange character next to it.

I also tried it with StandardCharsets.UTF_16, but that did not work. It did, however, work with StandardCharsets.ISO_8859_1. I don't understand why: if a String is encoded with UTF, why did I get a String encoded with ISO_8859_1 from the method .toString()?

Any explanation?

Vaix
  • "I got a object which I parsed to String by method .toString(). In fact it was encoded in Cp1250 so my string has some wrong characters" That's just a bug in the `toString()` implementation of that object. Why not just fix that? – Andy Turner Jun 30 '17 at 10:36
  • Darn, we need a definitive String/Encoding question/answer, similar to "What's a NPE and how do I fix it". So many confused questions about "converting String's encoding". – Kayaman Jun 30 '17 at 10:41
  • The problem is solved with a method. Of course it could be done by overriding the `.toString()` method (which in this case is the better solution), but I still don't understand why I got `ISO_8859_1` instead of `UTF` before – Vaix Jun 30 '17 at 10:51
  • If `toString()` returns a "broken" `String`, you can't fix it afterwards. See https://stackoverflow.com/questions/5729806/encode-string-to-utf-8 to understand what is inherently wrong with the code displayed in your question. – Kayaman Jun 30 '17 at 10:55
  • The code you posted is wrong. What you are doing is converting a string to bytes using the UTF-8 encoding and then converting it back to a string as if it were in windows-1250 encoding, which it is not, because you just encoded it with UTF-8! It is as if you have a book written in English and you tell someone "Read this, it's written in German"... – Jesper Jun 30 '17 at 11:07
  • @Jesper excellent analogy! – Kayaman Jun 30 '17 at 11:12
  • @Jesper I don't agree. You just didn't understand, maybe because I did not explain enough. Imagine that someone asks me to pass on a message. At the start the message is in Spanish. The author knows that the receiver understands only German, so he translates the message to German, but he is a moron and by mistake translates it to English. He passes me the message and I see that it's in English! Well, I roll it back to Spanish and then write it in German (assuming there is no direct way to translate from English to German). It works! But the question is why the non-overridden .toString() returns the ISO standard for me – Vaix Jun 30 '17 at 11:43
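
The round trip discussed in the comments can be sketched as follows. This is only an illustration under a stated assumption: that the object's `toString()` decoded windows-1250 bytes as ISO-8859-1, which the question does not confirm. The class `CharsetRoundTrip` and its method names are hypothetical. Under that assumption, encoding with ISO-8859-1 gives back the original bytes unchanged, whereas encoding with UTF-8 produces two bytes for each character above U+007F, which would explain the extra strange character.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetRoundTrip {
    // Assumption for this sketch: the source bytes really are windows-1250.
    static final Charset CP1250 = Charset.forName("windows-1250");

    // Simulates the suspected bug: windows-1250 bytes decoded as ISO-8859-1.
    static String misdecode(String text) {
        byte[] raw = text.getBytes(CP1250);
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // ISO-8859-1 maps each char U+0000..U+00FF to exactly one byte with the
    // same value, so this recovers the original bytes before decoding them.
    static String repairViaLatin1(String mangled) {
        byte[] raw = mangled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, CP1250);
    }

    // The attempt from the question: UTF-8 turns each char above U+007F into
    // two bytes, so the decoded result gains an extra character.
    static String repairViaUtf8(String mangled) {
        byte[] raw = mangled.getBytes(StandardCharsets.UTF_8);
        return new String(raw, CP1250);
    }

    public static void main(String[] args) {
        String original = "\u0142";           // 'ł', byte 0xB3 in windows-1250
        String mangled = misdecode(original); // U+00B3, superscript three

        System.out.println(repairViaLatin1(mangled).equals(original));     // true
        System.out.println(repairViaUtf8(mangled).equals("\u00C2\u0142")); // true
    }
}
```

A Java `String` itself holds UTF-16 code units in memory and has no byte encoding attached; a charset only comes into play when converting to or from `byte[]`, which is why the choice of charset in `getBytes` decides whether the original bytes can be recovered.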

0 Answers