
I have a String:

PRESIÓN MÁXIMA:

This string is in ISO-8859, and I want to write it to a UTF-8 XML file.

I get its UTF-8 byte values, which seem correct:

utf8 bytes

-61, -109 = C3 93 = Ó

-61, -127 = C3 81 = Á

But when I turn this array of bytes back into a String, the Ó is OK but the Á is not:

utf8 bytes as String

For some unknown reason, the C3 81 becomes C3 3F.

There is something I don't understand about encoding; at the very least, I would expect both characters to be wrong.

How can I fix / convert my String?

jpprade
  • You're making a new String with `UTF-8` bytes, but with platform default encoding. In addition, what do you think you're doing? I hope you don't think you are [Converting String from One Charset to Another](https://stackoverflow.com/q/29667977/2541560). If you have a String, don't touch it. Just write it with the correct encoding and that's it. – Kayaman Jan 30 '22 at 14:16
  • Yeah, I thought I was converting the charset... From reading your link, it doesn't seem easy :( – jpprade Jan 31 '22 at 09:59
  • No, it's unnecessary. The reason it seems it's not easy is because it doesn't make sense. That's why whenever someone is trying to "convert a charset", they think it's hard because they don't understand they're trying to do a very stupid thing which is unnecessary. But granted, if you don't understand how encoding works, then it might not be easy to realize that you don't have a "UTF-8 string" or an "ISO-8859 string". – Kayaman Jan 31 '22 at 10:21
  • @jpprade "*I have a String which is ... in ISO-8859*" - Java strings are exclusively UTF-16 (they *may* be stored internally as ISO-8859-1, but the public interface is UTF-16). You are converting UTF-16 chars into UTF-8 bytes, then converting those UTF-8 bytes back into UTF-16 chars using a platform charset that is clearly not UTF-8, which is why the chars don't match the originals. And then you are converting the now-corrupted UTF-16 string back into platform bytes again. You should NEVER rely on the platform charset; ALWAYS call `getBytes()` or `new String()` with the charset you want stated explicitly. – Remy Lebeau Feb 02 '22 at 19:15
  • What is the default encoding (the result of `System.getProperty("file.encoding")`) of your environment? I have tried `getBytes()` with every `Charset` supported in Java (https://docs.oracle.com/javase/8/docs/technotes/guides/intl/encoding.doc.html), but none of them reproduces your result. – SATO Yusuke Aug 14 '22 at 13:15
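
The advice in the comments above (name the charset on every conversion, and let the writer do the encoding) could be sketched roughly as follows. This is only an illustration: the class `EncodingSketch`, the helpers `decodeAs`, `hex` and `writeXml`, and the `<label>` element are all made up for this example, and `windows-1252` is merely a guess at a non-UTF-8 platform default, since the thread never states what the default actually was.

```java
import java.io.IOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class EncodingSketch {

    // Hypothetical helper: decodes bytes with an explicit charset, making visible
    // the choice that new String(bytes) makes silently (platform default).
    static String decodeAs(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    // Hypothetical helper: formats bytes as space-separated hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    // Writes the text into a tiny XML document. The Writer encodes to UTF-8;
    // the String itself is never "converted". The text is not XML-escaped here,
    // which is fine for this label but not for arbitrary input.
    static void writeXml(Path target, String text) throws IOException {
        try (Writer out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            out.write("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
            out.write("<label>" + text + "</label>\n");
        }
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep this source file independent of its own encoding.
        String label = "PRESI\u00D3N M\u00C1XIMA:";
        byte[] utf8 = label.getBytes(StandardCharsets.UTF_8);
        System.out.println("UTF-8 bytes: " + hex(utf8));

        // Same charset in both directions: the text survives.
        System.out.println("UTF-8 round trip intact: "
                + decodeAs(utf8, StandardCharsets.UTF_8).equals(label));

        // ASSUMPTION: a windows-1252 platform default, used only for illustration.
        Charset assumedDefault = Charset.forName("windows-1252");
        String garbled = decodeAs(utf8, assumedDefault);
        System.out.println("Mismatched decode intact: " + garbled.equals(label));
        System.out.println("Re-encoded bytes: " + hex(garbled.getBytes(assumedDefault)));

        // What the question actually needs: write the original String as UTF-8.
        Path file = Files.createTempFile("label", ".xml");
        writeXml(file, label);
        System.out.println("Wrote " + file);
    }
}
```

If the default really was windows-1252, that would be consistent with what the question describes: 0x93 is a defined character in that charset and survives the round trip, while 0x81 is, as far as I know, unassigned, so it is replaced on decoding and comes back as `?` (3F). Either way, the fix is the last step: hand the untouched String to a writer that was opened with UTF-8.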

0 Answers