
My UTF-8 strings have been converted to ISO-8859-1 strings in the following way:

  • Characters 0 to 127 (hex 0x7F) have been left intact (0-9,a-z,A-Z, etc).
  • Characters 128 and above have been converted to two ISO-8859-1 characters: é becomes Ã©, Ͷ becomes Í¶, etc.

Is there a way to undo this conversion, so that Ã© becomes é for example?

Guillaume F.
  • Does this answer your question? [Converting UTF-8 to ISO-8859-1 in Java - how to keep it as single byte](https://stackoverflow.com/questions/655891/converting-utf-8-to-iso-8859-1-in-java-how-to-keep-it-as-single-byte) – Manu Sharma May 27 '20 at 14:36
  • I don't think it does. I wish to convert two ISO-8859-1 characters to one UTF-8 character, while the question you posted is about converting a two-byte UTF-8 character to an ISO-8859-1 character. – Guillaume F. May 27 '20 at 14:51
  • This just means you are reading characters incorrectly. Are you using readers or `new String` without specifying an explicit character set? If so, please start using an explicit character set, otherwise the platform default is used (and on Windows this might be UTF-8 in some contexts, and - for example - Cp1252 in other contexts). Please provide a [mre]. – Mark Rotteveel May 27 '20 at 14:54
  • @MarkRotteveel I am using strings already supplied to me. For example I am trying to change `String accentE = "Ã©";` to its intended value `é`. – Guillaume F. May 27 '20 at 15:00
  • "already supplied to me", then whatever is supplying you with those values is using the wrong character set when reading data. In this case you might simply solve it by converting to bytes using Cp1252 or ISO-8859-1 and then back to a string using UTF-8. But such solutions will not always work (not all UTF-8 bytes are valid characters in ISO-8859-1, so they might be mapped to `?` instead), and therefore it is better solved at the source, where the wrong encoding is used when reading data. – Mark Rotteveel May 27 '20 at 15:07
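
The caveat in the last comment can be guarded against in code. What follows is only a minimal sketch and not something posted in this thread: the class name `StrictRepair` and the method `tryRepair` are hypothetical. It assumes the garbling happened through ISO-8859-1, and it uses a `CharsetEncoder` and `CharsetDecoder` set to report errors rather than substitute replacement characters, so a string that does not round-trip cleanly is returned unchanged instead of being damaged further. It cannot recover characters that were already lost upstream.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class StrictRepair {
    // Hypothetical helper: returns the repaired text, or the input unchanged
    // when it does not look like UTF-8 that was misread as ISO-8859-1.
    static String tryRepair(String suspect) {
        try {
            // Throws if the string contains a character above U+00FF,
            // which ISO-8859-1 cannot represent.
            ByteBuffer bytes = StandardCharsets.ISO_8859_1.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(suspect));
            // Throws if those bytes are not well-formed UTF-8.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes)
                    .toString();
        } catch (CharacterCodingException e) {
            return suspect; // leave the text as it was
        }
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of the source file encoding.
        System.out.println(tryRepair("\u00C3\u00A9")); // garbled form of U+00E9
        System.out.println(tryRepair("caf\u00E9"));    // already correct, left alone
    }
}
```

The plain `new String(bytes, charset)` constructor silently replaces malformed input, which is why the sketch goes through the decoder API when a failure needs to be detected.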

1 Answer


Suppose we have a string in which each non-ASCII character shows up as two ISO-8859-1 characters, such as Ã©.

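To turn such pairs of ISO-8859-1 characters back into the intended UTF-8 characters, we can use the String constructor that takes a byte array and a Charset object. First get the bytes of the garbled string as ISO-8859-1, then decode those bytes as UTF-8. The class java.nio.charset.StandardCharsets provides constants for various Charset objects.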

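    import java.nio.charset.StandardCharsets;

    // Encode the garbled text as ISO-8859-1 bytes, then decode those bytes as UTF-8.
    String accentE = new String(
            "Ã©".getBytes(StandardCharsets.ISO_8859_1),
            StandardCharsets.UTF_8);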

which is é

See this code run live at IdeOne.com.
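
For anyone who wants to try the whole round trip locally, here is a small self-contained sketch. The class `MojibakeDemo` and its methods `misread` and `repair` are names made up for this illustration. `misread` only simulates the faulty conversion described in the question, on the assumption that the UTF-8 bytes were decoded as ISO-8859-1, and `repair` applies the same two steps as the snippet above.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Simulates the faulty conversion: UTF-8 bytes decoded as ISO-8859-1.
    static String misread(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8),
                          StandardCharsets.ISO_8859_1);
    }

    // Undoes it: back to the original bytes, then decode them as UTF-8.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1),
                          StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the encoding of the source file.
        String original = "\u00E9 \u0376 abc";
        String garbled = misread(original);

        // Each non-ASCII character became two characters; ASCII is untouched.
        System.out.println(original.length() + " -> " + garbled.length());
        System.out.println(repair(garbled).equals(original)); // prints true
    }
}
```

This works without loss because Java's ISO-8859-1 charset maps every byte value to a character. If the text went through Cp1252 instead, some byte values have no mapping there and the round trip may not be exact.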

Basil Bourque