Converting windows-1252 Java

Question

I have a Java string with this value:

=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A =C3=81 =C3=A2 =C3=A9 UHA a=C3==A7=C3=A3

I think it is encoded as windows-1252. I want to convert it to a readable string. I tried decoding it as UTF-8, but that doesn't work properly. Can someone help me?

Possible duplicate of [Java convert Windows-1252 to UTF-8, some letters are wrong](http://stackoverflow.com/questions/23082522/java-convert-windows-1252-to-utf-8-some-letters-are-wrong) — dezhik, Feb 05 '16 at 11:37
Where are you getting that string from? It doesn't look like a UTF problem at all. — Aaron Gillion, Feb 05 '16 at 11:43
I tried the approach from http://stackoverflow.com/questions/23082522/java-convert-windows-1252-to-utf-8-some-letters-are-wrong but it doesn't work for me. — brunoroberto, Feb 05 '16 at 11:49
I'm getting it from a C++ server. I make a request, and the server returns that string. — brunoroberto, Feb 05 '16 at 11:50
@brunoroberto What does the documentation of that C++ server say? It should document what it returns so others can work with it, else it's just an expensive paperweight. — Ferrybig, Feb 05 '16 at 12:48
@Ferrybig I'm trying to get an email message from the C++ server. When a word in the message has no accent, it works properly, but when a word has an accent, like "Á", I receive the string above. The server has no documentation; it is a legacy system. — brunoroberto, Feb 05 '16 at 13:05
Do you have an example of what the above line decodes to? We can do much more if we know the decoded result. At the moment the question is too broad, since too many results are possible. Adding more examples and their results may help too. — Ferrybig, Feb 05 '16 at 13:28
@Ferrybig This string decoded is: "á à ç ã õ é Ú Á â é UHA ação". It is just a test. But, I don't know how to handle that code. — brunoroberto, Feb 05 '16 at 13:42
There should be a MIME header on the email and I suspect it would mention quoted-printable. — David Conrad, Feb 05 '16 at 15:42

SubOptimal · Accepted Answer · 2016-02-05T15:34:52.237

The string contains characters which are encoded as Quoted-Printable.

The part =C3=A1 is the character á encoded as UTF-8 (the two bytes 0xC3 0xA1), with each byte written as = followed by two hex digits.
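To make the byte-level view concrete, here is a minimal sketch that decodes the two bytes behind =C3=A1 once as UTF-8 and once as windows-1252. The class and method names are purely illustrative, and the sketch assumes the JRE provides the windows-1252 charset, which standard JDK builds normally do.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {
    // The two bytes that "=C3=A1" stands for in quoted-printable.
    static final byte[] BYTES = {(byte) 0xC3, (byte) 0xA1};

    // Interprets the bytes as one UTF-8 encoded character.
    static String asUtf8() {
        return new String(BYTES, StandardCharsets.UTF_8);
    }

    // Interprets each byte as a separate windows-1252 character.
    static String asWindows1252() {
        return new String(BYTES, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        System.out.println("UTF-8:        " + asUtf8());
        System.out.println("windows-1252: " + asWindows1252());
    }
}
```

Decoded as UTF-8, the two bytes form the single character U+00E1 (á). Decoded as windows-1252, they become the two characters U+00C3 and U+00A1, which is the typical mojibake seen when the wrong charset is used.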

A small snippet to demonstrate the decoding:

// requires: import java.nio.charset.StandardCharsets;
String hexChars = "0123456789ABCDEF";
String s = "=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A"
        + " =C3=81 =C3=A2 =C3=A9 UHA a=C3=A7=C3=A3";
int stringIndex = 0;
int bytesIndex = 0;
// the decoded data is never longer than the encoded string
byte[] bytes = new byte[s.length()];
while (stringIndex < s.length()) {
    // "=XY" with two hex digits X and Y encodes a single byte
    if (s.charAt(stringIndex) == '='
            && stringIndex + 2 < s.length()
            && hexChars.indexOf(s.charAt(stringIndex + 1)) >= 0
            && hexChars.indexOf(s.charAt(stringIndex + 2)) >= 0) {
        int hex = hexChars.indexOf(s.charAt(stringIndex + 1));
        hex <<= 4;
        hex += hexChars.indexOf(s.charAt(stringIndex + 2));
        bytes[bytesIndex] = (byte) hex;
        // skip the two hex digits
        stringIndex += 2;
    } else {
        // any other character is copied as a single byte
        bytes[bytesIndex] = (byte) (s.charAt(stringIndex) & 0xFF);
    }
    stringIndex++;
    bytesIndex++;
}
// decode only the bytes actually written, not the unused tail of the array
System.out.println("bytes = " + new String(bytes, 0, bytesIndex,
        StandardCharsets.UTF_8));

Output:

bytes = á à ç ã õ é Ú Á â é UHA açã

The snippet is for demonstration purposes only. Look for a library that does the quoted-printable decoding for you.
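Libraries such as Apache Commons Codec (QuotedPrintableCodec) or JavaMail (MimeUtility.decode) are common choices for this. If adding a dependency is not an option, the following is a slightly more defensive sketch using only the JDK. The class name QuotedPrintableSketch and its decode method are illustrative names, not part of any library, and the code does not aim to cover every rule of the quoted-printable specification. It additionally handles soft line breaks (a = at the end of a line) and copies malformed = sequences through unchanged, which is an assumption about the desired behavior.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    /** Decodes quoted-printable text; malformed "=" sequences are copied through unchanged. */
    static String decode(String input, Charset charset) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '=') {
                // Soft line break: "=" directly followed by CRLF or LF is dropped.
                if (input.startsWith("\r\n", i + 1)) { i += 3; continue; }
                if (input.startsWith("\n", i + 1)) { i += 2; continue; }
                // Escaped byte: "=" followed by two hex digits.
                if (i + 2 < input.length()) {
                    int hi = Character.digit(input.charAt(i + 1), 16);
                    int lo = Character.digit(input.charAt(i + 2), 16);
                    if (hi >= 0 && lo >= 0) {
                        out.write((hi << 4) | lo);
                        i += 3;
                        continue;
                    }
                }
            }
            // Literal character (quoted-printable bodies are ASCII), copied as one byte.
            out.write(c & 0xFF);
            i++;
        }
        // Interpret the collected bytes in the charset named by the MIME header.
        return new String(out.toByteArray(), charset);
    }

    public static void main(String[] args) {
        String s = "=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A"
                + " =C3=81 =C3=A2 =C3=A9 UHA a=C3=A7=C3=A3";
        System.out.println(decode(s, StandardCharsets.UTF_8));
    }
}
```

The charset is passed in rather than hard-coded because quoted-printable only describes how bytes are escaped. Which charset those bytes are in (UTF-8 here) normally comes from the Content-Type header of the message.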

Your byte array has trailing zero bytes, since the length of the byte data is shorter than the string's length. You should use `new String(bytes, 0, bytesIndex, StandardCharsets.UTF_8)`. — VGR, Feb 05 '16 at 14:33
It worked, thanks! The only thing is that the letter 'ç' doesn't work, but thanks! — brunoroberto, Feb 05 '16 at 15:27
@VGR You're absolutely right. Even though the code is meant for demonstration purposes only, it should not have such an error. Thanks for the comment. The code has been changed accordingly. — SubOptimal, Feb 05 '16 at 15:38
