-1

I have a Java string with this value:

=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A =C3=81 =C3=A2 =C3=A9 UHA a=C3==A7=C3=A3

I think it is encoded as windows-1252. I want to convert it to a readable string. I tried decoding it as UTF-8, but that doesn't work properly. Can someone help me?

  • Possible duplicate of [Java convert Windows-1252 to UTF-8, some letters are wrong](http://stackoverflow.com/questions/23082522/java-convert-windows-1252-to-utf-8-some-letters-are-wrong) – dezhik Feb 05 '16 at 11:37
  • Where are you getting that string from? It doesn't look like a UTF problem at all. – Aaron Gillion Feb 05 '16 at 11:43
  • I tried the answer from http://stackoverflow.com/questions/23082522/java-convert-windows-1252-to-utf-8-some-letters-are-wrong but it doesn't work for me. – brunoroberto Feb 05 '16 at 11:49
  • I'm getting it from a C++ server. I make a request, and the server returns that string. – brunoroberto Feb 05 '16 at 11:50
  • @brunoroberto What does the documentation of that C++ server say? It should document what it returns so others can work with it; otherwise it's just an expensive paperweight. – Ferrybig Feb 05 '16 at 12:48
  • @Ferrybig I'm trying to get an email message from the C++ server. When the words of the message have no accents, it works properly, but when a word has an accented letter like "Á", I receive the string above. The server has no documentation; it is a legacy system. – brunoroberto Feb 05 '16 at 13:05
  • Do you have an example of what the above line decodes to? We can do much more if we know the decoded result; at the moment the question is too broad, since too many possible results exist. Adding more examples and their results may also help. – Ferrybig Feb 05 '16 at 13:28
  • @Ferrybig Decoded, this string is: "á à ç ã õ é Ú Á â é UHA ação". It is just a test. But I don't know how to handle that encoding. – brunoroberto Feb 05 '16 at 13:42
  • There should be a MIME header on the email and I suspect it would mention quoted-printable. – David Conrad Feb 05 '16 at 15:42

1 Answer

2

The string contains characters that are encoded as quoted-printable.

The part =C3=A1 is the character á encoded as UTF-8 (the two bytes 0xC3 0xA1), with each byte written as = followed by two hex digits.
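To illustrate the point above, here is a minimal sketch showing that the two bytes 0xC3 0xA1 decode to a single á as UTF-8, while a single-byte charset turns them into two unrelated characters. The class name `Utf8BytesDemo` and the helper `decode` are hypothetical and not part of the original answer. ISO-8859-1 is used for the comparison; windows-1252 maps these two particular bytes to the same characters.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Utf8BytesDemo {

    // Hypothetical helper: builds a byte array from int values and decodes it
    // with the given charset.
    static String decode(Charset charset, int... values) {
        byte[] bytes = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            bytes[i] = (byte) values[i];
        }
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        // Decoded as UTF-8: one character, U+00E1 (á).
        System.out.println(decode(StandardCharsets.UTF_8, 0xC3, 0xA1));
        // Decoded as a single-byte charset: two characters, U+00C3 U+00A1 (Ã¡).
        System.out.println(decode(StandardCharsets.ISO_8859_1, 0xC3, 0xA1));
    }
}
```

This is why treating the data as windows-1252 gives unreadable text: the bytes are UTF-8, and they additionally need to be recovered from the =XX notation first.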

A small snippet to show the quoted-printable decoding:

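import java.nio.charset.StandardCharsets;

String hexChars = "0123456789ABCDEF";
String s = "=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A"
        + " =C3=81 =C3=A2 =C3=A9 UHA a=C3=A7=C3=A3";
int stringIndex = 0;
int bytesIndex = 0;
// The decoded data is never longer than the encoded string.
byte[] bytes = new byte[s.length()];
while (stringIndex < s.length()) {
    // "=XX" with two uppercase hex digits encodes a single byte.
    // The bounds check avoids reading past the end on a trailing '='.
    if (s.charAt(stringIndex) == '='
            && stringIndex + 2 < s.length()
            && hexChars.indexOf(s.charAt(stringIndex + 1)) >= 0
            && hexChars.indexOf(s.charAt(stringIndex + 2)) >= 0) {
        int hex = hexChars.indexOf(s.charAt(stringIndex + 1));
        hex <<= 4;
        hex += hexChars.indexOf(s.charAt(stringIndex + 2));
        bytes[bytesIndex] = (byte) hex;
        stringIndex += 2;
    } else {
        // Any other character is copied through as one byte.
        bytes[bytesIndex] = (byte) (s.charAt(stringIndex) & 0xFF);
    }
    stringIndex++;
    bytesIndex++;
}
// Decode only the bytes actually written, interpreting them as UTF-8.
System.out.println("bytes = " + new String(bytes, 0, bytesIndex,
        StandardCharsets.UTF_8));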

output

bytes = á à ç ã õ é Ú Á â é UHA açã

The snippet is for demonstration purposes only. Look for a library that does the quoted-printable decoding for you.
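If pulling in a library is not an option, a slightly more defensive decoder can be sketched with the standard library alone. The version below is an illustration under stated assumptions, not a full RFC 2045 implementation: the class name `QuotedPrintableSketch` and method `decode` are hypothetical, the payload is assumed to be UTF-8, soft line breaks (`=` directly followed by a line break) are dropped, and lowercase hex digits are accepted.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class QuotedPrintableSketch {

    // Decodes quoted-printable text and interprets the resulting bytes as UTF-8
    // (the UTF-8 payload is an assumption based on the question's data).
    static String decode(String input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '=') {
                // Soft line break: '=' followed by CRLF or LF produces no output.
                if (input.startsWith("\r\n", i + 1)) {
                    i += 3;
                    continue;
                }
                if (input.startsWith("\n", i + 1)) {
                    i += 2;
                    continue;
                }
                // Escaped byte: '=' followed by two hex digits.
                if (i + 2 < input.length()) {
                    int hi = Character.digit(input.charAt(i + 1), 16);
                    int lo = Character.digit(input.charAt(i + 2), 16);
                    if (hi >= 0 && lo >= 0) {
                        out.write((hi << 4) | lo);
                        i += 3;
                        continue;
                    }
                }
            }
            // Anything else, including a malformed '=', is copied as one byte.
            out.write(c & 0xFF);
            i++;
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String s = "=C3=A1 =C3=A0 =C3=A7 =C3=A3 =C3=B5 =C3=A9 =C3=9A"
                + " =C3=81 =C3=A2 =C3=A9 UHA a=C3=A7=C3=A3";
        System.out.println(decode(s));
    }
}
```

Note that the string in the question contains `=C3==A7`, which is not the valid encoding of ç (`=C3=A7`). Neither this sketch nor the snippet above can recover the ç from that input; the stray `=` may be the remnant of a soft line break whose line ending was lost, but that is a guess.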

SubOptimal
  • Your byte array has trailing zero bytes, since the length of the byte data is shorter than the string's length. You should use `new String(bytes, 0, bytesIndex, StandardCharsets.UTF_8)`. – VGR Feb 05 '16 at 14:33
  • It worked, thanks! The only thing is that the letter 'ç' doesn't work, but thanks! – brunoroberto Feb 05 '16 at 15:27
  • 1
    @VGR You're absolutely right. Even though the code is meant for demonstration purposes only, it should not have such an error. Thanks for the comment. The code has been changed accordingly. – SubOptimal Feb 05 '16 at 15:38