Java String HEX to String ASCII with accentuation

Question

I have the string String hex = "6174656ec3a7c3a36f"; and I want to get String output = "atenção", but in my test I only get String output = "aten????o". What am I doing wrong?

String hex = "6174656ec3a7c3a36f";
StringBuilder output = new StringBuilder();
for (int i = 0; i < hex.length(); i+=2) {
  String str = hex.substring(i, i+2);
  output.append((char)Integer.parseInt(str, 16));
} 

System.out.println(output); //here is the output "aten????o"

Possible dup http://stackoverflow.com/questions/655891/converting-utf-8-to-iso-8859-1-in-java-how-to-keep-it-as-single-byte — Shmil The Cat, Apr 01 '13 at 18:48

jedwards · Accepted Answer · 2013-04-01T18:57:49.050

Consider

String hex = "6174656ec3a7c3a36f";                                  // AAA
ByteBuffer buff = ByteBuffer.allocate(hex.length()/2);
for (int i = 0; i < hex.length(); i+=2) {
    buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16));
}
buff.rewind();
Charset cs = Charset.forName("UTF-8");                              // BBB
CharBuffer cb = cs.decode(buff);                                    // BBB
System.out.println(cb.toString());                                  // CCC

Which prints: atenção

Basically, your hex string is the hexadecimal encoding of the bytes that represent the characters of the string atenção when it is encoded in UTF-8.

To decode:

You first have to go from your hex string to bytes (AAA).
Then go from bytes to chars (BBB). This step depends on the encoding, in your case UTF-8.
Then go from chars to a string (CCC).
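The three steps above can also be sketched with a plain byte array and the JDK's predefined UTF-8 constant. This is a minimal sketch, not the answer's own code: the class name HexUtf8 and the method names hexToBytes and decode are hypothetical, and it assumes the input contains only hex digits.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, named here for illustration only.
class HexUtf8 {

    // Step AAA: turn each pair of hex digits into one byte.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string must have an even number of digits");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < hex.length(); i += 2) {
            bytes[i / 2] = (byte) Integer.parseInt(hex.substring(i, i + 2), 16);
        }
        return bytes;
    }

    // Steps BBB and CCC: decode the bytes as UTF-8 and build the String.
    static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("6174656ec3a7c3a36f"));
    }
}
```

Using StandardCharsets.UTF_8 avoids the name lookup done by Charset.forName("UTF-8"); the decoding result is the same.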

This could be improved with `var cs = StandardCharsets.UTF_8;` right? — Cedric, Sep 02 '21 at 13:32

score 5 · Answer 2 · answered Apr 01 '13 at 18:53

Your hex string appears to denote a UTF-8 string, rather than ISO-8859-1.

The reason I can say this is that if it were ISO-8859-1, you'd have exactly two hex digits per character. Your hex string has 18 hex digits, which is 9 bytes, but your expected output is only 7 characters. Hence the bytes must be in a variable-width encoding, not one byte per character like ISO-8859-1.

The following program produces the output: atenção

    String hex = "6174656ec3a7c3a36f";
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    for (int i = 0; i < hex.length(); i += 2) {
      String str = hex.substring(i, i + 2);
      int byteVal = Integer.parseInt(str, 16);
      baos.write(byteVal);
    }
    String s = new String(baos.toByteArray(), Charset.forName("UTF-8"));
    System.out.println(s);

If you change UTF-8 to ISO-8859-1, you'll see: atenÃ§Ã£o.
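The width argument can be checked directly by decoding the same nine bytes with both charsets and comparing lengths. This is an illustrative sketch; the class name CharsetComparison and its method names are made up for the example.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical class, named here for illustration only.
class CharsetComparison {

    // The nine bytes denoted by the hex string 6174656ec3a7c3a36f.
    static final byte[] BYTES = {
        0x61, 0x74, 0x65, 0x6e, (byte) 0xc3, (byte) 0xa7, (byte) 0xc3, (byte) 0xa3, 0x6f
    };

    // Variable-width decoding: c3 a7 and c3 a3 each become one character.
    static String asUtf8() {
        return new String(BYTES, StandardCharsets.UTF_8);
    }

    // Single-byte decoding: every byte becomes its own character.
    static String asLatin1() {
        return new String(BYTES, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        System.out.println(BYTES.length);         // 9 bytes
        System.out.println(asUtf8().length());    // 7 characters
        System.out.println(asLatin1().length());  // 9 characters
    }
}
```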

Aubin · Answer 3 · 2013-04-01T19:08:13.760

Java Strings are Unicode: each char is a 16-bit UTF-16 code unit. Your hex string is, I suppose, a sequence of encoded bytes, like a "C" string. You have to know the name of the character encoding and use a CharsetDecoder.

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class Char8859_1Decoder {

   public static void main( String[] args ) throws CharacterCodingException {
      String hex = "6174656ec3a7c3a36f";
      int len = hex.length();
      byte[] cStr = new byte[len/2];
      for( int i = 0; i < len; i+=2 ) {
         cStr[i/2] = (byte)Integer.parseInt( hex.substring( i, i+2 ), 16 );
      }
      CharsetDecoder decoder = Charset.forName( "UTF-8" ).newDecoder();
      CharBuffer cb = decoder.decode( ByteBuffer.wrap( cStr ));
      System.out.println( cb.toString());
   }
}
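One practical difference of going through a CharsetDecoder is that a freshly created decoder reports malformed input by throwing, whereas the String constructor substitutes the replacement character U+FFFD. The sketch below illustrates this with a deliberately invalid byte; the class name StrictVersusLenient and the method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

// Hypothetical class, named here for illustration only.
class StrictVersusLenient {

    // Lenient: malformed input is replaced with U+FFFD.
    static String lenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: a new CharsetDecoder reports malformed input by throwing.
    static String strict(byte[] bytes) throws CharacterCodingException {
        return StandardCharsets.UTF_8.newDecoder()
                .decode(ByteBuffer.wrap(bytes))
                .toString();
    }

    public static void main(String[] args) {
        byte[] bad = {0x61, (byte) 0xff}; // 0xff never occurs in valid UTF-8
        System.out.println(lenient(bad).length());
        try {
            strict(bad);
        } catch (CharacterCodingException e) {
            System.out.println("rejected: " + e);
        }
    }
}
```

Which behaviour is preferable depends on whether silently replaced characters are acceptable for the data at hand.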

PaulProgrammer · Answer 4 · 2013-10-08T04:52:02.843

In this encoding, ç and ã each take two bytes, so they are not represented by a single byte as your decode routine assumes.

Instead of converting each byte to a char, I would collect the bytes into a Java byte array and then use a String routine to decode that array into a string, leaving Java the dull task of determining the decoding routine.

Of course, Java may guess wrong, since without an explicit charset it falls back to the platform default. You might therefore have to know ahead of time what the encoding is, as per the answers given by @Aubin or @Martin Ellis.
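The difference between letting Java pick the charset and naming it can be sketched as follows. The class name DefaultCharsetCheck and its method names are hypothetical. Whether the two results agree depends on the JVM's default charset, so the sketch prints the comparison instead of assuming an outcome.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical class, named here for illustration only.
class DefaultCharsetCheck {

    // Uses whatever Charset.defaultCharset() returns on this JVM.
    static String decodeWithDefault(byte[] bytes) {
        return new String(bytes);
    }

    // Always decodes as UTF-8, regardless of platform settings.
    static String decodeExplicit(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = {
            0x61, 0x74, 0x65, 0x6e, (byte) 0xc3, (byte) 0xa7, (byte) 0xc3, (byte) 0xa3, 0x6f
        };
        System.out.println("default charset: " + Charset.defaultCharset());
        System.out.println("same result: "
                + decodeWithDefault(bytes).equals(decodeExplicit(bytes)));
    }
}
```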
