I am trying to decode a string that may contain several hex-encoded UTF-8 sequences, like this:

"IMPU=H\u’C3A9’tm\u’C3A9’rf\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB". 

I want to decode the string above into readable text.

I tried this:

String hex = "H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,DC=IMPU,DC=C-NTDB";
ByteBuffer buff = ByteBuffer.allocate(hex.length()/2); 
for (int i = 0; i < hex.length(); i+=2) {
    buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16)); 
} 
buff.rewind(); 
Charset cs = Charset.forName("UTF-8"); 
CharBuffer cb = cs.decode(buff);
System.out.println(cb.toString());

I don't know how to proceed from here. Please let me know if you have any ideas.

Ole V.V.
TeamZ
  • Provide an example: http://stackoverflow.com/help/mcve – GAlexMES Jan 17 '17 at 06:27
  • For example, I am getting the string "H\u’C3A9’ll\u’C3A9’Ow" from XML, and I need to convert it to "HélléOw". – TeamZ Jan 17 '17 at 06:38
  • Provide example code, what do you have already? – GAlexMES Jan 17 '17 at 06:42
  • i have this code String hex = "H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,DC=IMPU,DC=C-NTDB"; ByteBuffer buff = ByteBuffer.allocate(hex.length()/2); for (int i = 0; i < hex.length(); i+=2) { buff.put((byte)Integer.parseInt(hex.substring(i, i+2), 16)); } buff.rewind(); Charset cs = Charset.forName("UTF-8"); CharBuffer cb = cs.decode(buff); System.out.println(cb.toString()); – TeamZ Jan 17 '17 at 06:52
  • Edit your question and add the **formatted** code to it. Nobody will read this. – GAlexMES Jan 17 '17 at 06:54
  • Are you sure your input text uses `’` ([RIGHT SINGLE QUOTATION MARK (U+2019)](http://www.fileformat.info/info/unicode/char/2019/index.htm)), and not the regular `'` ([APOSTROPHE (U+0027)](http://www.fileformat.info/info/unicode/char/0027/index.htm))? – Andreas Jan 17 '17 at 07:25
  • Well, this is the text I am getting from the XML. – TeamZ Jan 18 '17 at 06:11

1 Answer

Here is one way to do it:

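import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String input = "IMPU=H\\u’C3A9’tm\\u’C3A9’rf\\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB";

// Match \u’…’ holding 2 to 4 bytes written as uppercase hex.
StringBuffer buf = new StringBuffer();
Matcher m = Pattern.compile("\\\\u’([0-9A-F]{4}(?:[0-9A-F]{2}){0,2})’").matcher(input);
while (m.find()) {
    // Note: javax.xml.bind was removed from the JDK in Java 11.
    byte[] utf8bytes = javax.xml.bind.DatatypeConverter.parseHexBinary(m.group(1));
    // quoteReplacement keeps any '$' or '\' in the decoded text literal.
    m.appendReplacement(buf, Matcher.quoteReplacement(new String(utf8bytes, StandardCharsets.UTF_8)));
}
String output = m.appendTail(buf).toString();

System.out.println(input);
System.out.println(output);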

* Use of DatatypeConverter taken from this SO answer.

Output

IMPU=H\u’C3A9’tm\u’C3A9’rf\u’C3B6’ldescsizma,AC=IMPU,AC=C-NTDB
IMPU=Hétmérföldescsizma,AC=IMPU,AC=C-NTDB
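
Since javax.xml.bind is no longer bundled with the JDK from Java 11 onward, the same idea can be sketched with only java.base. This is a minimal sketch, not the only way to do it: the class name HexEscapeDecoder, the method decode, and the constant names are hypothetical, and it assumes each escape holds 1 to 4 hex-encoded bytes between two U+2019 quotes (slightly looser than the pattern above, and it also accepts lowercase hex). The quote is written as the escape \u2019 so the source does not depend on the compiler's file encoding.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HexEscapeDecoder {
    // RIGHT SINGLE QUOTATION MARK, the delimiter used in the input text.
    private static final String QUOTE = "\u2019";

    // Assumption: an escape is \u, a quote, 1 to 4 hex byte pairs, a quote.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\u" + QUOTE + "((?:[0-9A-Fa-f]{2}){1,4})" + QUOTE);

    static String decode(String input) {
        StringBuffer out = new StringBuffer();
        Matcher m = ESCAPE.matcher(input);
        while (m.find()) {
            String hex = m.group(1);
            // Turn each pair of hex digits into one byte.
            byte[] bytes = new byte[hex.length() / 2];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
            }
            // Malformed UTF-8 is replaced with U+FFFD rather than rejected.
            String decoded = new String(bytes, StandardCharsets.UTF_8);
            // quoteReplacement stops '$' and '\' from being read as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("H\\u" + QUOTE + "C3A9" + QUOTE + "ll\\u" + QUOTE + "C3A9" + QUOTE + "Ow"));
    }
}
```

Text outside the escapes is copied through unchanged, so a string without any \u’…’ sequence comes back as-is. If the input may use the plain apostrophe (U+0027) instead, the QUOTE constant would need to change accordingly.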
Andreas