-1

I have a hexadecimal string and I am trying to convert it back into a utf-8 encoded string.

Example:

String hexString = "6a6f65";

How do I convert the string above back into "joe"?

Mark Rotteveel
NotAidan
  • From the top of my head: 1) take chunks of two characters from string; 2) parse as hexadecimal `int`s (there's a version of `parseInt` that accepts a radix argument, pass `16`), 3) convert to char, 4) reassemble string. – Federico klez Culloca Aug 13 '21 at 18:59
  • Very quick and dirty and not properly tested - you can do that ;) `String s = new String(new BigInteger(hexString, 16).toByteArray()); ` – g00se Aug 13 '21 at 19:02

2 Answers

0

If you can be sure that the hex string comes from a byte array of a properly UTF-8 encoded string, all you need to do is:

  1. Convert the hex string back into a byte array.
  2. Convert the byte array further back into a string, with correct encoding of course.

For the first part, there's a range of ways to do it. Just see this question and pick one that suits your needs.

Once you get the byte array back from the hex string, do this:

String s = new String(bytearr, StandardCharsets.UTF_8);
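Putting both steps together, a minimal sketch might look like the following. The class name `HexToUtf8` and the method names `hexToBytes` and `decode` are made up for illustration, and the sketch assumes the input holds only hex digits with no separators or `0x` prefix.

```java
import java.nio.charset.StandardCharsets;

final class HexToUtf8 {

    // Step 1: turn each pair of hex digits into one byte.
    static byte[] hexToBytes(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("Hex string must have an even number of digits");
        }
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("Not a hex digit near index " + (2 * i));
            }
            bytes[i] = (byte) ((hi << 4) | lo);
        }
        return bytes;
    }

    // Step 2: decode the whole byte array as UTF-8 in one go.
    static String decode(String hex) {
        return new String(hexToBytes(hex), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("6a6f65")); // prints joe
    }
}
```

On Java 17 or later, `java.util.HexFormat.of().parseHex(hexString)` should be able to replace the hand-written `hexToBytes` loop.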
Mark Rotteveel
Miigon
-1

You cannot do so reliably.

Unicode characters may be assigned any code point from U+0000 to U+10FFFF.

So there is no way for us to know how many characters at a time in your input should be parsed as the hexadecimal number of a Unicode code point.

Substring > code point integer > StringBuilder#appendCodePoint > String

If you know for certain the input should be parsed two characters at a time, use String#substring to retrieve each pair of characters. Parse each pair using Integer.parseInt with a radix of 16.

int codePoint = Integer.parseInt( hexInput ,16 ) ;

Build up your results by using StringBuilder#appendCodePoint.

String hexString = "6a6f65";
StringBuilder builder = new StringBuilder();
for ( int i = 0 ; i < hexString.length() ; i += 2 ) {
    String substring = hexString.substring( i , i + 2 );
    int codePoint = Integer.parseInt( substring , 16 );
    builder.appendCodePoint( codePoint );
}
String result = builder.toString();

See this code run live at IdeOne.com.

result = joe

Caveat: If such inputs are coming from UTF-8 encoded text, this approach is not reliable. Such text may use 1, 2, 3, or 4 octets of data to represent any one character. If your input is indeed UTF-8 encoded text, then you should parse it as such.
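To make the caveat concrete, here is a small sketch comparing the two interpretations on a non-ASCII input. The class and method names are hypothetical; `c3a9` is the UTF-8 encoding of U+00E9 (e with an acute accent).

```java
import java.nio.charset.StandardCharsets;

final class PairwiseVsUtf8 {

    // The approach shown above: every pair of hex digits is treated as one code point.
    static String pairsAsCodePoints(String hex) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < hex.length(); i += 2) {
            builder.appendCodePoint(Integer.parseInt(hex.substring(i, i + 2), 16));
        }
        return builder.toString();
    }

    // Alternative: every pair is one byte, and the bytes are decoded together as UTF-8.
    static String pairsAsUtf8Bytes(String hex) {
        byte[] bytes = new byte[hex.length() / 2];
        for (int i = 0; i < bytes.length; i++) {
            bytes[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hex = "c3a9"; // UTF-8 bytes of U+00E9
        System.out.println(pairsAsCodePoints(hex).length()); // prints 2 (U+00C3 followed by U+00A9)
        System.out.println(pairsAsUtf8Bytes(hex).length());  // prints 1 (U+00E9)
    }
}
```

For pure ASCII input such as `6a6f65`, both methods give the same result, which is why the problem is easy to miss.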

Streams

Not that I recommend doing so in this case, but you could use streams.

StringBuilder builder = new StringBuilder();
String input = "6a6f65";
IntStream.iterate( 0 , ( x ) -> x < input.length() , i -> i + 2 ).forEach( i -> builder.appendCodePoint( Integer.parseInt( input.substring( i , i + 2 ) , 16 ) ) );
System.out.println( "builder = " + builder );

builder = joe

Basil Bourque
  • I got the hexadecimal from a UTF-8 encoded byte array, so I know where it comes from. – NotAidan Aug 13 '21 at 19:06
  • 1
    Actually, it can be done reliably. No UTF encoding actually stores code points like that (with variable-length code points and no length indication of any sort), since doing it that way would make the data impossible for any program to decode and pretty much useless. – Miigon Aug 13 '21 at 19:36
  • Correct. UTF-8 has the number of bytes used encoded in its high bits – g00se Aug 13 '21 at 19:43
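As a rough illustration of that last comment, the sketch below reads the sequence length from the high bits of a lead byte. The class and method names are invented for this example, and it only inspects the bit pattern: it does not reject lead bytes that are otherwise invalid in UTF-8 (such as 0xC0 or 0xF5), which a real decoder would.

```java
final class Utf8LeadByte {

    // Returns the sequence length announced by a lead byte,
    // or -1 for a continuation byte or an unrecognized pattern.
    static int sequenceLength(int b) {
        b &= 0xFF;                         // look at the low 8 bits only
        if ((b & 0x80) == 0x00) return 1;  // 0xxxxxxx: single byte (ASCII)
        if ((b & 0xE0) == 0xC0) return 2;  // 110xxxxx: two-byte sequence
        if ((b & 0xF0) == 0xE0) return 3;  // 1110xxxx: three-byte sequence
        if ((b & 0xF8) == 0xF0) return 4;  // 11110xxx: four-byte sequence
        return -1;                         // 10xxxxxx (continuation) or other
    }

    public static void main(String[] args) {
        System.out.println(sequenceLength(0x6a)); // prints 1
        System.out.println(sequenceLength(0xc3)); // prints 2
    }
}
```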