
How can I decode a string that contains escape sequences such as 'Total\x20Value'? The actual value is 'Total Value'.

In JavaScript, the browser decodes it automatically. For example, if I write this in the browser console:

var a = 'Total\x20Value';

and then print a, it prints 'Total Value', which means the browser decoded the string automatically.

Now my question is: how can I do this in Java? I want to decode this string in Java code, but I have not found a way to do it. Also, I cannot use a simple string replacement here: this particular string only contains an escaped space, but at run time I will receive other characters, so I need a generic solution that can decode any such string without a hard-coded replace.

Another example string is:

"DIMENSION\x5f13420895086619127059036175667828\x7e\x24\x7e1\x7e\x24\x7e1"

Its real value is:

"DIMENSION_13420895086619127059036175667828~$~1~$~1".

Please suggest a way to achieve this in Java, ideally with a predefined class. I have tried many solutions, but none of them worked for me.

Ahmet Emre Kilinc
Krishna Verma
  • This looks like URL encoding, try [this](https://stackoverflow.com/a/6138183/8119498) – sf_ Jun 21 '17 at 06:31
  • @Leon, How do you decode `\x24%24` to `$%24` using URLDecoder? –  Jun 21 '17 at 06:45
  • @saka1029 oops, my bad – Leon Jun 21 '17 at 06:47
  • Can I ask what is generating strings in this shape? Can I also ask whether you would need to decode strings of the form `\uNNNN` for four hex digits `NNNN` as well as just `\xNN`? – Luke Woodward Jun 21 '17 at 09:22

3 Answers


I suspect that a better way to address the problem you have is to fix the way these strings are created, so they don't have substrings such as \x20 or \x7e to start off with.

However, these strings could well be coming from a third-party API which you might not have any control over. If that's the case, the following method should help. It takes the string value you want to decode, containing such substrings, and replaces them with the appropriate characters:

import java.util.regex.*;

// ...

private static String decode(String input) {
    Pattern p = Pattern.compile("\\\\x[0-9A-Fa-f]{2}");
    Matcher m = p.matcher(input);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
        String matchedText = m.group(0);
        int characterCode = Integer.parseInt(matchedText.substring(2), 16);
        m.appendReplacement(sb,
            Matcher.quoteReplacement(Character.toString((char)characterCode)));
    }

    m.appendTail(sb);
    return sb.toString();
}

There are a few things to note about it:

  • The overall structure of this code is based on example code in the Matcher documentation.

  • A regexp to match a substring of the form \x24 or \x7e is \\x[0-9A-Fa-f]{2}. Note that we have to double the backslash \ because \ has special meaning in regular expressions and we want to match an actual \ character. However, \ also has a special meaning in Java string literals so we need to double it again.

  • We need to use Matcher.quoteReplacement to ensure that the string we are replacing with is interpreted as that string and nothing else. In the replacement string, $1 for example will be interpreted as the first matched group, and $ on its own will cause an exception to be thrown. (Fortunately, your second example string contained $ characters - without those I may well have missed this.)

  • You may want to consider moving the Pattern to a static final constant somewhere, to avoid the regular expression being compiled every time the method is called.
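
The last point above could look something like the following sketch. The class name `HexEscapes` and the constant name `HEX_ESCAPE` are made up for illustration; the decoding logic is the same as in the method above, except that a capturing group picks out the two hex digits instead of `substring`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical holder class, named for illustration only.
final class HexEscapes {
    // Compiled once when the class is loaded, rather than on every call.
    private static final Pattern HEX_ESCAPE = Pattern.compile("\\\\x([0-9A-Fa-f]{2})");

    static String decode(String input) {
        Matcher m = HEX_ESCAPE.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Group 1 holds the two hex digits that follow the backslash and 'x'.
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps '$' and '\' from being treated specially.
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Total\\x20Value"));
        System.out.println(decode(
            "DIMENSION\\x5f13420895086619127059036175667828\\x7e\\x24\\x7e1\\x7e\\x24\\x7e1"));
    }
}
```

Run against the two strings from the question, this should print 'Total Value' and 'DIMENSION_13420895086619127059036175667828~$~1~$~1'. `StringBuffer` is kept because the `appendReplacement` overload that accepts a `StringBuilder` only exists from Java 9 onwards.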

Luke Woodward

Those \xNN substrings are just the hexadecimal ASCII codes of the encoded characters. You can look them up in any ASCII table.

You can create your own map from each hexadecimal escape to its character and use it to rewrite your strings. Example:

import java.util.HashMap;
import java.util.Map;

public class NewClass {
    public static void main(String[] args){
        String str1 = "Total\\x20Value";
        String str2 = "DIMENSION\\x5f13420895086619127059036175667828\\x7e\\x24\\x7e1\\x7e\\x24\\x7e1"; 
        System.out.println(decode(str1));
        System.out.println(decode(str2));
    }
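    public static String decode(String str){
        // Map every escape from \x00 to \x7f to the ASCII character it encodes.
        // Integer.toHexString produces lowercase digits, so only lowercase
        // escapes such as \x7e are matched (not \x7E).
        Map<String,String> map = new HashMap<>();
        //you can extend this to i<256 if you expect your strings to contain special characters like (Ã,Ç,Æ,§,¾ ...)
        for(int i = 0; i< 128; i++){
            map.put((i<16?"\\x0":"\\x")+Integer.toHexString(i), Character.toString((char)i));
        }

        // Replace every occurrence of each escape found in the input.
        for(Map.Entry<String,String> entry: map.entrySet()){
            str = str.replace(entry.getKey(), entry.getValue());
        }
        return str;
    }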
}
Eritrean
  • Actually, [\xNN](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String) is the encoded value of an ISO 8859-1 character. And all string values in JavaScript are counted sequences of UTF-16 code units (of the [Unicode](http://www.unicode.org/charts/nameslist/index.html) character set), so clearer if you write \uNNNN (or \u{NN} and the like), instead. Oh, the \u{NN} is for JavaScript, but is for both Java and JavaScript \uNNNN (after all, most modern languages have UTF-16 strings). – Tom Blodget Jun 21 '17 at 16:51
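
As a rough sketch of the point raised in the comments, a regex-based decoder could accept both \xNN and \uNNNN escapes. The class name `JsEscapes` and the constant `ESCAPE` are hypothetical, and the sketch assumes the input only uses these two escape forms (not \u{...} or escapes such as \n). Because a Java `char` is a UTF-16 code unit, casting the parsed value gives the ISO 8859-1 character for \xNN and the matching UTF-16 code unit for \uNNNN.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, used for illustration only.
final class JsEscapes {
    // Group 1: two hex digits after backslash-x.
    // Group 2: four hex digits after backslash-u.
    private static final Pattern ESCAPE =
            Pattern.compile("\\\\x([0-9A-Fa-f]{2})|\\\\u([0-9A-Fa-f]{4})");

    static String decode(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Exactly one of the two groups matched.
            String hex = m.group(1) != null ? m.group(1) : m.group(2);
            // The parsed value (0 to 0xFFFF) is used directly as a UTF-16 code unit.
            char unit = (char) Integer.parseInt(hex, 16);
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(unit)));
        }
        m.appendTail(sb);
        return sb.toString();
    }
}
```

A surrogate pair written as two consecutive four-digit escapes is decoded one code unit at a time, so the two halves end up adjacent in the result and form the intended character.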

This solution scans the string for \xNN escapes, parses the two hexadecimal digits to get the character code, and replaces each escape with the equivalent ASCII character.

public class HexEscapeDecoder {

    public static void main(String[] args) {
        String input = "Total\\x20Value\\x7e";
        System.out.println(decode(input)); // prints: Total Value~
    }

    public static String decode(String input) {
        StringBuilder result = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            // An escape is a backslash, an 'x' and two hex digits: four characters in total.
            if (i + 4 <= input.length()
                    && input.charAt(i) == '\\'
                    && input.charAt(i + 1) == 'x'
                    && Character.digit(input.charAt(i + 2), 16) >= 0
                    && Character.digit(input.charAt(i + 3), 16) >= 0) {
                // Rewrite "\xNN" as "0xNN" so that Integer.decode can parse it.
                result.append(convert("0x" + input.substring(i + 2, i + 4)));
                i += 4;
            } else {
                // Not an escape: copy the character unchanged.
                result.append(input.charAt(i));
                i++;
            }
        }
        return result.toString();
    }

    public static String convert(String hexDigits) {
        // Integer.decode accepts the "0x" prefix; the value (0 to 255) fits in one char.
        return String.valueOf((char) Integer.decode(hexDigits).intValue());
    }
}