Decode % into space using URLDecoder in java?

Question

I have a use case in which I have to decode the query parameters of a URI and then process them (the processing is out of scope for this question).

Suppose I have a URI that I have to decode. I know that a space should be encoded as %20 when the URI is created, and that decoding turns every %20 back into a space. However, I might also receive a URI in which a bare % stands for a space. To stay backward compatible, I want to convert such a % to a space as well. The note quoted below explains where this requirement comes from.

I tried using replaceAll() to turn % into %20, but then an existing %20 becomes %2020, and there are many other cases that break.

This is needed for reading UPI URIs. As per the official documents from NPCI:

Note: Considering that the current PSP apps are developed to read “%” as space (“ ”), the Bank PSP should support both “%” and “%20”, until such time the ecosystem is aligned to the revision. Hence, backward compatibility should be ensured.

EDIT 1, based on Pshemo's comment:

I have tried

str.replaceAll("%(?![0-9a-fA-F])","%20")

A case that the above regex does not handle is "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR"

The output is pn -> Prakash"some other character"mar
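To make the failure concrete, here is a minimal sketch using the regex from the edit above. The class name LookaheadDemo and the method name patch are made up for illustration. It shows that a % followed by a non-hex character is repaired, while %Ab is left alone because "A" is a valid hex digit, so the decoder later reads it as the byte 0xAB.

```java
import java.net.URLDecoder;

class LookaheadDemo {

    // Replaces every % that is NOT followed by a hex digit with %20.
    static String patch(String s) {
        return s.replaceAll("%(?![0-9a-fA-F])", "%20");
    }

    public static void main(String[] args) throws Exception {
        System.out.println(patch("Prakash%Zbmar")); // Prakash%20Zbmar ('Z' is not hex)
        System.out.println(patch("Prakash%Abmar")); // Prakash%Abmar (unchanged: 'A' is hex)

        // %Ab is decoded as the single byte 0xAB, which is not valid UTF-8 on
        // its own, so a replacement character ends up between "Prakash" and "mar".
        System.out.println(URLDecoder.decode(patch("Prakash%Abmar"), "UTF-8"));
    }
}
```

No lookahead can fix this on its own, because the text "%Ab" does not say whether it was meant as an escape or as a space followed by "Ab".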

"I tried replaceall() % with %20 but then again the %20 will become %2020" then maybe try to replace only `%` which doesn't have `XX` (where `X` is hexadecimal value)` after it like `replaceAll("%(?![0-9a-fA-F]{2})","%20")` — Pshemo, Nov 29 '17 at 19:22
I have tried this too...how about this case - upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR — Aman Verma, Nov 29 '17 at 19:29
Please see this post on how to URL decoding in Java. https://stackoverflow.com/questions/6138127/how-to-do-url-decoding-in-java — Eric, Nov 29 '17 at 19:30
@rockstar value after % can be a hexadecimal value and characters after % can be intentional as well — Rahul Tiwari, Aug 28 '18 at 08:08
@RahulTiwari: Perhaps capture `(%[0-9A-F]{2})+` in the string, run a UTF-8 conversion on the decoded byte, and pick out the bytes where error occurs to treat the % as spaces. Anyway, this solution assumes the source of the URL doesn't have any bugs, or the result could become a gibberish mess. — nhahtdh, Aug 28 '18 at 08:20
How do you know that in your example %ab is a hexadecimal value or not? — tak3shi, Aug 28 '18 at 10:30
@RahulTiwari You have to get the encoded query first and split it with & , for pn and tn uses the regex i suggested and for tr and tid use str.replaceAll("[^A-Za-z0-9]","") — Albin Mathew, Aug 31 '18 at 10:02

score 1 · Answer 1 · answered Aug 28 '18 at 11:51

This is probably not the answer you want, but it may help:

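public class Test {

    public static void main(String... a) {
        try {
            // Sample UPI URI in which the bare % in pn stands for a space
            String u = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
            System.out.println(decode(u));
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private static String decode(String in) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < in.length(); i++) {
            char c = in.charAt(i);
            if (c == '%') {
                int decoded = hexPair(in, i + 1);
                if (decoded >= 32 && decoded <= 126) { // printable ASCII: treat as a real escape
                    sb.append((char) decoded);
                    i += 2;
                } else { // missing, non-hex or non-printable: assume the % is a space
                    sb.append(" ");
                }
            } else if (c == '+') {
                sb.append(" ");
            } else {
                sb.append(c);
            }
        }

        return sb.toString();
    }

    // Returns the value of the two hex digits starting at pos,
    // or -1 if they are missing (end of input) or not hex digits.
    private static int hexPair(String in, int pos) {
        if (pos + 2 > in.length()) {
            return -1;
        }
        int hi = Character.digit(in.charAt(pos), 16);
        int lo = Character.digit(in.charAt(pos + 1), 16);
        return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
    }
}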

There are many possibilities, so you will probably need a custom solution. The code above covers some of the cases.

Roland · Answer 2 · 2018-09-03T09:03:35.900

Interesting problem. As you have already seen, you can't reliably replace the % with a space. You need additional information about what will be transported via the URI, and then narrow down what must be replaced and what must not, e.g.:

%ZTest -> a space for sure
%Abababtest -> is it a space? probably... but we need to be sure that no strange characters or sequences are allowed
%23th%Affleck%20Street -> space? hex? what is what?

You need some more information to solve that issue reliably, like:

Which symbols are allowed? Or which hex ranges are allowed to be decoded?
Which query parameters may contain % as a space? (Then you can transform only those.)
Do you need to decode Cyrillic, Arabic or Chinese characters too?
If a %20 is in the URI, can we assume that no bare % stands for a space? Or can both appear as spaces in the same URI?

With that additional information it should be easier to solve the issue.

Here is a solution nonetheless that might get you in the right direction (but please consider the warnings at the bottom!):

import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// a % optionally followed by two hex digits; group 1 is null for a bare %
Pattern HEX_PATTERN = Pattern.compile("(?i)%([A-F0-9]{2})?");
String CHARSET = "utf-8";
String ENCODED_SPACE = "%20";
String ALLOWED_SYMBOLS = "\\p{L}|\\s|@";

String semiDecode(String uri) throws UnsupportedEncodingException {
    Matcher m = HEX_PATTERN.matcher(uri);
    StringBuffer semiDecoded = new StringBuffer();
    while (m.find()) {
        String match = m.group();
        String hexString = m.group(1);
        String replacementString = match;
        if (hexString == null) {
            // bare % without two hex digits: it can only be a space
            replacementString = ENCODED_SPACE;
        } else {
            // alternatively to the following just check whether the hex value is in an allowed range...
            // you may want to lookup https://en.wikipedia.org/wiki/List_of_Unicode_characters for this
            String decodedSymbol = URLDecoder.decode(match, CHARSET);
            if (!decodedSymbol.matches(ALLOWED_SYMBOLS)) {
                // not an allowed symbol: treat the % as a space and keep the two characters
                replacementString = ENCODED_SPACE + hexString;
            }
        }
        m.appendReplacement(semiDecoded, replacementString);
    }
    m.appendTail(semiDecoded);
    return semiDecoded.toString();
}

Sample usage:

String uri = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
String semiDecoded = semiDecode(uri);
System.out.println("Input: " + uri);
System.out.println("Semi-decoded: " + semiDecoded);
System.out.println("Completely decoded query: " + new URI(semiDecoded).getQuery());

which will print:

Input: upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR
Semi-decoded: upi://pay?pa=praksh%40kmbl&pn=Prakash%20Abmar&cu=INR
Completely decoded query: pa=praksh@kmbl&pn=Prakash Abmar&cu=INR

Warnings, some things to keep in mind:

This specific implementation does not work with Cyrillic, Chinese or other characters that are encoded as more than one hex pair (i.e. single characters encoded as %##%## or %##%##%## will no longer be decoded correctly).
You need to adapt the allowed symbols to your needs (see the regex in ALLOWED_SYMBOLS; for now it accepts any letter, any whitespace and @).
The charset UTF-8 was assumed.
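One possible way around the multi-byte limitation, along the lines of the comment that suggests validating the decoded bytes as UTF-8, is sketched below. It is only a sketch under assumptions: the class and method names are invented, UTF-8 is assumed, and the input is assumed to be a single parameter value (split the query on & first). A run of %XX escapes is decoded strictly; wherever the bytes are not valid UTF-8, the % is treated as a space and the two characters after it are kept as text.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

class LenientPercentDecoder {

    static String decodeLenient(String in) {
        StringBuilder sb = new StringBuilder(in.length());
        int i = 0;
        while (i < in.length()) {
            if (in.charAt(i) != '%') {
                sb.append(in.charAt(i++));
                continue;
            }
            // Collect the maximal run of consecutive %XX escapes starting at i.
            int n = 0;
            while (byteAt(in, i + 3 * n) >= 0) {
                n++;
            }
            byte[] bytes = new byte[n];
            for (int k = 0; k < n; k++) {
                bytes[k] = (byte) byteAt(in, i + 3 * k);
            }
            // Strict decoding: a new decoder reports malformed input and
            // leaves the source position at the first offending byte.
            ByteBuffer src = ByteBuffer.wrap(bytes);
            CharBuffer dst = CharBuffer.allocate(Math.max(n, 1));
            CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
            decoder.decode(src, dst, true);
            int consumed = src.position();
            if (consumed == 0) {
                // Bare % or invalid UTF-8 right away: assume the % is a space.
                sb.append(' ');
                i++;
            } else {
                dst.flip();
                sb.append(dst);
                i += 3 * consumed;
            }
        }
        return sb.toString();
    }

    // Value of the escape %XX at position p, or -1 if there is none.
    private static int byteAt(String s, int p) {
        if (p + 2 >= s.length() || s.charAt(p) != '%') {
            return -1;
        }
        int hi = hex(s.charAt(p + 1));
        int lo = hex(s.charAt(p + 2));
        return (hi < 0 || lo < 0) ? -1 : hi * 16 + lo;
    }

    private static int hex(char c) {
        if (c >= '0' && c <= '9') return c - '0';
        if (c >= 'a' && c <= 'f') return c - 'a' + 10;
        if (c >= 'A' && c <= 'F') return c - 'A' + 10;
        return -1;
    }
}
```

With this sketch, decodeLenient("Prakash%Abmar") gives "Prakash Abmar" and decodeLenient("praksh%40kmbl") gives "praksh@kmbl". The ambiguity itself remains: a name such as "A%41dam" would still be read as a real escape, so restricting the allowed symbols as described above is still worth considering.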

score 0 · Answer 3 · answered Sep 06 '18 at 11:21

The solution I used was to ignore the payee name provided in the QR code and instead query the PSP with the VPA to get the correct name. This way you also make sure that the payee exists.

For example:

Given a QR code with the URI upi://pay?pa=someone@upi&pn=firstname%lastname&cu=INR
Extract pa, which is someone@upi, and use it to get the name of the user from the PSP.
As nothing apart from the name and the note can contain % or %20, simply use any of the workarounds from the other answers for those fields, or use a simpler solution for notes, as notes are usually less important.
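The extraction step could be sketched as follows. The class and method names are hypothetical, and the PSP lookup itself is left out because its API is not described here. The query is split by hand, since java.net.URI rejects a malformed escape such as %la, and only pa is decoded, so a bare % in pn cannot break anything.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

class UpiQuery {

    // Splits the query part on & and = without decoding any value.
    static Map<String, String> rawParams(String uri) {
        Map<String, String> params = new LinkedHashMap<>();
        int q = uri.indexOf('?');
        if (q < 0) {
            return params;
        }
        for (String pair : uri.substring(q + 1).split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(key, value);
        }
        return params;
    }

    // Decodes only the payee address; returns null if pa is absent.
    static String payeeAddress(String uri) {
        String raw = rawParams(uri).get("pa");
        if (raw == null) {
            return null;
        }
        try {
            return URLDecoder.decode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For the URI above, payeeAddress returns someone@upi, which can then be passed to whatever PSP lookup is available, while the raw pn value firstname%lastname is never decoded.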
