Interesting problem. As you have already seen yourself, you can't reliably replace the % with a space. You need additional information about what will be transported via the URI, and then narrow down what must be replaced and what must not, e.g.:
%ZTest -> a space for sure (Z is not a hex digit)
%Abababtest -> is it a space? Probably, but only if we can be sure that no unusual characters or byte sequences are allowed (%Ab is valid hex)
%23th%Affleck%20Street -> space? hex? What is what?
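To see why a standard decoder cannot settle these cases for you, here is a minimal sketch that runs the three inputs through java.net.URLDecoder. The class name AmbiguityDemo and the helper tryDecode are made up for this illustration; UTF-8 is assumed as the charset.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

class AmbiguityDemo {
    // Hypothetical helper: returns the decoded string, or "rejected" when
    // URLDecoder refuses the input because of an invalid escape sequence.
    static String tryDecode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (IllegalArgumentException | UnsupportedEncodingException e) {
            return "rejected";
        }
    }

    public static void main(String[] args) {
        System.out.println(tryDecode("%ZTest"));
        System.out.println(tryDecode("%Abababtest"));
        System.out.println(tryDecode("%23th%Affleck%20Street"));
    }
}
```

The first input is rejected outright because Z is not a hex digit. The other two decode without complaint, but not into what was meant: %23 becomes #, and a lone byte such as %Ab or %Af is not valid UTF-8, so it should come out as the replacement character U+FFFD. In no case does the decoder produce a space, which is why the decision has to come from knowledge about the data.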
You need some more information to solve that issue reliably, like:
- which symbols are allowed? Or which hex ranges are allowed to be decoded?
- which query parameters may contain % as a space? (so that you transform only those)
- do you need to decode Cyrillic, Arabic or Chinese characters too?
- if a %20 is in the URI, can we assume that no bare % will be a space then? Or is it possible that both appear as a space in the same URI?
With that additional information it should be easier to solve the issue.
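As an illustration of the second question: if you know which parameters may carry a % as a space, you can limit any repair to those and leave the rest of the URI alone. This is only a sketch under assumptions. The class ParamScopedFix, the method fixParams and the choice of "pn" as the affected parameter are hypothetical, and the demo assumes that every % in that parameter stands for a space.

```java
import java.util.Collections;
import java.util.Set;
import java.util.function.UnaryOperator;

class ParamScopedFix {
    // Applies `fixer` to the raw (still encoded) value of every query parameter
    // whose name is in `names`; everything else is copied unchanged.
    // Assumption: the URI has no #fragment after the query.
    static String fixParams(String uri, Set<String> names, UnaryOperator<String> fixer) {
        int q = uri.indexOf('?');
        if (q < 0) {
            return uri; // no query part, nothing to do
        }
        StringBuilder out = new StringBuilder(uri.substring(0, q + 1));
        String[] pairs = uri.substring(q + 1).split("&", -1);
        for (int i = 0; i < pairs.length; i++) {
            String pair = pairs[i];
            int eq = pair.indexOf('=');
            if (eq >= 0 && names.contains(pair.substring(0, eq))) {
                // keep "name=" and rewrite only the value
                pair = pair.substring(0, eq + 1) + fixer.apply(pair.substring(eq + 1));
            }
            if (i > 0) {
                out.append('&');
            }
            out.append(pair);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String uri = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
        // Assumption for this demo: in "pn" every % stands for a space.
        String fixed = fixParams(uri, Collections.singleton("pn"), v -> v.replace("%", "%20"));
        System.out.println(fixed);
    }
}
```

The fixer is passed in as a function so that a stricter rule can be swapped in once you know more about the allowed content; the blunt replace used in the demo would break a genuine %20 inside the same parameter.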
Nonetheless, here is a solution that might point you in the right direction (but please consider the warnings at the bottom!):
    import java.io.UnsupportedEncodingException;
    import java.net.URLDecoder;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // a % optionally followed by two hex digits (case-insensitive)
    Pattern HEX_PATTERN = Pattern.compile("(?i)%([A-F0-9]{2})?");
    String CHARSET = "utf-8";
    String ENCODED_SPACE = "%20";
    // symbols whose escapes are kept as they are: any letter, any whitespace and @
    String ALLOWED_SYMBOLS = "\\p{L}|\\s|@";

    String semiDecode(String uri) throws UnsupportedEncodingException {
        Matcher m = HEX_PATTERN.matcher(uri);
        StringBuffer semiDecoded = new StringBuffer();
        while (m.find()) {
            String match = m.group();
            String hexString = m.group(1);
            String replacementString = match;
            if (hexString == null) {
                // bare % without two hex digits: it can only be a space
                replacementString = ENCODED_SPACE;
            } else {
                // alternatively to the following just check whether the hex value is in an allowed range...
                // you may want to lookup https://en.wikipedia.org/wiki/List_of_Unicode_characters for this
                String decodedSymbol = URLDecoder.decode(match, CHARSET);
                if (!decodedSymbol.matches(ALLOWED_SYMBOLS)) {
                    // not an allowed symbol: treat the % as a space and keep the two hex digits as plain text
                    replacementString = ENCODED_SPACE + hexString;
                }
            }
            m.appendReplacement(semiDecoded, replacementString);
        }
        m.appendTail(semiDecoded);
        return semiDecoded.toString();
    }
Sample usage:
    String uri = "upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR";
    String semiDecoded = semiDecode(uri);
    System.out.println("Input: " + uri);
    System.out.println("Semi-decoded: " + semiDecoded);
    System.out.println("Completely decoded query: " + new URI(semiDecoded).getQuery());
which will print:
    Input: upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR
    Semi-decoded: upi://pay?pa=praksh%40kmbl&pn=Prakash%20Abmar&cu=INR
    Completely decoded query: pa=praksh@kmbl&pn=Prakash Abmar&cu=INR
Warnings... some things to keep in mind:
- this specific implementation does not work with Cyrillic, Chinese or other letters whose UTF-8 encoding takes more than one byte (i.e. %##%## or %##%##%## for a single character will no longer be decoded correctly, because each %## is checked on its own)
- you need to adapt the allowed symbols to your needs (see the regex in ALLOWED_SYMBOLS; for now it accepts any letter, any whitespace and @)
- charset utf-8 was assumed
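If multi-byte characters matter to you, one possible way around the first warning is to look at up to four consecutive %## escapes at once and keep them only if they strictly decode to a single allowed character. The following is a sketch of that idea, not a drop-in replacement: the class MultiByteSemiDecoder and the helper allowedEscapeLength are made up for this example, and it keeps the same assumptions as above (UTF-8, and letters, whitespace and @ as allowed symbols).

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultiByteSemiDecoder {
    // One to four consecutive %XX escapes (a UTF-8 character is at most four bytes long).
    private static final Pattern ESCAPES = Pattern.compile("(?:%[0-9A-Fa-f]{2}){1,4}");
    // Assumed whitelist: any letter, any whitespace and @.
    private static final Pattern ALLOWED = Pattern.compile("[\\p{L}\\s@]");

    static String semiDecode(String uri) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < uri.length()) {
            if (uri.charAt(i) != '%') {
                out.append(uri.charAt(i++));
                continue;
            }
            int len = allowedEscapeLength(uri, i);
            if (len > 0) {
                out.append(uri, i, i + len); // genuine escape: keep it untouched
                i += len;
            } else {
                out.append("%20");           // otherwise the % is taken to be a space
                i++;
            }
        }
        return out.toString();
    }

    // Returns the length (in chars) of the shortest run of escapes starting at pos
    // that is valid UTF-8 and decodes to an allowed character, or 0 if there is none.
    static int allowedEscapeLength(String s, int pos) {
        Matcher m = ESCAPES.matcher(s).region(pos, s.length());
        if (!m.lookingAt()) {
            return 0;
        }
        String run = m.group();
        for (int n = 1; n * 3 <= run.length(); n++) {
            byte[] bytes = new byte[n];
            for (int k = 0; k < n; k++) {
                bytes[k] = (byte) Integer.parseInt(run.substring(3 * k + 1, 3 * k + 3), 16);
            }
            try {
                String decoded = StandardCharsets.UTF_8.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(bytes))
                        .toString();
                return ALLOWED.matcher(decoded).matches() ? 3 * n : 0;
            } catch (CharacterCodingException e) {
                // incomplete or invalid so far: try one more byte
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        System.out.println(semiDecode("upi://pay?pa=praksh%40kmbl&pn=Prakash%Abmar&cu=INR"));
        System.out.println(semiDecode("pn=%D0%B0%Abmar"));
    }
}
```

With this variant %D0%B0 (a Cyrillic letter in UTF-8) is kept, while the lone %Ab is still turned into %20Ab. The remaining ambiguity does not go away: a % followed by text that happens to form a valid escape for an allowed character is still read as an escape, so the questions from the top of this answer apply here as well.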