Decoding URI query string in Java

Question

I need to decode a URI that contains a query string; expected input/output behavior is something like the following:

abstract class URIParser
{       
    /** example input: 
      * something?alias=pos&FirstName=Foo+A%26B%3DC&LastName=Bar */
    URIParser(String input) { ... }
    /** should return "something" for the example input */
    public abstract String getPath();
    /** should return a map
      * {alias: "pos", FirstName: "Foo+A&B=C", LastName: "Bar"} */
    public abstract Map<String,String> getQuery();
}

I've tried using java.net.URI, but it seems to decode the query string, so in the above example I'm left with "alias=pos&FirstName=Foo+A&B=C&LastName=Bar". That makes it ambiguous whether a "&" is a query separator or a character inside a query component.

Edit: I just tried URI.getRawQuery() and it doesn't do the decoding, so I can split the query string on "&", but then what do I do? JavaScript has decodeURIComponent; I can't seem to find the corresponding method in Java.

Any suggestions? I would prefer not to use any new libraries.

Since you don't want to introduce new libs, may I ask in which environment you receive these URIs? — stacker, Apr 13 '10 at 19:39

score 69 · Answer 1 · edited Jun 08 '16 at 21:04

Use

URLDecoder.decode(proxyRequestParam.replace("+", "%2B"), "UTF-8")
          .replace("%2B", "+")

to simulate decodeURIComponent. Java's URLDecoder decodes the plus sign to a space, which is not what you want, therefore you need the replace statements.

Warning: the .replace("%2B", "+") at the end will corrupt your data if the original (pre-x-www-form-urlencoded) contained that string, as @xehpuk pointed out.
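
As a minimal sketch of the variant suggested in the comments below (keep only the first replace), the helper here escapes literal plus signs and then lets URLDecoder do the rest. The class name UriComponentDecoder and the method name decodeComponent are hypothetical, chosen for illustration; UTF-8 is assumed as the encoding.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class UriComponentDecoder {

    /**
     * Hypothetical helper: decodes percent-escapes but keeps literal '+'
     * characters, roughly like JavaScript's decodeURIComponent.
     */
    public static String decodeComponent(String encoded) {
        // Escape literal '+' so URLDecoder does not turn it into a space.
        String plusEscaped = encoded.replace("+", "%2B");
        try {
            // URLDecoder turns every %2B (original or just inserted) into '+'.
            return URLDecoder.decode(plusEscaped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeComponent("Foo+A%26B%3DC")); // Foo+A&B=C
        System.out.println(decodeComponent("%252B"));          // %2B
    }
}
```

Because nothing is replaced after decoding, a double-encoded input such as "%252B" comes out as "%2B" instead of being collapsed to "+".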

edited Jun 08 '16 at 21:04

falsarella

answered Aug 03 '11 at 13:12

janb

3

This should be the accepted answer. URIs treat the + symbol as it is, whereas spaces are encoded into %20. URLDecoder is not compatible with URI encoded strings as it will decode both + and %20 into a space. – Kosta Apr 17 '12 at 09:15
3

What's the point of the second replace? After the decode there will no longer be any instances of "%2B" in the string since they will have all been replaced with "+", so there will be nothing for the replace to match. – David Conrad Aug 16 '12 at 19:45
2

The point is that you don't want encoded characters in a decoded string. Since Java does not decode the + sign the way JavaScript does, I first encode the + sign so that Java won't touch it, and then decode the %2B back into a + sign. In short: if I didn't do this, the decoded URL would not contain the original + signs (Java would have lost them in the decoding phase). – janb Aug 21 '12 at 10:05
4

@janb - I think the second replace is unnecessary, because the `decode` method will already convert any `%2B` it finds into `+`. The first replace is necessary to stop it converting `+` into spaces. – Steve Powell Sep 11 '13 at 10:38
9

@StevePowell The second replace is not only unnecessary, it's wrong. – xehpuk Feb 17 '15 at 22:46
@xehpuk, @StevePowell: in some situations (such as the one I have described above) it is needed, because you don't want to lose any '+' character if it is intentionally in the incoming parameter. By using the second replace, you mimic the behavior of JavaScript's `decodeURIComponent`. – janb Jun 10 '16 at 15:18
1

For example, the string `"%252B"` would be decoded incorrectly by your solution as `"+"`, while `decodeURIComponent("%252B") === "%2B"`. Is there any example that demonstrates the necessity of the last replace? – Franklin Yu May 23 '18 at 16:08

score 17 · Accepted Answer · edited Mar 21 '12 at 15:24

See class URLDecoder

edited Mar 21 '12 at 15:24

reevesy

answered Apr 13 '10 at 18:58

Maurice Perry

5

It should be noted that you should identify the query part and split the parameters into key/value pairs prior to using this, but it'll decode percent-encoded values to the given encoding (see UTF-8) according to the HTML `application/x-www-form-urlencoded` spec. – McDowell Apr 13 '10 at 22:29
5

Always put the answer in your answer. Linking out creates extra work and there's no guarantee the link will always work. – fivedogit Sep 14 '19 at 16:08
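
To make the order of operations from the comment above concrete (split the raw query first, decode each piece afterwards), here is a rough sketch. The class QuerySplitter and its parseQuery method are hypothetical names; UTF-8 and form-urlencoded semantics are assumed, which means URLDecoder turns "+" into a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QuerySplitter {

    /** Splits the raw (still encoded) query on '&' and '=', then decodes each piece. */
    public static Map<String, String> parseQuery(String rawQuery) {
        Map<String, String> params = new LinkedHashMap<>();
        if (rawQuery == null || rawQuery.isEmpty()) {
            return params;
        }
        for (String pair : rawQuery.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            // Decode only after splitting, so an encoded %26 or %3D
            // cannot be mistaken for a separator.
            params.put(decode(key), decode(value));
        }
        return params;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8"); // form rules: '+' becomes ' '
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    public static void main(String[] args) {
        URI uri = URI.create("something?alias=pos&FirstName=Foo+A%26B%3DC&LastName=Bar");
        System.out.println(uri.getPath());                  // something
        System.out.println(parseQuery(uri.getRawQuery()));
        // {alias=pos, FirstName=Foo A&B=C, LastName=Bar}
    }
}
```

Under these form rules FirstName decodes to "Foo A&B=C". If the literal "+" from the question must survive, each piece would need its "+" escaped as "%2B" before it is passed to URLDecoder.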

score 7 · Answer 3 · answered Dec 14 '16 at 17:15

String decoded = URLDecoder.decode(reqParam, "UTF-8");

answered Dec 14 '16 at 17:15

Bhaskara Arani

Java 10+: `URLDecoder.decode(objectKey, StandardCharsets.UTF_8);` – Mike Jan 04 '23 at 13:24
Will it work for string containing + operator? – Sritam Jagadev May 23 '23 at 18:15

score 0 · Answer 4 · answered Apr 01 '16 at 12:19

Regarding the issue with the + sign:

I made a helper class that wraps the URLDecoder function based on the answer of @janb

import android.net.Uri;
import android.support.annotation.Nullable;
import android.text.TextUtils;

import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DateDecoder {

    private static final String KEY_DATE = "datekey";

    private static final SimpleDateFormat SIMPLE_DATE_FORMAT =
            new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ssZZZZZ", Locale.US);


    public static void main(String[] args) throws UnsupportedEncodingException {
        try {
            Uri uri = Uri.parse("http://asdf.com?something=12345&" +
                    KEY_DATE +"=2016-12-24T12:00:00+01:00");

            System.out.println("parsed date: " + DateDecoder.createDate(uri)); // parsed date: Sat Dec 24 12:00:00 GMT+01:00 2016
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    @Nullable
    public static Date createDate(@Nullable Uri data) {
        if (data != null) {
            try {
                String withPlus = decodeButKeepPlus(KEY_DATE, data.getEncodedQuery());
                if (!TextUtils.isEmpty(withPlus)) {
                    return SIMPLE_DATE_FORMAT.parse(withPlus);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return null;
    }

    /**
     * copied from android.net.Uri.java
     */
    @Nullable
    public static String decodeButKeepPlus(String encodedKey, String completeEncodedQuery)
            throws UnsupportedEncodingException {

        final int length = completeEncodedQuery.length();
        int start = 0;
        do {
            int nextAmpersand = completeEncodedQuery.indexOf('&', start);
            int end = nextAmpersand != -1 ? nextAmpersand : length;

            int separator = completeEncodedQuery.indexOf('=', start);
            if (separator > end || separator == -1) {
                separator = end;
            }

            if (separator - start == encodedKey.length()
                    && completeEncodedQuery.regionMatches(start, encodedKey, 0, encodedKey.length())) {
                if (separator == end) {
                    return "";
                } else {
                    String encodedValue = completeEncodedQuery.substring(separator + 1, end);
                    if (!TextUtils.isEmpty(encodedValue)) {
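                        // Escape literal '+' first so URLDecoder does not turn it into a space.
                        // No replace after decoding: decode() already turns %2B back into '+'.
                        return URLDecoder.decode(encodedValue.replace("+", "%2B"), "UTF-8");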
                    }
                }
            }

            // Move start to end of name.
            if (nextAmpersand != -1) {
                start = nextAmpersand + 1;
            } else {
                break;
            }
        } while (true);
        return null;
    }

}

score 0 · Answer 5 · edited May 23 '20 at 13:01

new java.net.URI(proxyRequestParam).getPath()

A string encoded by JavaScript's encodeURIComponent is just a path, without a scheme or any other URI parts, yet it is still valid input for java.net.URI. So java.net.URI does all the decoding for us, and its path is what we want.
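
A small sketch of this idea, wrapped in a hypothetical helper named decodeViaUri (the class name is also an assumption). It relies on the input really being the output of encodeURIComponent: a string with an unescaped ":" or "?" would be parsed as a scheme or a query instead of a path.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class UriPathDecodeDemo {

    /** Hypothetical helper: treats the encoded component as a relative URI path. */
    public static String decodeViaUri(String encodedComponent) {
        try {
            // getPath() returns the path with percent-escapes decoded as UTF-8;
            // a literal '+' is left untouched.
            return new URI(encodedComponent).getPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid encoded component", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeViaUri("Foo+A%26B%3DC")); // Foo+A&B=C
        System.out.println(decodeViaUri("a%20b"));         // a b
    }
}
```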

edited May 23 '20 at 13:01

answered May 11 '20 at 02:59

vipcxj
