
When I pass a URI whose query contains a # character, the function seems to stop parsing the query at that point and returns only the part before the #.

Example URI: /test.php?hello=Hello+World ljlksjlkdja(#*!!!!()**!&world=Venus

Will output: Hello World ljlksjlkdja(

Expected Output: Hello World ljlksjlkdja(#*!!!!()**! from Venus

I have tried escaping the pound signs with query.replace("#", "%23") right after the tokens are split, but the problem persists, so I am not sure what is going wrong.

The function this is based on can be found here: Parse a URI String into Name-Value Collection

Separately, the author of that answer mentions that it also works with arrays, but I only get the first value from something like ?hello=Hello+World&world[]=Venus&world[]=Mars&world[]=Earth, which outputs the array [world]=>Array([0] => Venus)

private static Map<String, List<String>> splitQuery(String query) throws UnsupportedEncodingException {
    final Map<String, List<String>> query_pairs = new LinkedHashMap<String, List<String>>();
    String[] tokens = query.split("\\?", 2);
    if (tokens.length == 2) {
        query = tokens[1];
        final String[] pairs = query.split("&");
        for (String pair : pairs) {
            final int idx = pair.indexOf("=");
            final String key = idx > 0 ? URLDecoder.decode(pair.substring(0, idx), "UTF-8") : pair;
            if (!query_pairs.containsKey(key)) {
                query_pairs.put(key, new LinkedList<String>());
            }
            final String value = idx > 0 && pair.length() > idx + 1 ? URLDecoder.decode(pair.substring(idx + 1), "UTF-8") : null;
            query_pairs.get(key).add(value);
        }
    }
    return query_pairs;
}

1 Answer


Your code and the example you provide compile and run for me:

System.out.print(splitQuery("asdfasdfadsf?hello=Hello+World"));
System.out.print(splitQuery("asdfasdfadsf?hello=Hello%20World"));
// output: {hello=[Hello World]}{hello=[Hello World]}

One suggestion would be to use split() to separate each key=value pair instead of slicing the string manually by character index.
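As a rough illustration of the split() suggestion, here is a minimal sketch. The class name SplitPairsSketch and the method parse are made up for this example, and it assumes the part after the ? has already been extracted. Note one behavioral difference from your code: a pair such as key= yields an empty string here rather than null.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SplitPairsSketch {

    // Hypothetical helper: wraps URLDecoder so callers need no checked exception.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Parses "a=1&b=2" style input; repeated keys accumulate in the same list.
    static Map<String, List<String>> parse(String query) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // skip empty segments such as a trailing "&"
            }
            // Limit of 2 keeps any further '=' characters inside the value.
            String[] kv = pair.split("=", 2);
            String key = decode(kv[0]);
            String value = kv.length == 2 ? decode(kv[1]) : null;
            result.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parse("hello=Hello+World&world[]=Venus&world[]=Mars"));
    }
}
```

With this approach every world[] value ends up in the same list, so repeated keys are not lost on the Java side.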

Even better, I'd consider using a third party library to do this work, as suggested in Parse a URI String into Name-Value Collection.


Updated to address question updates

The # character in a URL introduces a fragment identifier. According to RFC 3986, Section 3.5: Fragment:

A fragment identifier component is indicated by the presence of a number sign ("#") character and terminated by the end of the URI.
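To see this split in practice, here is a small sketch using java.net.URI from the standard library. The class name FragmentSplitSketch and its helper methods are invented for illustration, and the URI is a shortened variant of yours (the space is left out, since URI.create rejects unencoded spaces).

```java
import java.net.URI;

public class FragmentSplitSketch {

    // Returns the query exactly as written, without percent-decoding.
    static String rawQueryOf(String uri) {
        return URI.create(uri).getRawQuery();
    }

    // Returns everything after the first '#'.
    static String fragmentOf(String uri) {
        return URI.create(uri).getFragment();
    }

    public static void main(String[] args) {
        String uri = "/test.php?hello=Hello+World(#*!!&world=Venus";
        System.out.println("query:    " + rawQueryOf(uri));
        System.out.println("fragment: " + fragmentOf(uri));
    }
}
```

Everything after the first # lands in the fragment, including &world=Venus, so it never appears as a query parameter.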

Thus, it makes sense that query parameter processing ends when the # character is encountered: everything after it belongs to the fragment, not the query. To accept characters like this in your query parameters, they must be percent-encoded. # is encoded as %23, but the encoding has to happen before the request is sent to the server; replacing # inside splitQuery is too late, because by then the query has already been cut off. Using your example, the following should work as you intend:

/test.php?hello=Hello+World%20ljlksjlkdja(%23*!!!!()**!&world=Venus
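If the URI is built in Java, one way to do the encoding up front is java.net.URLEncoder. The sketch below is only an illustration: the class QueryEncodingSketch and its helpers are hypothetical names. URLEncoder applies form encoding, so it also escapes characters such as ( and ! that the hand-written URI above leaves alone, and it turns spaces into +.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryEncodingSketch {

    // Hypothetical helper: form-encodes a single query value.
    static String encodeValue(String raw) {
        try {
            return URLEncoder.encode(raw, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Hypothetical helper: reverses encodeValue, as splitQuery does per value.
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encodes each value separately, then joins them with the literal '&' and '='.
    static String buildUri(String hello, String world) {
        return "/test.php?hello=" + encodeValue(hello) + "&world=" + encodeValue(world);
    }

    public static void main(String[] args) {
        String raw = "Hello World ljlksjlkdja(#*!!!!()**!";
        System.out.println(buildUri(raw, "Venus"));
        // The round trip recovers the original value, '#' included.
        System.out.println(decodeValue(encodeValue(raw)).equals(raw));
    }
}
```

The key point is to encode each value on its own before assembling the URI; encoding the whole URI at once would also escape the ?, & and = separators.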

See also Which characters make a URL invalid? for a discussion on valid URL characters.
