I am currently incorporating URLEncoder and URLDecoder into some code. Numerous URLs are already saved and will be processed by the URLDecoder routine without ever having been processed by the URLEncoder routine.

Based on some testing, it doesn't appear there will be an issue, though admittedly I have not tested every scenario.

I did notice that some characters, like the /, which would normally get encoded, are processed just fine by the decoding routine even if they were not initially encoded.

This led me to an oversimplified analysis. It appears the URLDecoder routine essentially checks the URL for a % and the next 2 bytes (provided UTF-8 is used). As long as there aren't any % characters within the previously saved URLs, there shouldn't be an issue when they are processed by the URLDecoder routine. Does that sound about right?

Evan Williams
Unhandled Exception

1 Answer

Yes, it will work for "simple" cases, but calling URLDecoder.decode on an unencoded URL that contains certain special characters can a) throw an exception or b) silently change the URL.

Consider the following example. Each test first does a regular encode/decode round trip, which works without issues, and then decodes the already-decoded URL a second time. For the second test, that extra decode alters the URL without any exception. For the third test, it throws a java.lang.IllegalArgumentException: URLDecoder: Incomplete trailing escape (%) pattern:

import java.net.URLDecoder;
import java.net.URLEncoder;

public class Test {
    public static void main(String[] args) throws Exception {
        test("http://www.foo.bar/");
        test("http://www.foo.bar/?q=a+b");
        test("http://www.foo.bar/?q=äöüß%"); // Will throw exception
    }

    private static void test(String url) throws Exception {
        String encoded = URLEncoder.encode(url, "UTF-8");
        String decoded = URLDecoder.decode(encoded, "UTF-8");
        System.out.println("encoded: " + encoded);
        System.out.println("decoded: " + decoded);
        System.out.println(URLDecoder.decode(decoded, "UTF-8"));
    }
}

Output (notice how the + sign disappears):

encoded: http%3A%2F%2Fwww.foo.bar%2F
decoded: http://www.foo.bar/
http://www.foo.bar/
encoded: http%3A%2F%2Fwww.foo.bar%2F%3Fq%3Da%2Bb
decoded: http://www.foo.bar/?q=a+b
http://www.foo.bar/?q=a b
encoded: http%3A%2F%2Fwww.foo.bar%2F%3Fq%3D%C3%A4%C3%B6%C3%BC%C3%9F%25
decoded: http://www.foo.bar/?q=äöüß%
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Incomplete trailing escape (%) pattern
    at java.net.URLDecoder.decode(Unknown Source)
    at Test.test(Test.java:16)

See the javadoc of URLDecoder for the two cases as well:

  • The plus sign "+" is converted into a space character " " .
  • A sequence of the form "%xy" will be treated as representing a byte where xy is the two-digit hexadecimal representation of the 8 bits. Then, all substrings that contain one or more of these byte sequences consecutively will be replaced by the character(s) whose encoding would result in those consecutive bytes. The encoding scheme used to decode these characters may be specified, or if unspecified, the default encoding of the platform will be used.
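
To see the two rules in isolation, here is a minimal sketch. The class name DecodeRules and the decode helper are made up for illustration; only URLDecoder.decode itself comes from the JDK.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of the original example.
class DecodeRules {
    // Thin wrapper so callers do not have to handle the checked exception.
    static String decode(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // No '+' and no '%': the string comes back unchanged.
        System.out.println(decode("http://www.foo.bar/"));
        // '+' is turned into a space.
        System.out.println(decode("a+b"));
        // Two consecutive %xy escapes are combined into one UTF-8 character.
        System.out.println(decode("%C3%A4"));
    }
}
```

Characters such as / or : are neither + nor %, so the decoder passes them through untouched, which is why unencoded URLs often appear to "just work".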

If you are sure that your unencoded URLs do not contain + or %, then I'd say it's safe to call URLDecoder.decode. Otherwise I'd advise implementing additional checks, e.g. try to decode and compare with the original (cf. this question on SO).
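
One possible shape for such a check is sketched below. It is only a heuristic and rests on an assumption: that encoded URLs were produced by URLEncoder.encode on the whole URL, as in the test above. The class SafeUrlDecoder and the method decodeIfEncoded are hypothetical names. The idea is to decode, re-encode the result, and accept the decoded value only if the re-encoded string matches what was stored.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical helper, a sketch rather than a definitive implementation.
class SafeUrlDecoder {
    // Returns the decoded URL if 'stored' looks like URLEncoder output,
    // otherwise returns 'stored' unchanged.
    static String decodeIfEncoded(String stored) {
        try {
            String decoded = URLDecoder.decode(stored, "UTF-8");
            String reencoded = URLEncoder.encode(decoded, "UTF-8");
            // Round trip matches only if 'stored' was already in encoded form.
            return reencoded.equals(stored) ? decoded : stored;
        } catch (IllegalArgumentException e) {
            // Malformed escape such as a trailing '%': treat as unencoded.
            return stored;
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeIfEncoded("http%3A%2F%2Fwww.foo.bar%2F%3Fq%3Da%2Bb"));
        System.out.println(decodeIfEncoded("http://www.foo.bar/?q=a+b"));
        System.out.println(decodeIfEncoded("http://www.foo.bar/?q=%"));
    }
}
```

The heuristic has limits: a short string such as a+b is valid both as raw and as encoded text, so it would be decoded to "a b", and URLs encoded by a tool that emits lowercase hex digits would be treated as unencoded because URLEncoder emits uppercase ones.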

Community
Marvin