
I'm involved in writing a (Java/Groovy) browser-automation app with Selenium 2 and the Firefox driver.

Some URLs we find in the wild apparently use invalid URI syntax, specifically curly braces ({ and }), pipes (|) and carets (^).

String url = driver.getCurrentUrl(); // http://example.com/foo?key=val|with^bad{char}acters

When trying to construct a java.net.URI from the string returned by driver.getCurrentUrl(), a URISyntaxException is thrown.

new URI(url); // java.net.URISyntaxException: Illegal character in query at index ...

Encoding the whole URL before constructing the URI will not work (as I understand it).

The whole URL gets encoded, which does not preserve any pieces that I can parse in a normal fashion. For example, with this URI-safe string, URI can't tell the difference between an & used as the query-string parameter delimiter and a %26 (its encoded value) inside the content of a single parameter.

String encoded = URLEncoder.encode(url, "UTF-8"); // http%3A%2F%2Fexample.com%2Ffoo%3Fkey%3Dval%7Cwith%5Ebad%7Bchar%7Dacters
URI uri = new URI(encoded);
URLEncodedUtils.parse(uri, "UTF-8"); // []

Our current workaround is to run the following (Groovy) code before constructing the URI:

["|", "^", "{", "}"].each {
    url = url.replace(it, URLEncoder.encode(it, "UTF-8"))
}

But this seems dirty and wrong.
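For reference, a plain-Java version of the same workaround could look like the sketch below. It percent-encodes only characters from a fixed list and leaves everything else untouched, including existing % escapes and the delimiters ?, & and =. The class name, the method name escapeIllegal, and the exact character list are assumptions made for illustration; they are not part of Selenium or the JDK.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class EscapeIllegalChars {

    // ASCII characters that java.net.URI rejects in a path or query.
    // Assumption: this list is illustrative and may need extending.
    private static final String ILLEGAL = " \"<>\\^`{|}";

    // Percent-encodes listed characters and ASCII control characters only.
    static String escapeIllegal(String url) {
        StringBuilder out = new StringBuilder(url.length() + 16);
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            if (ILLEGAL.indexOf(c) >= 0 || c < 0x20 || c == 0x7F) {
                out.append(String.format("%%%02X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) throws URISyntaxException {
        String raw = "http://example.com/foo?key=val|with^bad{char}acters";
        URI uri = new URI(escapeIllegal(raw));
        System.out.println(uri.getRawQuery()); // key=val%7Cwith%5Ebad%7Bchar%7Dacters
        System.out.println(uri.getQuery());    // key=val|with^bad{char}acters
    }
}
```

Because the URL structure survives, getQuery() still returns the query string as it appeared in the wild.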

I guess my question is multi-part:

  1. Why does FirefoxDriver return a String rather than a URI?
  2. Why is this String malformed?
  3. What is best practice for dealing with this kind of thing?
Zach Lysobey
  • It's not clear: do you really have those bad characters in the URL, and do you expect them? Could you add an example of the string you expect to see and the one you actually get? Thanks. – sap1ens Apr 07 '15 at 16:34
  • Due to the nature of the project, I cannot confirm if the *actual* urls are different from what WebDriver is reporting, nor am I able to share too many specifics, but `http://example.com/foo?key=val-with-a-|-in-it` __is__ indicative of what we actually see returned from `driver.getCurrentUrl()` in rare cases. – Zach Lysobey Apr 07 '15 at 17:38
  • another example I just dug up: `http://example.com?foo={bar}` – Zach Lysobey Apr 07 '15 at 17:54

4 Answers


As discussed in the comments, partially encoding the URL (only the query string parameters) should work.

Another way is to use the galimatias library:

import io.mola.galimatias.GalimatiasParseException;
import io.mola.galimatias.URL;

import java.net.URI;
import java.net.URISyntaxException;

public class Main {

    public static void main(String[] args) throws URISyntaxException {
        String example1 = "http://example.com/foo?key=val-with-a-|-in-it";
        String example2 = "http://example.com?foo={bar}";

        try {
            URL url1 = URL.parse(example1);
            URI uri1 = url1.toJavaURI();
            System.out.println(url1);
            System.out.println(uri1);

            URL url2 = URL.parse(example2);
            URI uri2 = url2.toJavaURI();
            System.out.println(url2);
            System.out.println(uri2);
        } catch (GalimatiasParseException ex) {
            // Do something with non-recoverable parsing error
        }
    }
}

Output:

http://example.com/foo?key=val-with-a-|-in-it
http://example.com/foo?key=val-with-a-%7C-in-it
http://example.com/?foo={bar}
http://example.com/?foo=%7Bbar%7D
sap1ens

driver.getCurrentUrl() returns a string from the browser; you should URL-encode that string before turning it into a URL.

See "Java URL encoding of query string parameters" for an example of this in Java.

d3ming
  • It's my understanding that doing this will turn `http://example.com/foo?key=val|with^|bad|characters` into `http%3A%2F%2Fexample.com%2Ffoo%3Fkey%3Dval%7Cwith%5E%7Cbad%7Ccharacters`, which isn't the behaviour I'm looking for. I'll test this just to be sure, but I think that URI will not take this string in its constructor – Zach Lysobey Mar 12 '15 at 18:28
  • Yea, as suspected, this will not work. If I create a `URI` with this encoded String, I cannot parse it or do anything with it. I'll edit the question to make this more clear. – Zach Lysobey Mar 12 '15 at 18:37
  • @ZachL I think this answer is correct, but you should encode only the query string part. In your question you show that you encode the whole URL, which is incorrect. – sap1ens Apr 07 '15 at 17:46
  • hmmm... well the answer seems to imply that I should encode the whole string, which is wrong. I could perhaps take the whole query string after the `?`, split this on the `&`'s and `=`s, encode those chunks, and put the url back together. If you want to write this up as an answer (or @dming wants to edit his answer as such) I will mark that as correct and award bounty if no better solution appears. – Zach Lysobey Apr 07 '15 at 18:01
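The approach described in the last comment (split the query string on & and =, encode each chunk, and put the URL back together) might look roughly like the sketch below. QueryEncoder and encodeQuery are hypothetical names. Each chunk is decoded before it is encoded, on the assumption that already-escaped values such as %26 should not be double-encoded. The sketch also assumes that any # appears after the ?, and it leaves the path and fragment untouched.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryEncoder {

    // Decode first so that existing escapes such as %26 are not double-encoded.
    private static String reencode(String s) throws UnsupportedEncodingException {
        return URLEncoder.encode(URLDecoder.decode(s, "UTF-8"), "UTF-8");
    }

    // Encodes each query name and value separately; delimiters stay as-is.
    static String encodeQuery(String url) throws UnsupportedEncodingException {
        int q = url.indexOf('?');
        if (q < 0) {
            return url; // no query string, nothing to do
        }
        String base = url.substring(0, q);
        String rest = url.substring(q + 1);
        String fragment = "";
        int hash = rest.indexOf('#');
        if (hash >= 0) {
            fragment = rest.substring(hash);
            rest = rest.substring(0, hash);
        }
        StringBuilder out = new StringBuilder(base).append('?');
        String[] pairs = rest.split("&", -1);
        for (int i = 0; i < pairs.length; i++) {
            if (i > 0) {
                out.append('&');
            }
            int eq = pairs[i].indexOf('=');
            if (eq < 0) {
                out.append(reencode(pairs[i]));
            } else {
                out.append(reencode(pairs[i].substring(0, eq)))
                   .append('=')
                   .append(reencode(pairs[i].substring(eq + 1)));
            }
        }
        return out.append(fragment).toString();
    }

    public static void main(String[] args)
            throws UnsupportedEncodingException, URISyntaxException {
        String fixed = encodeQuery("http://example.com/foo?key=val-with-a-|-in-it");
        System.out.println(fixed);                     // http://example.com/foo?key=val-with-a-%7C-in-it
        System.out.println(new URI(fixed).getQuery()); // key=val-with-a-|-in-it
    }
}
```

Note that URLEncoder applies form encoding, so a space in a value comes out as + rather than %20, and a malformed % sequence in the input makes URLDecoder throw an IllegalArgumentException.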

Will this work for you?

import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

public class Sample {

    public static void main(String[] args) throws UnsupportedEncodingException {
        String urlInString = "http://example.com/foo?key=val-with-a-{-in-it";
        // Encodes the entire URL, including the scheme and the delimiters
        String encodedURL = URLEncoder.encode(urlInString, "UTF-8");

        // URI.create() accepts the encoded string because it contains no illegal characters
        URI encodedURI = URI.create(encodedURL);
        System.out.println("Actual URL:" + urlInString);
        System.out.println("Encoded URL:" + encodedURL);
        System.out.println("Encoded URI:" + encodedURI);
    }
}

Output:

Actual URL:http://example.com/foo?key=val-with-a-{-in-it
Encoded URL:http%3A%2F%2Fexample.com%2Ffoo%3Fkey%3Dval-with-a-%7B-in-it
Encoded URI:http%3A%2F%2Fexample.com%2Ffoo%3Fkey%3Dval-with-a-%7B-in-it

Rameshwar
  • Doesn't compile (replace uri1 with urlInString), also URLEncoder.encode(String) is deprecated, you need to use URLEncoder.encode(String, "UTF-8") – sap1ens Apr 09 '15 at 13:40
  • Thank you @sap1ens. Updated the code according to your suggestions. Code should compile now. – Rameshwar Apr 09 '15 at 13:47
  • How is this different than the approach in my question where I use `URLEncoder`? This looks like the same exact thing, with the same issue. – Zach Lysobey Apr 09 '15 at 14:30
  • Your approach `URI encodedURI=new URI(uri1);` generates the error `java.net.URISyntaxException: Illegal character in query at index ...`, but I used `URI encodedURI=URI.create(encodedURL);`, which gives the output that I have shown. Is that the output you were looking for? – Rameshwar Apr 09 '15 at 14:33
  • This has the same problem as [@dming's answer](http://stackoverflow.com/a/29017513/363701), and my example using `URLEncoder.encode(url, "UTF-8")` (where no error occurs). In this case, I cannot analyze the URI like I need to. For example: I need to be able to do `uri.getQuery()` and retrieve the query string as it is in the wild: `?key=val-with-a-{-in-it`. With the *galimatias* library I can get it as such. In your code this returns `null`. When `http://` gets encoded it becomes `http%3A%2F%2F` making it not parse correctly as a URI. The same is true for `?`, `&` and `=` in the query string. – Zach Lysobey Apr 09 '15 at 18:11
  • @ZachL Does your domain name (I mean `http://example.com/`) remain constant always? – Rameshwar Apr 10 '15 at 06:45
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/74939/discussion-between-rameshwar-and-zach-l). – Rameshwar Apr 10 '15 at 13:40
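The problem raised in the comments above can be reproduced in a few lines. This is only an illustrative sketch (WholeUrlEncoding and encodeAndParse are made-up names): once the delimiters are percent-encoded, java.net.URI treats the whole string as a single relative path, so the scheme, host and query accessors have nothing to return.

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLEncoder;

public class WholeUrlEncoding {

    // Encodes the entire URL (scheme and delimiters included) and parses the result.
    static URI encodeAndParse(String raw) throws UnsupportedEncodingException {
        return URI.create(URLEncoder.encode(raw, "UTF-8"));
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        URI uri = encodeAndParse("http://example.com/foo?key=val-with-a-{-in-it");
        System.out.println(uri.getScheme()); // null
        System.out.println(uri.getHost());   // null
        System.out.println(uri.getQuery());  // null
        // The decoded path is the entire original URL
        System.out.println(uri.getPath());   // http://example.com/foo?key=val-with-a-{-in-it
    }
}
```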

Another solution is to split the fetched URL into its components and then use those components to build the URI you want. This gives you access to all the features of the URI and URL classes.

import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class Sample {

    public static void main(String[] args) throws URISyntaxException, MalformedURLException {
        String uri1 = "http://example.com/foo?key=val-with-a-{-in-it";

        // Naive splitting: assumes the URL has a scheme, an authority,
        // a single path segment and a query string
        String scheme = uri1.split(":")[0];
        String authority = uri1.split("//")[1].split("/")[0];
        String path = uri1.split("//")[1].split("/")[1].split("\\?")[0];
        String query = uri1.split("\\?")[1];

        // The multi-argument constructor quotes illegal characters itself
        URI uri = new URI(scheme, authority, "/" + path, query, null);
        URL url = uri.toURL();

        System.out.println("URI's Query:" + uri.getQuery());
        System.out.println("URL's Query:" + url.getQuery());
    }
}
Rameshwar
  • this is definitely a possible approach, but a bit more heavy handed than I'd like, and I think I'd default to using the `replace` method in my original post. For this code to work, I'd have to make it significantly more robust to deal with things that may not have a path or query string. – Zach Lysobey Apr 10 '15 at 13:36
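One way to make the splitting approach tolerate a missing path or query string is to split with the component regex from RFC 3986, Appendix B, instead of chained split() calls. The sketch below is an assumption-laden illustration, not a vetted implementation; LenientUri and parse are hypothetical names. It relies on the documented behaviour of the multi-argument java.net.URI constructors, which quote illegal characters but also always quote %, so input that already contains percent-escapes would be double-encoded.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LenientUri {

    // Component-splitting regex from RFC 3986, Appendix B.
    // Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
    private static final Pattern PARTS = Pattern.compile(
            "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?");

    static URI parse(String url) throws URISyntaxException {
        Matcher m = PARTS.matcher(url);
        if (!m.matches()) {
            throw new URISyntaxException(url, "Cannot split into components");
        }
        // Absent components come back as null, which the constructor accepts.
        return new URI(m.group(2), m.group(4), m.group(5), m.group(7), m.group(9));
    }

    public static void main(String[] args) throws URISyntaxException {
        URI uri = parse("http://example.com?foo={bar}");
        System.out.println(uri);            // http://example.com?foo=%7Bbar%7D
        System.out.println(uri.getQuery()); // foo={bar}

        // No path and no query string
        System.out.println(parse("http://example.com").getQuery()); // null
    }
}
```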