
I'm trying to resolve a relative link that starts with a question mark (?) using Java's URL or URI classes.

HTML example:

<a href="?test=xyz">Test XYZ</a>

Code examples (from Scala REPL):

import java.net._

scala> new URL(new URL("http://abc.com.br/index.php?hello=world"), "?test=xyz").toExternalForm()
res30: String = http://abc.com.br/?test=xyz

scala> (new URI("http://abc.com.br/index.php?hello=world")).resolve("?test=xyz").toString
res31: String = http://abc.com.br/?test=xyz

The problem is that browsers (tested on Chrome, Firefox and Safari) produce a different URL: http://abc.com.br/index.php?test=xyz. They don't discard the path "index.php"; they only replace the query string.

Browsers appear to be following the specification (RFC 3986), as explained in https://stackoverflow.com/a/7872230/40876.
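To make the specification's behavior concrete, here is a minimal Java sketch of the reference-resolution algorithm from RFC 3986 (sections 5.2.2 to 5.2.4, plus recomposition from 5.3). It assumes hierarchical URIs that java.net.URI can parse, and uses java.net.URI only to split strings into raw components. The class name Rfc3986Resolve is hypothetical, and this is an illustration, not a validated implementation.

```java
import java.net.URI;

public class Rfc3986Resolve {

    // Sketch of RFC 3986 section 5.2.2 ("Transform References").
    // Assumption: base is a hierarchical URI; java.net.URI is only used to
    // split the strings into their raw components.
    public static String resolve(String base, String reference) {
        URI b = URI.create(base);
        URI r = URI.create(reference);
        // Opaque references such as "mailto:x" are already absolute.
        if (r.isOpaque()) return r.toString();

        String rPath = r.getRawPath() == null ? "" : r.getRawPath();
        String scheme, authority, path, query;

        if (r.getScheme() != null) {
            scheme = r.getScheme();
            authority = r.getRawAuthority();
            path = removeDotSegments(rPath);
            query = r.getRawQuery();
        } else {
            if (r.getRawAuthority() != null) {
                authority = r.getRawAuthority();
                path = removeDotSegments(rPath);
                query = r.getRawQuery();
            } else {
                String bPath = b.getRawPath() == null ? "" : b.getRawPath();
                if (rPath.isEmpty()) {
                    // The case from the question: an empty reference path keeps
                    // the base path; only the query is replaced (if present).
                    path = bPath;
                    query = r.getRawQuery() != null ? r.getRawQuery() : b.getRawQuery();
                } else {
                    path = rPath.startsWith("/")
                            ? removeDotSegments(rPath)
                            : removeDotSegments(merge(b.getRawAuthority() != null, bPath, rPath));
                    query = r.getRawQuery();
                }
                authority = b.getRawAuthority();
            }
            scheme = b.getScheme();
        }

        // Recomposition (RFC 3986 section 5.3).
        StringBuilder sb = new StringBuilder();
        if (scheme != null) sb.append(scheme).append(':');
        if (authority != null) sb.append("//").append(authority);
        sb.append(path);
        if (query != null) sb.append('?').append(query);
        if (r.getRawFragment() != null) sb.append('#').append(r.getRawFragment());
        return sb.toString();
    }

    // RFC 3986 section 5.2.3: keep the base path up to and including its last "/".
    private static String merge(boolean baseHasAuthority, String basePath, String refPath) {
        if (baseHasAuthority && basePath.isEmpty()) return "/" + refPath;
        return basePath.substring(0, basePath.lastIndexOf('/') + 1) + refPath;
    }

    // RFC 3986 section 5.2.4: remove "." and ".." segments.
    private static String removeDotSegments(String in) {
        StringBuilder out = new StringBuilder();
        while (!in.isEmpty()) {
            if (in.startsWith("../")) {
                in = in.substring(3);
            } else if (in.startsWith("./")) {
                in = in.substring(2);
            } else if (in.startsWith("/./")) {
                in = in.substring(2);
            } else if (in.equals("/.")) {
                in = "/";
            } else if (in.startsWith("/../")) {
                in = in.substring(3);
                removeLastSegment(out);
            } else if (in.equals("/..")) {
                in = "/";
                removeLastSegment(out);
            } else if (in.equals(".") || in.equals("..")) {
                in = "";
            } else {
                // Move the first segment (with its leading "/", if any) to the output.
                int next = in.indexOf('/', 1);
                if (next == -1) next = in.length();
                out.append(in, 0, next);
                in = in.substring(next);
            }
        }
        return out.toString();
    }

    // Drop the last segment and its preceding "/" (if any) from the output.
    private static void removeLastSegment(StringBuilder out) {
        out.setLength(Math.max(out.lastIndexOf("/"), 0));
    }

    public static void main(String[] args) {
        System.out.println(resolve("http://abc.com.br/index.php?hello=world", "?test=xyz"));
        // http://abc.com.br/index.php?test=xyz
    }
}
```

The empty-path branch is where this differs from java.net.URI.resolve, whose Javadoc says it follows the older RFC 2396 section 5.2 rules; those rules drop the last path segment, which is why Java returns http://abc.com.br/?test=xyz.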

The jsoup library makes the same "mistake" with element.absUrl("href"), because it also relies on Java's URL resolution.

So what is going on with Java's URL/URI resolution of relative paths? Is it wrong or incomplete? How can I make it behave the same way as the browsers' implementation?

Felipe Hummel
  • Similar questions but without conclusive answers: http://stackoverflow.com/questions/22203111/is-javas-uri-resolve-incompatible-with-rfc-3986-when-the-relative-uri-contains?rq=1 and http://stackoverflow.com/questions/10330138/java-net-uri-resolve-against-only-query-string – Felipe Hummel Oct 13 '15 at 22:43
  • I gave a detailed answer to http://stackoverflow.com/q/22203111 (but won't comment on its conclusiveness). – William Price Mar 11 '16 at 23:49
  • It's a bug in Java's `URI` class. The `URL` class has the same bug, though it's a different implementation. Bug reports exist for both issues. For the `URL` class the bug was closed as "won't fix" due to legacy concerns; according to Oracle, `URL` is legacy and should not be used. A bug report for `URI` is still open here: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8218962 – Guss Feb 14 '19 at 07:08

1 Answer


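Java's built-in URL and URI classes both drop "index.php", but URIUtils.resolve from Apache HttpClient gives the browser-compatible result: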

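import java.net.URI;
import java.net.URL;

import org.apache.http.client.utils.URIUtils;

public class ResolveQueryOnly {
    public static void main(String[] args) throws Exception {
        String base = "http://abc.com.br/index.php?hello=world";
        String relative = "?test=xyz";

        // java.net.URL: drops the last path segment ("index.php")
        System.out.println(new URL(new URL(base), relative).toExternalForm());
        // http://abc.com.br/?test=xyz

        // java.net.URI: same result as URL
        System.out.println(new URI(base).resolve(relative).toString());
        // http://abc.com.br/?test=xyz

        // Apache HttpClient: keeps "index.php" and replaces only the query
        System.out.println(URIUtils.resolve(new URI(base), relative).toString());
        // http://abc.com.br/index.php?test=xyz
    }
}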

URIUtils lives in org.apache.httpcomponents:httpclient, version 4.0 or higher.

ursa
  • I've posted simple workaround code in https://stackoverflow.com/a/61578016/53538 so you don't need to add a large dependency just to workaround a Java bug. – Guss May 03 '20 at 16:36
  • If you "don't want a third-party dependency", just copy-paste the working code from that library into your project. Why reinvent the wheel? – ursa May 05 '20 at 13:50
  • You are kind of right. My big fear when copy-pasting code out of a huge library is that single methods are never standalone: they take advantage of other facilities in that library (and its dependencies), so it is always a game of catch-them-all. I reviewed `URIUtils.resolve()` and luckily it is pretty simple: its only non-JDK dependency is `Args` from Apache's httpcore, and that can be removed or replaced with `Objects.requireNonNull()`. That being said, after reviewing Apache's workaround, I think it's a bit naive, and while I can't point to a specific fault there, I like mine better :) – Guss May 05 '20 at 21:34
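For readers who would rather avoid the httpclient dependency, as the comments above discuss, here is a small JDK-only sketch. It is neither the code from the linked answer nor Apache's code; the class name QueryOnlyResolve is hypothetical. It assumes a hierarchical base URI and special-cases only references that start with "?", delegating everything else to java.net.URI.resolve.

```java
import java.net.URI;

public class QueryOnlyResolve {

    // Hypothetical helper. For a reference that starts with "?", RFC 3986
    // keeps the base path and replaces the base query and fragment; every
    // other reference is passed to java.net.URI.resolve unchanged.
    public static URI resolve(URI base, String reference) {
        if (reference.startsWith("?")) {
            String b = base.toString();
            // In a valid hierarchical URI, the first raw '?' or '#' starts the
            // query or fragment, so cut the base string there.
            int end = b.length();
            int q = b.indexOf('?');
            int f = b.indexOf('#');
            if (q >= 0) end = q;
            if (f >= 0 && f < end) end = f;
            return URI.create(b.substring(0, end) + reference);
        }
        return base.resolve(reference);
    }

    public static void main(String[] args) {
        URI base = URI.create("http://abc.com.br/index.php?hello=world");
        System.out.println(resolve(base, "?test=xyz"));
        // http://abc.com.br/index.php?test=xyz
        System.out.println(resolve(base, "other.php"));
        // http://abc.com.br/other.php
    }
}
```

Other empty-path references, such as an empty string, still go through URI.resolve and keep its RFC 2396 behavior.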