I need to generate an href to a URI. This is easy except for reserved characters that need percent-encoding, e.g. a link to /some/path;element should appear as <a href="/some/path%3Belement"> (I know that path;element represents a single entity).
Initially I was looking for a Java library that does this, but I ended up writing something myself (see below for what failed with Java; the question itself isn't Java-specific).
RFC 3986 says when NOT to encode: as I read it, a character should be left as-is when it falls in the unreserved class (ALPHA / DIGIT / "-" / "." / "_" / "~"). So far so good. But what about the opposite case? The RFC only mentions that the percent character (%) always needs encoding. What about the others?
Question: is it correct to assume that everything that is not unreserved can/should be percent-encoded? For example, an opening bracket ( does not necessarily need encoding, but a semicolon ; does. If I don't encode it, I end up looking for /first [*] when following <a href="/first;second">. But when following <a href="/first(second"> I always end up looking for /first(second, as expected. What confuses me is that both ( and ; are in the same sub-delims class as far as the RFC goes. As I imagine, encoding everything non-unreserved is a safe bet, but what about SEO-friendliness and user friendliness when it comes to localized URIs?
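To make the "encode everything that is not unreserved" idea concrete, here is a minimal sketch of such an encoder for a single path segment. The class and method names (PercentEncoder, encodeSegment) are made up for illustration, and the choice of UTF-8 for non-ASCII characters is an assumption. It is not meant as the definitive way to do this.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: percent-encodes every byte that is not in the
// RFC 3986 "unreserved" set (ALPHA / DIGIT / "-" / "." / "_" / "~").
final class PercentEncoder {
    private static final String UNRESERVED =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~";
    private static final char[] HEX = "0123456789ABCDEF".toCharArray();

    private PercentEncoder() {}

    // Encodes one path segment; '/' is encoded too, so callers must
    // split the path into segments themselves (an assumption of this sketch).
    static String encodeSegment(String segment) {
        StringBuilder out = new StringBuilder();
        // Assumption: non-ASCII characters are encoded as their UTF-8 bytes.
        for (byte b : segment.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF;
            if (v < 0x80 && UNRESERVED.indexOf((char) v) >= 0) {
                out.append((char) v);          // unreserved: keep as-is
            } else {
                out.append('%')                // everything else: %HH
                   .append(HEX[v >> 4])
                   .append(HEX[v & 0x0F]);
            }
        }
        return out.toString();
    }
}
```

With this sketch, PercentEncoder.encodeSegment("path;element") yields path%3Belement, and the bracket in first(second is encoded as %28 as well, which is the "safe bet" trade-off described above: safe, but less readable than strictly necessary.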
Now, what failed with the Java libraries. I have tried
new java.net.URI("http", "site", "/pa;th", null).toASCIIString()
but this gives http://site/pa;th, which is no good. I observed similar results with:
- javax.ws.rs.core.UriBuilder
- Spring's UriUtils: I have tried both encodePath(String, String) and encodePathSegment(String, String)
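The java.net.URI behaviour can be reproduced with a small sketch like the one below. The class and method names (UriEncodingDemo, multiArg, preEncoded) are hypothetical. As far as I understand the java.net.URI documentation, the multi-argument constructors only quote characters that are illegal in the given component, and they always quote the percent character, so a path that is already percent-encoded has to go through the single-argument form instead.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical demo class; "site" is the placeholder host from the question.
final class UriEncodingDemo {
    private UriEncodingDemo() {}

    // Multi-argument constructor: quotes illegal characters only.
    // ';' is legal in a path, so it is left alone.
    static String multiArg(String path) {
        try {
            return new URI("http", "site", path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Single-argument form: the input is taken as already encoded,
    // so existing %HH escapes are preserved.
    static String preEncoded(String rawPath) {
        return URI.create("http://site" + rawPath).toASCIIString();
    }
}
```

Here multiArg("/pa;th") returns http://site/pa;th, matching the result above, while multiArg("/pa%3Bth") double-encodes the percent sign to %253B. Passing the pre-encoded path through preEncoded("/pa%3Bth") keeps %3B intact.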
[*] /first is the result of calling HttpServletRequest.getServletPath() on the server side when clicking on <a href="/first;second">.
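A plausible explanation for the /first result is that the container treats everything from ; up to the next / as a path parameter and drops it before computing the servlet path. The sketch below is a simplified model of that idea, not the container's actual code; the names PathParams and stripPathParameters are made up for illustration.

```java
// Hypothetical, simplified model of path-parameter stripping.
final class PathParams {
    private PathParams() {}

    // Removes ";..." from each segment, up to the next '/' or end of input.
    static String stripPathParameters(String path) {
        StringBuilder out = new StringBuilder();
        boolean skipping = false;
        for (int i = 0; i < path.length(); i++) {
            char c = path.charAt(i);
            if (c == '/') {
                skipping = false;   // a new segment starts: stop skipping
                out.append(c);
            } else if (c == ';') {
                skipping = true;    // path parameter starts: drop from here
            } else if (!skipping) {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Under this model /first;second becomes /first, while /first(second and the percent-encoded /some/path%3Belement pass through unchanged, which is consistent with the behaviour described in the question.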
EDIT: I should probably mention that this behaviour was observed under Tomcat; I have checked that both Tomcat 6 and 7 behave the same way.