Java Reading Undecoded URL from Servlet

Question

Let's presume that I have a string like '=&?/;#+%' that should be part of my URL, for example:

example.com/servletPath/someOtherPath/myString/something.html?a=b&c=d#asdf

where myString is the above string. I've encoded the critical part, so the URL looks like

example.com/servletPath/someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html?a=b&c=d#asdf

So far so good.
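
For reference, one way to produce the encoded segment shown above is sketched here. It assumes UTF-8 and uses a hypothetical helper name (`encodeSegment`). `java.net.URLEncoder` targets form encoding rather than path segments, so the sketch rewrites the `+` it emits for a space into `%20`. A literal `+` in the input is already turned into `%2B` by the encoder and is unaffected.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PathSegmentEncoder {

    // Hypothetical helper: percent-encodes one path segment, assuming UTF-8.
    static String encodeSegment(String segment) {
        try {
            // URLEncoder writes a space as '+', which means "space" only in
            // form data, so it is rewritten to %20 for use inside a path.
            return URLEncoder.encode(segment, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeSegment("=&?/;#+%"));
    }
}
```

Running `main` should print the same `%3D%26%3F%2F%3B%23%2B%25` sequence used in the URL above.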

When I'm in the servlet and I read any of request.getRequestURI(), request.getRequestURL() or request.getPathInfo(), the returned value is already decoded, so I get a string like

someOtherPath/=&?/;#+%/something.html?a=b&c=d#asdf

and I can't differentiate between real special characters and encoded ones.

I've solved this particular problem by banning the above characters altogether, which works in this situation, but I still wonder whether there is any way to get the undecoded URL in a servlet class.

YET ANOTHER EDIT: When I hit this problem last evening I was too tired to notice what is really going on, which is even more bizarre! I have a servlet mapped to, say, /servletPath/*. After that prefix I can put whatever I want and my servlet responds depending on the rest of the path, except when there is a %2F in the path. In that case the request never hits the servlet, and I get a 404! If I put '/' instead of %2F it works OK. I'm running Tomcat 6.0.14 on Java 1.6.0-04 on Linux.

if the string is already decoded, why would it have a %2f in it? — Mike Pone, Jun 08 '09 at 17:55
What does the returned value look like and what do you want it to be? And is it relevant? I can't really tell what the problem is. — Michael Myers, Jun 08 '09 at 17:58
Sounds like the case of trying to decode an illegal and malformed URL. Running outside of the spec like this is likely to cause a bunch of problems. Can you have control to change the way the data is passed? e.g. move to post data? — Cheekysoft, Jun 09 '09 at 11:02
For anyone stumbling upon this at a future date, the problem with %2F is due to a [CGI security precaution](http://superuser.com/questions/373797/apache-returning-404-if-pathinfo-includes-partially-uri-encoded-url). — jkitchen, Jun 25 '14 at 14:19

jcsahnwaldt Reinstate Monica · Accepted Answer · 2015-08-11T23:55:11.223

There is a fundamental difference between '%2F' and '/', both for the browser and the server.

The HttpServletRequest specification says (without any logic, AFAICT):

getContextPath: not decoded
getPathInfo: decoded
getPathTranslated: not decoded
getQueryString: not decoded
getRequestURI: not decoded
getServletPath: decoded

The result of getPathInfo() should be decoded, but the result of getRequestURI() must not be decoded. If it is, your Servlet container is breaking the spec (as Wouter Coekaerts and Francois Gravel correctly pointed out). Which Tomcat version are you running?

Making matters even more confusing, current Tomcat versions reject paths that contain encodings of certain special characters, for security reasons.

Powerlord · Answer 2 · 2009-06-08T18:15:57.903


If there's a %2F in the decoded url, it means the encoded url contained %252F.
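
A minimal sketch of that point, assuming UTF-8 and a hypothetical helper name (`decodeOnce`): a single decoding pass turns `%252F` into `%2F`, while the same pass turns `%2F` into `/`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class DoubleEncodingDemo {

    // Hypothetical helper: applies exactly one round of percent-decoding.
    static String decodeOnce(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is a charset every Java platform is required to support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // %25 decodes to '%', so the doubly encoded slash survives one pass.
        System.out.println(decodeOnce("a%252Fb")); // a%2Fb
        // A singly encoded slash becomes a plain '/' after one pass.
        System.out.println(decodeOnce("a%2Fb"));   // a/b
    }
}
```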

Since %2F is '/', why not just split on "\/" and not worry about URL encoding?

edited Jun 08 '09 at 18:15

answered Jun 08 '09 at 17:58


Francois Gravel · Answer 3 · 2009-06-09T12:33:08.880


According to the Javadoc, getRequestURI should not decode the string. On the other hand, getServletPath returns a decoded string. I tested this locally using Jetty and it behaves as described in the doc.

So there might be something else at play in your situation since the behavior you're describing doesn't match the Sun documentation.

edited Jun 09 '09 at 12:33

answered Jun 09 '09 at 11:19


You are partially right. When I have some UTF-8 character it stays undecoded, but special characters don't. I'm working on Tomcat. – Slartibartfast Jun 09 '09 at 11:48

stevedbrown · Answer 4 · answered Jun 08 '09 at 20:51

It seems like you are trying to do something RESTful (consider Jersey). Can't you just strip the leading and trailing parts of the URL to get the data you are looking for?

url.substring(startLength, url.length() - endLength);

nope, I've got param1/param2/param3 and they are all of unknown length. – Slartibartfast Jun 08 '09 at 21:12

Wouter Coekaerts · Answer 5 · 2012-05-09T11:16:18.540

Update: this answer was originally wrongly stating that '/' and '%2F' in a path should always be treated the same. They are in fact different because a path is a list of /-separated segments.

Original text, since retracted: You should not have to make a difference between an encoded and not encoded character in the path part of the URL. There is no character inside the path that can have a special meaning in a URL. E.g. '%2F' must be interpreted the same as '/', and a browser accessing such a URL is free to replace one by the other as it sees fit. Making a difference between them is breaking the standard of how URLs are encoded.

In the complete URL, you must make a difference between escaped and non-escaped characters for different reasons, including:

To see where the path part ends. Because a ? encoded in the path should not be seen as the end.
Inside the query String. Because part of the value of a parameter could contain '&' or '=',...
Inside a path, a '/' separates two segments while '%2F' can be contained within a segment

Java deals fine with the first two cases:

getPathInfo() which returns only the path part, decoded
getParameter(String) to access parts of the query part

It doesn't deal so well with the third case. If you want to make a difference between '/' as the separation of two path segments, and a '/' inside a path segment (%2F), then you cannot consistently represent the path as one decoded string. You can either represent it as one encoded string (eg "foo/bar%2Fbaz"), or as a list of decoded segments (eg "foo", "bar/baz"). But because getPathInfo() API promises to do just that (one decoded string), it has no choice but to treat '/' and '%2F' as the same.

For usual web applications, this is just fine. If you are in the rare case where you really need to make the difference, you can do your own parsing of the URL, getting the raw version with getRequestURI(). If that one gives the URL decoded as you claim, then that means there is a bug in the servlet implementation you're using.
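
The "do your own parsing" route could look roughly like the sketch below. It takes the raw string that getRequestURI() is supposed to return (no query string, still percent-encoded), splits it on literal '/' first, and only then decodes each segment, so an encoded %2F stays inside its segment. The class and method names are hypothetical, UTF-8 is assumed, and path parameters (';...') are not stripped.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.util.ArrayList;
import java.util.List;

final class RawPathSegments {

    // Hypothetical helper: rawPath is the undecoded path, e.g. the value
    // of request.getRequestURI() on a spec-compliant container.
    static List<String> decodeSegments(String rawPath) {
        List<String> segments = new ArrayList<>();
        // Split BEFORE decoding: only literal '/' separates segments.
        for (String raw : rawPath.split("/")) {
            if (raw.isEmpty()) {
                continue; // skip the empty piece before the leading '/'
            }
            try {
                // URLDecoder would turn a literal '+' into a space (form
                // semantics), so protect it before decoding.
                segments.add(URLDecoder.decode(raw.replace("+", "%2B"), "UTF-8"));
            } catch (UnsupportedEncodingException e) {
                // UTF-8 is a charset every Java platform is required to support.
                throw new IllegalStateException(e);
            }
        }
        return segments;
    }

    public static void main(String[] args) {
        // Prints the four decoded segments; the third contains a '/'.
        System.out.println(decodeSegments(
            "/servletPath/someOtherPath/%3D%26%3F%2F%3B%23%2B%25/something.html"));
    }
}
```

Note that URLDecoder.decode throws IllegalArgumentException on a malformed escape such as a lone '%', so real code would need to decide how to respond to that. None of this helps if the container rejects the request before it reaches the servlet, as with the 404 described in the question.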

So it was my bad that I thought there is a difference between / and %2F, while by the standard there isn't. As I've said, I've skipped the problem by eliminating the characters before they hit the URL-encoding part, which I guess is the only standard-compliant way. — Slartibartfast, Jun 09 '09 at 12:16
Actually, I believe there is a difference between "/" and "%2F" in the path. [RFC3986](http://www.ietf.org/rfc/rfc3986.txt) indicates that the path is a "/"-separated sequence of "path segments". So if you want a "path segment" containing a slash character it has to be encoded as %2F. This is stated for instance in [the Wikipedia article on percent-encoding](http://en.wikipedia.org/wiki/Percent-encoding). To my understanding, it is fine to have a server which uses this distinction, and a browser which didn't maintain the distinction would be broken. — Robert Tupelo-Schneck, Apr 16 '12 at 19:21
@RobertTupelo-Schneck You're right. I just edited the answer to fix that. — Wouter Coekaerts, May 09 '12 at 11:17
