
I'm working on a project where a local file is exported via HTTP. This involves getting a file URI, relativizing it using the exported path, tacking it onto the export URI and then handling that as a URL on the receiving end.

Normally this works fine, but I run into trouble when the filename contains a semicolon. I narrowed it down to here:

new File(path).toURI()

The above method correctly encodes spaces and the like, but not semicolons (which should be encoded into a %3B).

Ultimately, the above method returns the result of the URI constructor (scheme, host, path, fragment), and that constructor is what leaves the semicolon unencoded.
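
A minimal sketch that reproduces this without touching the filesystem, assuming the four-argument `java.net.URI` constructor is called with the `file` scheme as described above. The class name `SemicolonDemo`, the helper `encodePath` and the sample path are made up for illustration:

```java
import java.net.URI;
import java.net.URISyntaxException;

class SemicolonDemo {
    // Hypothetical helper: builds a file URI with the four-argument
    // constructor (scheme, host, path, fragment) and returns its raw,
    // still percent-encoded path.
    static String encodePath(String path) {
        try {
            return new URI("file", null, path, null).getRawPath();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // The space is escaped, the semicolon is left alone.
        System.out.println(encodePath("/tmp/my file;v1.txt"));
        // prints /tmp/my%20file;v1.txt
    }
}
```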

I could manually replace all semicolons with %3B, but that doesn't feel like the best solution. Is there really no built-in API to correctly encode a path?

Many thanks for any assistance.

Aubin
Alex Broadwin

2 Answers


A semicolon is a perfectly valid character in a URI. Of course, if the receiving end uses the semicolon as a special delimiter, the sender needs to escape it. But that is outside standard practice, so you'll have to escape it yourself.

In the Java world, however, the Servlet API is the standard, and it treats the semicolon as a special delimiter. I'm not aware of any utility that helps here, so you'll still need to escape semicolons manually.
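
The manual escaping could look roughly like the sketch below. It is only one way to do it: it assumes the path is first percent-encoded by the multi-argument `java.net.URI` constructor and that every remaining `;` is meant literally. The class `SemicolonEscaper`, the helper `encodePathStrict` and the `http://example.com/export` base are hypothetical:

```java
import java.net.URI;
import java.net.URISyntaxException;

class SemicolonEscaper {
    // Hypothetical helper: lets URI quote the illegal characters, then
    // escapes the semicolons that URI deliberately leaves untouched.
    static String encodePathStrict(String path) {
        try {
            String raw = new URI(null, null, path, null).getRawPath();
            return raw.replace(";", "%3B");
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String encoded = encodePathStrict("/docs/foo;bar baz.html");
        System.out.println(encoded);
        // prints /docs/foo%3Bbar%20baz.html

        // The single-argument form keeps existing escapes as they are,
        // and getPath() decodes them again on the receiving side.
        URI exported = URI.create("http://example.com/export" + encoded);
        System.out.println(exported.getPath());
        // prints /export/docs/foo;bar baz.html
    }
}
```

Because the multi-argument constructor always quotes a literal `%`, a filename that already contains the text `%3B` comes out as `%253B` and is not confused with an escaped semicolon.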

ZhongYu
  • You say this is non-standard, but I disagree. If you have a web server, go ahead and serve a file named "foo;.html" and try to access it. Your web server will export the file as foo%3B.html, and accessing it without the encoding will result in a 404. It is valid, but not as part of the path. – Alex Broadwin Apr 16 '13 at 19:22
  • that's true for servlet servers. – ZhongYu Apr 16 '13 at 19:27
  • @AlexBroadwin - Apache 2.2 has no problem serving up files with semi-colons in their names. I tested this using curl with the `-v` option and the HTTP request was `GET /~dave/foo;bar.html` and it returned the file without so much as a warning. – D.Shawley Apr 16 '13 at 19:53
  • Interesting, you're right. The listing created by apache encodes the link with %3B, but it serves it properly with a semicolon. Perhaps it's just H2 (the actual server being used) that has an issue with it... – Alex Broadwin Apr 16 '13 at 20:01
  • Confirmed. Apache serves it, H2 does not. – Alex Broadwin Apr 16 '13 at 20:35

The reason the semicolon is not escaped automatically is that it has a meaning in the URI specification: it delimits "path parameters". The following URI is valid: /some;a=b/path

and represents path /some/path with a path parameter a of value b.

So in this case the escape must be manual, because URI cannot determine whether the semicolon delimits parameters or is part of the path.
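
To make the ambiguity concrete, here is a rough sketch of how a receiver that honors path parameters might reduce a raw path before looking up a file. `PathParams` and `stripPathParameters` are hypothetical names, not part of the Servlet API or of any real server; actual containers may behave differently:

```java
class PathParams {
    // Hypothetical helper: drops everything from the first ';' onward
    // in each path segment, which is how a parameter-aware receiver
    // could interpret an unescaped semicolon.
    static String stripPathParameters(String rawPath) {
        String[] segments = rawPath.split("/", -1);
        for (int i = 0; i < segments.length; i++) {
            int semi = segments[i].indexOf(';');
            if (semi >= 0) {
                segments[i] = segments[i].substring(0, semi);
            }
        }
        return String.join("/", segments);
    }

    public static void main(String[] args) {
        System.out.println(stripPathParameters("/some;a=b/path"));
        // prints /some/path
        System.out.println(stripPathParameters("/files/foo;bar.html"));
        // prints /files/foo
        System.out.println(stripPathParameters("/files/foo%3Bbar.html"));
        // prints /files/foo%3Bbar.html
    }
}
```

Under that interpretation an unescaped `foo;bar.html` is looked up as `foo`, while the `%3B` form survives intact.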

Tomas Langer