My product is a web application.
I upload files to my server and download them again later.
I use java.net.URLDecoder.decode() when uploading files whose names contain Unicode characters and java.net.URLEncoder.encode() when downloading them, so that the file name is stored and then returned to the client intact, without being turned into question marks (?????).
The problem is that if the file name contains spaces, encode/decode replaces them with the + character. That is perfectly normal, because it is how those methods are specified, but it does not fit my purpose.
What alternative do I have to overcome this?
Is there a built-in method for that, or a third-party package?

-
And where do you insert that filename? Is that in a URI query string, fragment part, other? – fge Apr 08 '14 at 09:39
-
Is `+` not decoded as a space? One could replace the plus after encoding with `%20`. – Joop Eggen Apr 08 '14 at 09:39
-
@JoopEggen that won't work all the time; consider for instance that `+` is legal in a URI fragment – fge Apr 08 '14 at 09:40
-
I am putting the file name into a header of the HTTP response. – Snow Apr 08 '14 at 09:47
-
In a header? Uhwell, just surround the filename with double quotes in all situations, then! You don't even need to encode them – fge Apr 08 '14 at 09:48
-
Double quotes? Can you post an example here please? – Snow Apr 08 '14 at 09:51
-
Well, `"my filename with spaces.txt"` – fge Apr 08 '14 at 09:51
-
OK, I understand what you wrote here, but my problem is Unicode characters. I cannot just put the string into the header as is, because then the file name shows up as "????.txt" or "----.txt". That is why I used the encode/decode functions in the first place, but they do not handle spaces. – Snow Apr 08 '14 at 09:55
-
If the other end cannot see the characters properly, it means you have another problem to begin with, and that is correct header character coding! If headers are UTF-8 this is not a problem. Anyway -- go with Guava, you can't go wrong. – fge Apr 08 '14 at 10:06
-
See these posts: http://stackoverflow.com/questions/11213160/sending-utf-8-values-in-http-headers-results-in-mojibake http://stackoverflow.com/questions/1432233/asp-net-download-file-with-japanese-file-name – Snow Apr 08 '14 at 10:55
-
You can see that they use the same encode/decode to solve the same problem, but they will hit the same problem I have when the file name contains spaces – Snow Apr 08 '14 at 10:57
-
So, basically, you need to write your own Unicode escaper -- easy with Guava ;) – fge Apr 08 '14 at 11:05
5 Answers
You don't say where this filename is used. The characters to encode differ depending on whether, for instance, it goes in a URI query string or in a fragment part.
You probably want to have a look at Guava's (15.0+) Escapers; and, in particular here, UnicodeEscaper implementations and its derived class PercentEscaper. Guava already provides a few of them usable in various parts of URLs.
EDIT: here is how to do it with Guava:
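import com.google.common.escape.Escaper;
import com.google.common.net.PercentEscaper;

public final class FilenameEscaper
{
    // PercentEscaper is a final class, so hold an instance instead of extending it.
    // "" = no extra safe characters; false = escape a space as %20 rather than '+'.
    public static final Escaper INSTANCE = new PercentEscaper("", false);

    private FilenameEscaper()
    {
    }
}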
Done! See here. Of course, you may want to declare that some more characters than the default ones are safe.
Also have a look at RFC 5987 to make a better encoder.
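To make the RFC 5987 suggestion concrete, here is a minimal sketch using only the JDK. It assumes the header in question is Content-Disposition, which the question does not name, and the class and method names (ContentDispositionSketch, encodeExtValue, attachmentHeader) are made up for illustration. It percent-encodes the UTF-8 bytes of every character outside the RFC's attr-char set, so a space becomes %20 and never +.

```java
import java.nio.charset.StandardCharsets;

final class ContentDispositionSketch {
    // attr-char from RFC 5987: letters, digits and a few punctuation marks.
    private static final String ATTR_CHARS =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789!#$&+-.^_`|~";

    // Percent-encode every UTF-8 byte that is not an attr-char.
    static String encodeExtValue(String value) {
        StringBuilder sb = new StringBuilder();
        for (byte b : value.getBytes(StandardCharsets.UTF_8)) {
            int unsigned = b & 0xFF;
            if (unsigned < 0x80 && ATTR_CHARS.indexOf((char) unsigned) >= 0) {
                sb.append((char) unsigned);
            } else {
                sb.append('%');
                sb.append(Character.toUpperCase(Character.forDigit(unsigned >> 4, 16)));
                sb.append(Character.toUpperCase(Character.forDigit(unsigned & 0xF, 16)));
            }
        }
        return sb.toString();
    }

    // Assumed header shape: the extended "filename*" parameter with a UTF-8 charset tag.
    static String attachmentHeader(String fileName) {
        return "attachment; filename*=UTF-8''" + encodeExtValue(fileName);
    }

    public static void main(String[] args) {
        // \u00e9 is an e with an acute accent
        System.out.println(attachmentHeader("my file \u00e9.txt"));
        // prints: attachment; filename*=UTF-8''my%20file%20%C3%A9.txt
    }
}
```

The resulting string would be set as the value of the Content-Disposition response header. Older clients may not understand filename*, so a plain quoted filename="..." fallback alongside it may still be worth sending.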
-
The thing is that it is neither a URL nor a URI. It is just a file name with Unicode characters that I set in the response header when the user downloads the file, so that they see the file name correctly. – Snow Apr 08 '14 at 09:49
-
See my comment above then. You don't need any sort of encoding, just to surround the file name with double quotes – fge Apr 08 '14 at 09:51
You could also convert a space to %20.
See: URL encoding the space character: + or %20?
There are also various other Java libraries that do URL encoding with %20. Here are two examples:
Guava:
UrlEscapers.urlPathSegmentEscaper().escape(urlToEscape);
Spring Framework:
UriUtils.encodePath(urlToEscape, Charsets.UTF_8.toString());

-
You could try to encode header text like suggested in RFC 2047. Not sure if support is good for this though. javax.mail.internet.MimeUtility is able to do this conversion. – Gregor Koukkoullis Apr 08 '14 at 10:12
-
See also: http://stackoverflow.com/questions/324470/http-headers-encoding-decoding-in-java – Gregor Koukkoullis Apr 08 '14 at 10:13
This worked for me:
URLEncoder.encode(someString, "UTF-8").replace("+", "%20");
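A small self-contained check of that one-liner, as a sketch: the class name SpaceSafeEncodeDemo and the sample file name are invented for the demo. It also shows why the replace is safe: a literal + in the input has already been encoded as %2B by the time replace runs, so every + left in the encoded string stands for a space.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class SpaceSafeEncodeDemo {
    // Form-encode as UTF-8, then turn the '+' that stands for a space into %20.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8").replace("+", "%20");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be supported by every JVM, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encode("my file+v2.txt"));
        // prints: my%20file%2Bv2.txt
    }
}
```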

-
I think that `+` will be encoded into `%2B` before `replace` method. – Elvedin Hamzagic Apr 08 '14 at 11:27
I found the cure!
I just needed to use java.net.URI for that:
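import java.net.URI;
import java.net.URISyntaxException;

public static String encode(String urlString)
{
    try
    {
        URI uri = new URI(urlString);
        // percent-encodes non-ASCII characters as UTF-8 octets
        return uri.toASCIIString();
    }
    catch (URISyntaxException e)
    {
        // the single-argument constructor rejects illegal characters such as a space
        throw new IllegalArgumentException("Not a valid URI: " + urlString, e);
    }
}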
toASCIIString() percent-encodes the non-ASCII characters, so the string is displayed correctly when it reaches the browser.

I had the same problem with spaces. A combination of URL and URI solved it:
URL url = new URL("file:/E:/Program Files/IBM/SDP/runtimes/base");
URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef());
* Please note that URLEncoder is meant for web forms (the application/x-www-form-urlencoded MIME type), not for HTTP network addresses.
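For a bare file name rather than a full URL, the same idea can be sketched with the four-argument URI constructor (scheme, host, path, fragment). The multi-argument constructors quote illegal characters such as spaces, whereas the single-argument constructor rejects them. The class UriQuotingDemo and its method names are hypothetical, and the leading slash in the sample path is only there to keep the path unambiguous.

```java
import java.net.URI;
import java.net.URISyntaxException;

final class UriQuotingDemo {
    // Quote a path with the multi-argument constructor, then force pure ASCII output.
    static String encodePath(String path) {
        try {
            // scheme, host and fragment are left null; only the path is set
            return new URI(null, null, path, null).toASCIIString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Cannot build a URI from: " + path, e);
        }
    }

    // True if the single-argument constructor accepts the string unchanged.
    static boolean parsesAsIs(String s) {
        try {
            new URI(s);
            return true;
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesAsIs("/my file.txt"));         // prints: false
        System.out.println(encodePath("/my file \u00e9.txt"));  // prints: /my%20file%20%C3%A9.txt
    }
}
```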
