I haven't found so far how to encode this string to match both storing in an HTML and encoded as a URL
That's because there isn't any, since those are two separate things.
Printing in HTML should generally be done by replacing only ', ", <, > and & with &#39;, &quot;, &lt;, &gt; and &amp;. There are examples of this in Recommended method for escaping HTML in Java. The most trivial one, and the easiest to reason about, is:
public static String encodeToHTML(String str) {
    return str
        .replace("&", "&amp;")   // must come first, or it would re-escape the entities added below
        .replace("'", "&#39;")
        .replace("\"", "&quot;")
        .replace("<", "&lt;")
        .replace(">", "&gt;");
}
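Here is a self-contained sketch of that method with a small main to try it. The class name HtmlEscapeDemo and the sample input are made up for illustration. Note that & is replaced first: if it were replaced last, the &lt; produced for < would turn into &amp;lt;.

```java
public class HtmlEscapeDemo {

    // Escapes the five characters that are significant in HTML text and quoted attributes.
    // '&' is handled first so the entities produced by the later replacements are not re-escaped.
    public static String encodeToHTML(String str) {
        return str
            .replace("&", "&amp;")
            .replace("'", "&#39;")
            .replace("\"", "&quot;")
            .replace("<", "&lt;")
            .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        // prints: Tom &amp; Jerry &lt;b&gt;&quot;hi&quot;&lt;/b&gt;
        System.out.println(encodeToHTML("Tom & Jerry <b>\"hi\"</b>"));
    }
}
```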
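Note that your page needs to declare a matching character set, and that requirements differ depending on where you print the value: if, for example, you print the URL inside an attribute, always quote the attribute value, because unquoted attributes (and contexts such as inline JavaScript) need different escaping.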
URL encoding leaves a much shorter list of characters unchanged. From the URLEncoder documentation:
The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
The special characters ".", "-", "*", and "_" remain the same.
The space character " " is converted into a plus sign "+".
All other characters are unsafe and are first converted into one or more bytes using some encoding scheme. Then each byte is represented by the 3-character string "%xy", where xy is the two-digit hexadecimal representation of the byte.
The recommended encoding scheme to use is UTF-8.
You'd get those with
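// Java 10+; on older versions use URLEncoder.encode(segment, "UTF-8"), which throws UnsupportedEncodingException
String encoded = java.net.URLEncoder.encode(segment, java.nio.charset.StandardCharsets.UTF_8);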
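To see those rules in action, here is a small sketch. It assumes Java 10+ for the Charset overload; the class name, the formEncode wrapper and the sample strings are only for illustration. Spaces come out as +, while & and non-ASCII characters are percent-encoded from their UTF-8 bytes.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class UrlEncoderDemo {

    // Thin wrapper around URLEncoder (HTML form encoding), using the Java 10+ Charset overload.
    public static String formEncode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(formEncode("a b&c.zip")); // a+b%26c.zip
        System.out.println(formEncode("\u00e9"));    // %C3%A9 (UTF-8 bytes of the letter e with acute accent)
        System.out.println(formEncode("x-y_z*.1"));  // x-y_z*.1 (only safe characters, unchanged)
    }
}
```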
The above gives you HTML form encoding (application/x-www-form-urlencoded), which is close to what URL encoding does, but with a few notable differences, the most relevant being + vs %20 for spaces. To fix that, you can do this on its output:
encoded = encoded.replace("+", "%20"); // safe: a literal '+' in the input was already encoded as %2B
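Put together, a helper for a single path segment could look like the sketch below. encodePathSegment is a hypothetical name, and this is a pragmatic approach built on URLEncoder rather than a full RFC 3986 implementation. Replacing + afterwards only affects former spaces, because any + in the input has already become %2B.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class PathSegmentEncoder {

    // Percent-encodes one path segment: form-encode it, then turn the '+' used for
    // spaces into %20. A literal '+' in the input was already encoded as %2B,
    // so this replacement only touches former spaces.
    public static String encodePathSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    public static void main(String[] args) {
        // prints: TEST%20NAME%20%26%20C%2B%2B.zip
        System.out.println(encodePathSegment("TEST NAME & C++.zip"));
    }
}
```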
Note also that you don't want to URL-encode the whole http://BUCKET_ENDPOINT/PATH_1/PATH_2/PATH_3/PATH_4/PATH_5/TEST NAME COULD BE WITH & AND OTHER SPECIAL CHARS.zip, since that would also encode the : and / characters and break the URL. Encode only its last part, TEST NAME COULD BE WITH & AND OTHER SPECIAL CHARS.zip, plus any individual path segments that are not fixed.
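A sketch of that idea: keep the scheme, host and fixed path segments as they are and encode only the variable parts. BUCKET_ENDPOINT and PATH_1 to PATH_5 are the placeholders from the question; buildUrl, its parameters and the class name are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.stream.Collectors;

public class BucketUrlDemo {

    static String encodePathSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    // Appends each variable segment, encoded, to a base URL that is already valid as-is.
    // The '/' separators are added here, so they are never encoded.
    public static String buildUrl(String base, List<String> variableSegments) {
        return variableSegments.stream()
            .map(BucketUrlDemo::encodePathSegment)
            .collect(Collectors.joining("/", base + "/", ""));
    }

    public static void main(String[] args) {
        String base = "http://BUCKET_ENDPOINT/PATH_1/PATH_2/PATH_3/PATH_4/PATH_5";
        // prints: http://BUCKET_ENDPOINT/PATH_1/PATH_2/PATH_3/PATH_4/PATH_5/TEST%20NAME%20%26%20X.zip
        System.out.println(buildUrl(base, List.of("TEST NAME & X.zip")));
    }
}
```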
If you need to generate the URL and print it in HTML, URL-encode it first, then HTML-escape the result.
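An end-to-end sketch of that order: URL-encode the variable segment, then HTML-escape the complete URL before writing it into a quoted href attribute. The class and method names are hypothetical, and the ?download=1&v=2 query string is an invented example, included only to show why the HTML step still matters: the & separating query parameters must become &amp; inside the HTML.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class HtmlLinkDemo {

    static String encodePathSegment(String segment) {
        return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
    }

    static String encodeToHTML(String str) {
        return str
            .replace("&", "&amp;") // first, so the entities below are not re-escaped
            .replace("'", "&#39;")
            .replace("\"", "&quot;")
            .replace("<", "&lt;")
            .replace(">", "&gt;");
    }

    // Step 1: URL-encode the variable part. Step 2: HTML-escape the whole URL (and the link text).
    public static String link(String base, String fileName, String query) {
        String url = base + "/" + encodePathSegment(fileName) + query;
        return "<a href=\"" + encodeToHTML(url) + "\">" + encodeToHTML(fileName) + "</a>";
    }

    public static void main(String[] args) {
        String html = link("http://BUCKET_ENDPOINT/PATH_1", "TEST & X.zip", "?download=1&v=2");
        // prints: <a href="http://BUCKET_ENDPOINT/PATH_1/TEST%20%26%20X.zip?download=1&amp;v=2">TEST &amp; X.zip</a>
        System.out.println(html);
    }
}
```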