
In Java, I want to convert this:

https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type

To this:

https://mywebsite/docs/english/site/mybook.do?request_type

This is what I have so far:

class StringUTF 
{
    public static void main(String[] args) 
    {
        try{
            String url = 
               "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do" +
               "%3Frequest_type%3D%26type%3Dprivate";

            System.out.println(url+"Hello World!------->" +
                new String(url.getBytes("UTF-8"),"ASCII"));
        }
        catch(Exception E){
        }
    }
}

But it doesn't work right. What are these %3A and %2F formats called and how do I convert them?

Eric Leschinski
crackerplace
  • @Stephen .. Why can't a url be UTF-8 encoded String .. ? – crackerplace May 26 '11 at 12:14
  • The problem is that just because the URL can be UTF-8, the question really has _nothing_ to do with UTF-8. I've edited the question suitably. – C. K. Young May 26 '11 at 12:19
  • It could be (in theory) but the string in your example is not a UTF-8 encoded String. It is a URL-encoded ASCII string. Hence the title is misleading. – Stephen C May 26 '11 at 12:20
  • It is also worth noting that all the characters in the `url` string are ASCII, and this is also true after the string has been URL decoded. `'%'` is an ASCII char and `%xx` represents an ASCII char if `xx` is less than (hexadecimal) `80`. – Stephen C May 26 '11 at 12:34
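Stephen C's point that each `%xx` is just a hexadecimal byte value can be made concrete with a small hand-rolled decoder. This is only an illustration of the mechanism, not a replacement for the JDK classes discussed in the answers. `PercentDecode` and `percentDecode` are hypothetical names, and the sketch assumes that every unescaped character is plain ASCII and that the escaped bytes form UTF-8. It deliberately does not turn `+` into a space.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

public class PercentDecode {
    // Hypothetical helper: replaces each %XX escape with the byte 0xXX and
    // decodes the resulting bytes as UTF-8. Does NOT treat '+' as a space.
    static String percentDecode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    bytes.write((hi << 4) | lo); // e.g. %3A -> 0x3A -> ':'
                    i += 2;                      // skip the two hex digits
                    continue;
                }
            }
            // Assumption: unescaped characters are ASCII, so one char == one byte
            bytes.write(c);
        }
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
        System.out.println(percentDecode(url));
    }
}
```

Run on the question's string, this prints https://mywebsite/docs/english/site/mybook.do?request_type, because %3A, %2F and %3F are the bytes for ':', '/' and '?'.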

11 Answers


This does not have anything to do with character encodings such as UTF-8 or ASCII. The string you have there is URL encoded. This kind of encoding is something entirely different from character encoding.

Try something like this:

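import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

try {
    String result = URLDecoder.decode(url, StandardCharsets.UTF_8.name());
} catch (UnsupportedEncodingException e) {
    // not going to happen - value came from JDK's own StandardCharsets
}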

Java 10 added an overload of decode that takes a Charset directly, so there's no need to catch UnsupportedEncodingException:

String result = java.net.URLDecoder.decode(url, StandardCharsets.UTF_8);
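For reference, here is a minimal complete program built around that Java 10+ call, applied to the full string from the question (including the query part). `DecodeExample` and its `decode` helper are just illustrative names.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DecodeExample {
    // Requires Java 10+ for the decode(String, Charset) overload,
    // which declares no checked exception.
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The full string from the question, including the query part
        String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do"
                + "%3Frequest_type%3D%26type%3Dprivate";
        System.out.println(decode(url));
    }
}
```

Since %3D is '=' and %26 is '&', this should print https://mywebsite/docs/english/site/mybook.do?request_type=&type=private.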

Note that a character encoding (such as UTF-8 or ASCII) is what determines the mapping of characters to raw bytes. For a good intro to character encodings, see this article.
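To see where the two concepts meet: URL decoding turns `%xx` escapes back into bytes, and the character encoding you pass decides which characters those bytes become. In this sketch (Java 10+, `CharsetMatters` is a made-up class name), the same escapes give different text under UTF-8 and ISO-8859-1.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class CharsetMatters {
    static String decode(String encoded, Charset cs) {
        return URLDecoder.decode(encoded, cs);
    }

    public static void main(String[] args) {
        // "%C3%A9" is the UTF-8 byte sequence for U+00E9, percent-encoded
        String encoded = "caf%C3%A9";
        String asUtf8 = decode(encoded, StandardCharsets.UTF_8);
        String asLatin1 = decode(encoded, StandardCharsets.ISO_8859_1);
        // Same escapes, different character encoding, different text:
        System.out.println(asUtf8.length());   // 4: "caf" + U+00E9
        System.out.println(asLatin1.length()); // 5: "caf" + U+00C3 + U+00A9
    }
}
```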

kryger
Jesper
  • The methods on `URLDecoder` are static so you don't have to create a new instance of it. – laz May 26 '11 at 12:37
  • @whataheck URL encoding is used because in some places you can't use all kinds of characters in an URL, so that some characters are escaped using a `%xx` code as Stephen C explains in a comment on your question above. – Jesper May 26 '11 at 13:13
  • The method you provided is marked as deprecated. Why is that, and what is the alternative? – Trismegistos Dec 19 '12 at 12:47
  • @Trismegistos Only the version where you don't specify the character encoding (the second parameter, `"UTF-8"`) is deprecated according to the Java 7 API documentation. Use the version with two parameters. – Jesper Dec 19 '12 at 15:47
  • If you're using Java 1.7+, you can use the constant for the "UTF-8" string: `StandardCharsets.UTF_8.name()` from this package: `java.nio.charset.StandardCharsets`. Relevant to this: [link](http://stackoverflow.com/questions/6698354/where-to-get-utf-8-string-literal-in-java) – Shahar Apr 30 '14 at 12:46
  • For character encoding, this makes a great article too: balusc.blogspot.in/2009/05/unicode-how-to-get-characters-right.html – crackerplace Jul 16 '14 at 20:32
  • Be careful with this. As noted here: http://blog.lunatech.com/2009/02/03/what-every-web-developer-must-know-about-url-encoding#Donotuse%7B%7Bjava.net.URLEncoder%7D%7Dor%7B%7Bjava.net.URLDecoder%7D%7DforwholeURLs URLDecoder is not meant for URLs but for HTML form encoding. – Michal May 27 '15 at 12:29
  • Useful for grails and gsp as well ... `` – HumanInDisguise Jun 03 '15 at 09:16
  • this needs to be wrapped in a try/catch block.. read more about checked exceptions (this one) vs unchecked http://stackoverflow.com/questions/6115896/java-checked-vs-unchecked-exception-explanation – dNurb Jul 26 '16 at 20:53
  • Doesn't work if there is a '+' in url. See https://bugs.openjdk.java.net/browse/JDK-8179507 – Evgeny Bovykin Jan 15 '20 at 13:17

The string you've got is in application/x-www-form-urlencoded encoding.

Use URLDecoder to convert it to Java String.

String decoded = URLDecoder.decode(url, "UTF-8");
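One practical consequence of it being application/x-www-form-urlencoded decoding: a `+` is turned into a space, while `%2B` is the literal plus sign. A minimal sketch, assuming Java 10+ for the Charset overload; `FormDecoding` is an illustrative name.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FormDecoding {
    // application/x-www-form-urlencoded: '+' means space, %XX is an escaped byte
    static String decodeForm(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decodeForm("name=John+Smith%2BCo")); // name=John Smith+Co
    }
}
```

That behaviour is correct for form data but can surprise you if a real URL path contains a literal '+'.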
Alexander Pogrebnyak

This has been answered before (although this question was first!):

"You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded decoding which is wrong (despite the name, it's for form data)."

As the URL class documentation states:

The recommended way to manage the encoding and decoding of URLs is to use URI, and to convert between these two classes using toURI() and URI.toURL().

The URLEncoder and URLDecoder classes can also be used, but only for HTML form encoding, which is not the same as the encoding scheme defined in RFC2396.

Basically:

String url = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
System.out.println(new java.net.URI(url).getPath());

will give you:

https://mywebsite/docs/english/site/mybook.do?request_type
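Two details worth checking in this approach. First, the fully encoded string contains no unescaped ':', so URI parses it as a relative URI with no scheme and treats the whole thing as the path (which is why getPath() returns everything). Second, URI decoding leaves '+' alone, unlike URLDecoder. The sketch below (Java 10+, class and helper names are illustrative) uses URI.create, which throws the unchecked IllegalArgumentException instead of URISyntaxException.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class UriVersusForm {
    // URI.create throws the unchecked IllegalArgumentException on bad syntax
    static URI parse(String s) {
        return URI.create(s);
    }

    static String formDecode(String s) {
        return URLDecoder.decode(s, StandardCharsets.UTF_8); // Java 10+
    }

    public static void main(String[] args) {
        String encoded = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
        URI uri = parse(encoded);
        // The string contains no unescaped ':' at all, so there is no scheme:
        // the whole string is parsed as a (relative) path
        System.out.println(uri.getScheme()); // null
        System.out.println(uri.getPath());   // https://mywebsite/docs/english/site/mybook.do?request_type

        // '+' stays literal in URI decoding but becomes a space in form decoding
        System.out.println(parse("a+b%20c").getPath()); // a+b c
        System.out.println(formDecode("a+b%20c"));      // a b c
    }
}
```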
Ilya Serbis
Nick Grealy
  • In Java 1.7 the `URLDecoder.decode(String, String)` overload is not deprecated. You must be referring to the `URLDecoder.decode(String)` overload without the encoding. You might want to update your post for clarification. – Aaron Aug 18 '14 at 18:31
  • This answer is misleading; that block quote has nothing to do with the deprecation. The Javadoc of the deprecated method states, and I actually quote `@deprecated The resulting string may vary depending on the platform's default encoding. Instead, use the decode(String,String) method to specify the encoding.` – Emerson Farrugia Apr 01 '15 at 10:30
  • @Klever, not for me. I believe you're using `URL` instead of `URI`, but you haven't provided enough information to reproduce your results. – Nick Grealy Mar 30 '16 at 22:14
  • getPath() for URIs only returns the path part of the URI, as noted above. – Pelpotronic Jul 25 '16 at 20:33
  • @Pelpotronic - Perhaps you could provide some information so we can replicate your behaviour? I'm using `SUN JDK 1.8.0_73` - and it still works today. – Nick Grealy Jul 26 '16 at 03:46
  • Unless I'm mistaken, the "path" is the part of a URI after the authority part (see https://en.wikipedia.org/wiki/Uniform_Resource_Identifier for the definition of path); it seems to me the behaviour I am seeing is the standard/correct behaviour. I'm using Java 1.8.0_101 (on Android Studio). I'd be curious to see what you get when "getAuthority()" is called. Even this article/example seems to indicate that the path is only the /public/manual/appliances part of their URI: http://www.quepublishing.com/articles/article.aspx?p=26566&seqNum=3 – Pelpotronic Jul 27 '16 at 18:58
  • @Pelpotronic The code in the post actually does print the output that it shows (at least for me). I think the reason for this is that, because of the URL encoding, the URI constructor is actually treating the entire string, (`https%3A%2F...`), as just the path of a URI; there is no authority, or query, etc. This can be tested by calling the respective get methods on the URI object. If you pass the decoded text to the URI constructor: `new URI("https://mywebsite/do.....")`, then calling `getPath()` and other methods will give correct results. – Kröw Jun 02 '19 at 02:26
  • @Pelpotronic The escaped text isn't a valid URL, and so it has no "query" portion or any of the other properties of a URL. A better way to think about it would be like this: When you want to put a `/` character into a url's path, without it acting as a syntactical character, you have to escape it. Since everything in that string is escaped, it's all a part of the url's path. – Kröw Jun 02 '19 at 02:29

%3A and %2F are URL-encoded characters. Use this Java code to convert them back into ':' and '/':

String decoded = java.net.URLDecoder.decode(url, "UTF-8");
Eric Leschinski
laz
  • It doesn't convert %2C (,) either. – vuhung3990 Jan 06 '15 at 18:45
  • this needs to be wrapped in a try/catch block.. read more about checked exceptions (this one) vs unchecked http://stackoverflow.com/questions/6115896/java-checked-vs-unchecked-exception-explanation – dNurb Jul 26 '16 at 20:52
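Both comments can be checked with a small sketch. The `(String, String)` overload declares the checked UnsupportedEncodingException, so it must be caught or declared; one common pattern is to rethrow it unchecked, since every Java platform is required to support UTF-8. In a quick test, %2C is decoded to a comma like any other escape. `CheckedDecode` and `decodeUtf8` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class CheckedDecode {
    // The (String, String) overload declares UnsupportedEncodingException,
    // a checked exception, so callers must catch it or declare it.
    static String decodeUtf8(String s) {
        try {
            return URLDecoder.decode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support UTF-8, so this is not expected
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(decodeUtf8("a%2Cb%3Ac%2Fd")); // a,b:c/d
    }
}
```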

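I use Apache Commons Codec:

import org.apache.commons.codec.net.URLCodec;

// decode(String) throws the checked org.apache.commons.codec.DecoderException
String decodedUrl = new URLCodec().decode(url);

The default charset is UTF-8.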

Sorter
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public String decodeString(String url) {
    String urlString = "";
    try {
        urlString = URLDecoder.decode(url, "UTF-8");
    } catch (UnsupportedEncodingException e) {
        // not expected: every Java platform supports UTF-8
    }
    return urlString;
}
Ronak Poriya
  • Could you please elaborate on your answer by adding a little more description of the solution you provide? – abarisone Jun 16 '15 at 07:22
try {
    String result = URLDecoder.decode(urlString, "UTF-8");
} catch (UnsupportedEncodingException e) {
    // not expected: every Java platform supports UTF-8
    e.printStackTrace();
}
Hsm
import java.io.UnsupportedEncodingException;
import java.net.URISyntaxException;

public class URLDecoding {

    // "You should use java.net.URI to do this, as the URLDecoder class does x-www-form-urlencoded
    // decoding which is wrong (despite the name, it's for form data)."
    public String decodeMethod(String url) throws UnsupportedEncodingException {
        return java.net.URLDecoder.decode(url, "UTF-8");
    }

    public String getPathMethod(String url) throws URISyntaxException {
        return new java.net.URI(url).getPath();
    }

    public static void main(String[] args) throws UnsupportedEncodingException, URISyntaxException {
        String encoded = "https%3A%2F%2Fmywebsite%2Fdocs%2Fenglish%2Fsite%2Fmybook.do%3Frequest_type";
        URLDecoding decoder = new URLDecoding();
        System.out.println("Here is your decoded URL with the decode method: " + decoder.decodeMethod(encoded));
        System.out.println("Here is your decoded URL with the getPath method: " + decoder.getPathMethod(encoded));
    }
}

Choose your method wisely :)

rinuthomaz

If the decoded value is an integer, you should also handle NumberFormatException:

try {
    Integer result = Integer.valueOf(URLDecoder.decode(urlNumber, "UTF-8"));
} catch (NumberFormatException | UnsupportedEncodingException e) {
    // most likely the decoded text was not a valid integer
    e.printStackTrace();
}
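A variation on the same idea, sketched with a hypothetical helper `parseEncodedInt` that reports an unparsable value as an empty OptionalInt rather than printing a stack trace. It assumes Java 10+ for the Charset overload, which removes the UnsupportedEncodingException case entirely.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;

public class EncodedInt {
    // Hypothetical helper: decode first, then parse; an unparsable value
    // becomes an empty OptionalInt instead of an exception.
    static OptionalInt parseEncodedInt(String encoded) {
        String decoded = URLDecoder.decode(encoded, StandardCharsets.UTF_8);
        try {
            return OptionalInt.of(Integer.parseInt(decoded));
        } catch (NumberFormatException e) {
            return OptionalInt.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(parseEncodedInt("%2D42")); // OptionalInt[-42]
        System.out.println(parseEncodedInt("4%2C2")); // OptionalInt.empty
    }
}
```

Here %2D is '-', so the first value decodes to "-42"; the second decodes to "4,2", which is not an integer.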
Selva R

Using the java.net.URI class:

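import java.net.URI;
import java.net.URISyntaxException;

public String getDecodedURL(String encodedUrl) {
    try {
        URI uri = new URI(encodedUrl);
        String decodedPart = uri.getSchemeSpecificPart();
        // A fully percent-encoded string has no unescaped ':', so getScheme() is null
        return uri.getScheme() == null ? decodedPart : uri.getScheme() + ":" + decodedPart;
    } catch (URISyntaxException e) {
        return "";
    }
}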

Exception handling could be better, but it isn't very relevant to this example.
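To show how the scheme plus scheme-specific-part idea behaves on both kinds of input, here is a small sketch. `SchemeAndPart.rebuild` is a hypothetical name, URI.create is used so no checked exception is involved, and any #fragment is dropped. The null check matters: a fully encoded string like the question's has no scheme, so naive concatenation would produce a leading "null:".

```java
import java.net.URI;

public class SchemeAndPart {
    // scheme + ":" + decoded scheme-specific part; a fragment, if any, is dropped.
    // The null check covers a fully encoded string, which parses without a scheme.
    static String rebuild(String s) {
        URI uri = URI.create(s);
        String ssp = uri.getSchemeSpecificPart(); // decoded form
        return uri.getScheme() == null ? ssp : uri.getScheme() + ":" + ssp;
    }

    public static void main(String[] args) {
        // A normal URL with only some characters escaped
        System.out.println(rebuild("https://mywebsite/docs/my%20book.do?request_type=a%26b"));
        // The fully encoded form from the question
        System.out.println(rebuild("https%3A%2F%2Fmywebsite%2Fdocs%2Fmybook.do%3Frequest_type"));
    }
}
```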

x7BiT

I was having this problem too and came here looking for an answer. I used the code from the accepted answer, but it didn't work for me. I tried something different and it worked, so I'm sharing the following line of code in case it helps:

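// decodes twice: only appropriate if the string was URL-encoded twice
String result = URLDecoder.decode(URLDecoder.decode(url, StandardCharsets.UTF_8), StandardCharsets.UTF_8);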
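Decoding twice fixes strings that were URL-encoded twice (where the '%' itself became %25), but it can corrupt strings that were only encoded once and contain an encoded literal '%'. A minimal sketch, assuming Java 10+; `DoubleDecode` is an illustrative name.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class DoubleDecode {
    static String decodeTwice(String s) {
        String once = URLDecoder.decode(s, StandardCharsets.UTF_8);
        return URLDecoder.decode(once, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "%253A" is "%3A" whose '%' was itself encoded as "%25"
        System.out.println(decodeTwice("https%253A%252F%252Fmywebsite")); // https://mywebsite
        // Caution: if "50%2541" was encoded only once, the intended text is "50%41",
        // but a second decode turns %41 into 'A'
        System.out.println(decodeTwice("50%2541")); // 50A
    }
}
```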