53

I got this error message:

java.net.URISyntaxException: Illegal character in query at index 31: http://finance.yahoo.com/q/h?s=^IXIC

My_Url = http://finance.yahoo.com/q/h?s=^IXIC

When I paste it into a browser's address bar, it shows the correct page, so it is a valid URL, but I can't parse it with new URI(My_Url).

I tried My_Url = My_Url.replace("^", "\\^"), but:

  1. It won't be the url I need
  2. It doesn't work either

How can I handle this?

Frank

Eric Tobias
Frank

10 Answers

61

You need to encode the URI to replace illegal characters with legal encoded characters. If you first make a URL (so you don't have to do the parsing yourself) and then make a URI using the five-argument constructor, then the constructor will do the encoding for you.

import java.net.*;

public class Test {
  public static void main(String[] args) {
    String myURL = "http://finance.yahoo.com/q/h?s=^IXIC";
    try {
      URL url = new URL(myURL);
      String nullFragment = null;
      URI uri = new URI(url.getProtocol(), url.getHost(), url.getPath(), url.getQuery(), nullFragment);
      System.out.println("URI " + uri.toString() + " is OK");
    } catch (MalformedURLException e) {
      System.out.println("URL " + myURL + " is a malformed URL");
    } catch (URISyntaxException e) {
      System.out.println("URI " + myURL + " is not a valid URI");
    }
  }
}
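
One thing to be aware of: the second parameter of the five-argument constructor is the authority, so passing only url.getHost() silently drops a non-default port, and the null fragment drops anything after a #. A minimal sketch of a variant that keeps those parts, using the seven-argument constructor; the class and method names (FullUriBuilder, toUri) and the example.com address are made up for illustration:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class FullUriBuilder {

    // Hypothetical helper: let URL split the string, then rebuild a URI so that
    // illegal characters are percent-encoded component by component.
    static URI toUri(String raw) {
        try {
            URL url = new URL(raw);
            // getPort() returns -1 when no port is given, which URI treats as "undefined".
            return new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                           url.getPort(), url.getPath(), url.getQuery(), url.getRef());
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot convert to URI: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // prints http://example.com:8080/q/h?s=%5EIXIC#top
        System.out.println(toUri("http://example.com:8080/q/h?s=^IXIC#top"));
    }
}
```

The multi-argument constructors always quote the % character, so a string that is already percent-encoded would be encoded a second time (%5E becomes %255E). This approach only suits raw, unencoded input.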
mooreds
Eddie
  • Just make sure that you use URLDecoder if you later want to turn that URI back into a URL, e.g. `new FileInputStream(URLDecoder.decode(uri.toURL().getFile(), "UTF-8"))` – MrDrews Jan 24 '12 at 19:12
  • Tried multiple other suggested solutions, this seems to do the trick. You can use the "uri" directly in your "new HttpGet(uri)". – Andreas Rudolph Sep 18 '12 at 10:59
  • This helped me a lot. – Samir Mangroliya Feb 13 '15 at 14:45
  • Why does nullFragment have to be a variable? Can't we just pass null directly to the URI constructor? Is this a design choice? @edd – Vamsi Pavan Mahesh Aug 26 '15 at 07:12
  • mailto URLs need some special handling – Wolfgang Fahl Jun 28 '17 at 15:21
  • To keep stuff after any # anchors or non-default ports do: URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(), url.getPort(), url.getPath(), url.getQuery(), url.getRef()); – Manuel Nov 07 '17 at 15:31
  • Unfortunately, this works well, and it is completely ridiculous that it is needed. This is yet another Java URI/URL stupidity, like Java bug https://bugs.java.com/bugdatabase/view_bug.do?bug_id=JDK-8218962 – Guss Jul 08 '20 at 13:20
24

Use % encoding for the ^ character, viz. http://finance.yahoo.com/q/h?s=%5EIXIC
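
As a quick check of this, here is a small sketch showing that java.net.URI accepts the percent-encoded form and still hands back the original character when asked for the decoded query. The class name PercentEncodedQuery and the parse helper are made up for illustration:

```java
import java.net.URI;

public class PercentEncodedQuery {

    // Hypothetical helper. URI.create wraps URISyntaxException in an
    // IllegalArgumentException, which is convenient for known-good literals.
    static URI parse(String s) {
        return URI.create(s);
    }

    public static void main(String[] args) {
        URI uri = parse("http://finance.yahoo.com/q/h?s=%5EIXIC");
        System.out.println(uri.getRawQuery()); // prints s=%5EIXIC
        System.out.println(uri.getQuery());    // prints s=^IXIC
    }
}
```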

1ac0
araqnid
15

You have to encode your parameters.

Something like this will do:

import java.net.*;
import java.io.*;

public class EncodeParameter { 

    public static void main( String [] args ) throws URISyntaxException ,
                                         UnsupportedEncodingException   { 

        String myQuery = "^IXIC";

        URI uri = new URI( String.format( 
                           "http://finance.yahoo.com/q/h?s=%s", 
                           URLEncoder.encode( myQuery , "UTF-8" ) ) );

        System.out.println( uri );

    }
}

http://java.sun.com/javase/6/docs/api/java/net/URLEncoder.html

OscarRyz
  • URLEncoder (https://docs.oracle.com/javase/7/docs/api/java/net/URLEncoder.html) is not a URI segment encoder; that job is done by the URI class itself. URLEncoder is an encoder for HTML forms. It is a very close match, but not 100%. – Gabriel Feb 24 '20 at 19:29
4

Rather than encoding the URL beforehand, you can do the following:

String link = "http://example.com";
URL url = null;
URI uri = null;

try {
   url = new URL(link);
   try {
      // Works as-is if the URL is already a syntactically valid URI.
      uri = new URI(url.toString());
   } catch (URISyntaxException e) {
      // Otherwise rebuild it; the multi-argument constructor encodes illegal characters.
      uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                    url.getPort(), url.getPath(), url.getQuery(),
                    url.getRef());
   }
   url = uri.toURL();
} catch (MalformedURLException | URISyntaxException e) {
   e.printStackTrace();
}

// url stays null if the link could not be parsed at all.
String encodedLink = (url != null) ? url.toString() : null;
Guss
Wil Hester
2

A general solution requires parsing the URL into an RFC 2396-compliant URI (note that this is an old version of the URI standard, but it is the one java.net.URI uses).

I have written a Java URL parsing library that makes this possible: galimatias. With this library, you can achieve your desired behaviour with this code:

String urlString = //...
URLParsingSettings settings = URLParsingSettings.create()
  .withStandard(URLParsingSettings.Standard.RFC_2396);
URL url = URL.parse(settings, urlString);

Note that galimatias is in a very early stage and some features are experimental, but it is already quite solid for this use case.

smola
2

A space is encoded as %20 in URLs, and as + in submitted form data (content type application/x-www-form-urlencoded). You need the former.

Using Guava:

dependencies {
     compile 'com.google.guava:guava:28.1-jre'
}

You can use UrlEscapers:

String encodedString = UrlEscapers.urlFragmentEscaper().escape(inputString);

Don't use String.replace; it would only encode the space. Use a library instead.
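
To see the two space encodings side by side without any extra dependency, here is a rough JDK-only sketch. The class and method names (SpaceEncoding, formEncode, uriEncodeQuery) and the example.com host are made up for illustration, and this is not a substitute for the Guava escaper above:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLEncoder;

public class SpaceEncoding {

    // Form encoding (application/x-www-form-urlencoded): a space becomes "+".
    static String formEncode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    // URI encoding: the multi-argument URI constructor quotes a space as "%20".
    // The scheme, host and path here are placeholders.
    static String uriEncodeQuery(String query) {
        try {
            return new URI("http", "example.com", "/search", query, null).getRawQuery();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(formEncode("Incontinence Liners"));                // Incontinence+Liners
        System.out.println(uriEncodeQuery("dimension1=Incontinence Liners")); // dimension1=Incontinence%20Liners
    }
}
```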

Jebil
1

I couldn't come up with anything better for
http://server.ru:8080/template/get?type=mail&format=html&key=ecm_task_assignment&label=Согласовать с контрагентом&descr=Описание&objectid=2231
than this:

public static boolean checkForExternal(String str) {
    int length = str.length();
    for (int i = 0; i < length; i++) {
        if (str.charAt(i) > 0x7F) {
            return true;
        }
    }
    return false;
}

private static final Pattern COLON = Pattern.compile("%3A", Pattern.LITERAL);
private static final Pattern SLASH = Pattern.compile("%2F", Pattern.LITERAL);
private static final Pattern QUEST_MARK = Pattern.compile("%3F", Pattern.LITERAL);
private static final Pattern EQUAL = Pattern.compile("%3D", Pattern.LITERAL);
private static final Pattern AMP = Pattern.compile("%26", Pattern.LITERAL);

public static String encodeUrl(String url) {
    if (checkForExternal(url)) {
        try {
            String value = URLEncoder.encode(url, "UTF-8");
            value = COLON.matcher(value).replaceAll(":");
            value = SLASH.matcher(value).replaceAll("/");
            value = QUEST_MARK.matcher(value).replaceAll("?");
            value = EQUAL.matcher(value).replaceAll("=");
            return AMP.matcher(value).replaceAll("&");
        } catch (UnsupportedEncodingException e) {
            throw LOGGER.getIllegalStateException(e);
        }
    } else {
        return url;
    }
}
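
For comparison, a possible JDK-only alternative (a sketch, not the method used above): rebuild the URL through the seven-argument URI constructor and call toASCIIString(), which percent-encodes non-ASCII characters as UTF-8. The names AsciiUrl and toAsciiUrl and the example.com address are made up for illustration:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

public class AsciiUrl {

    // Hypothetical helper: the constructor quotes illegal ASCII characters such as
    // spaces; toASCIIString() then encodes the remaining non-ASCII characters.
    static String toAsciiUrl(String raw) {
        try {
            URL url = new URL(raw);
            URI uri = new URI(url.getProtocol(), url.getUserInfo(), url.getHost(),
                              url.getPort(), url.getPath(), url.getQuery(), url.getRef());
            return uri.toASCIIString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Cannot encode: " + raw, e);
        }
    }

    public static void main(String[] args) {
        // \u00e9 is "e" with an acute accent; escaped so the source file stays ASCII.
        // prints http://example.com/caf%C3%A9?q=a%20b
        System.out.println(toAsciiUrl("http://example.com/caf\u00e9?q=a b"));
    }
}
```

The separators : / ? = & are left alone because each component is encoded on its own, so no post-processing is needed. As with any use of these constructors, input that is already percent-encoded would be encoded twice.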
Grigory Kislin
1

I hit this exception in a test that checks URLs actually accessed by users.

Those URLs sometimes contain an illegal character and fail with this error.

So I wrote a function that encodes only the illegal characters in the URL string, like this:

String encodeIllegalChar(String uriStr,String enc)
  throws URISyntaxException,UnsupportedEncodingException {
  String _uriStr = uriStr;
  int retryCount = 17;
  while(true){
     try{
       new URI(_uriStr);
       break;
     }catch(URISyntaxException e){
       String reason = e.getReason();
       if(reason == null ||
         !(
          reason.contains("in path") ||
          reason.contains("in query") ||
          reason.contains("in fragment")
         )
       ){
         throw e;
       }
       if(0 > retryCount--){
         throw e;
       }
       String input = e.getInput();
       int idx = e.getIndex();
       String illChar = String.valueOf(input.charAt(idx));
       _uriStr = input.replace(illChar,URLEncoder.encode(illChar,enc));
     }
  }
  return _uriStr;
}

test:

String q =  "\\'|&`^\"<>)(}{][";
String url = "http://test.com/?q=" + q + "#" + q;
String eic = encodeIllegalChar(url, "UTF-8"); // charset name must be a String, not a char literal
System.out.println(String.format("  original:%s",url));
System.out.println(String.format("   encoded:%s",eic));
System.out.println(String.format("   uri-obj:%s",new URI(eic)));
System.out.println(String.format("re-decoded:%s",URLDecoder.decode(eic, "UTF-8")));
0

If you're using RestangularV2 to post to a Spring controller in Java, you can get this exception if you use RestangularV2.one() instead of RestangularV2.all().

gary69
0

Replace spaces in the URL with +. For example, if the URL contains dimension1=Incontinence Liners, replace it with dimension1=Incontinence+Liners.

kkashyap1707