
I am working on a Spanish version of our search, and whenever the user types Spanish characters (say, HÍBRIDOS) I get an exception (shown below). My code is below. The URL sent over the wire is:

url=http://wwwdev.searchbridg.com/absd/JSONControllerServlet.do?&N=0&Ntk=AllText&Ntt=HÃBRIDOS&Nty=1&Ntx=mode+matchall

    DefaultHttpClient httpClient = new DefaultHttpClient();
    HttpParams params = httpClient.getParams();
    try {
        HttpConnectionParams.setConnectionTimeout(params, 10000);
        HttpConnectionParams.setSoTimeout(params, 10000);
    } catch (Exception e) {
        e.printStackTrace();
        throw e;
    }
    HttpHost proxy = new HttpHost(getProxy(), getProxyPort());
    ConnRouteParams.setDefaultProxy(params, proxy);
    URI uri;
    InputStream data = null;
    uri = new URI(url);
    HttpGet method = new HttpGet(uri);
    HttpResponse response = null;
    try {
        response = httpClient.execute(method);
    } catch (Exception e) {
        e.printStackTrace();
        throw e;
    }
    data = response.getEntity().getContent();
    Reader r = new InputStreamReader(data);
    HashMap<String, Object> jsonObj = (HashMap<String, Object>) GenericJSONUtil.fromJson(r);

java.net.URISyntaxException: Illegal character in query at index 101: http://wwwdev.searchbridge.com/abs/JSONControllerServlet.do?&N=0&Ntk=AllText&Ntt=H├?BRIDOS&Nty=1&Ntx=mode+matchall
    at java.net.URI$Parser.fail(URI.java:2816)
    at java.net.URI$Parser.checkChars(URI.java:2989)
    at java.net.URI$Parser.parseHierarchical(URI.java:3079)
    at java.net.URI$Parser.parse(URI.java:3021)
    at java.net.URI.<init>(URI.java:578)

I tried encoding the URL as UTF-8, but it still does not work and throws the same exception. The HTML page declares <meta charset="utf-8" />.

    byte[] bytes = url.getBytes("UTF8");
    String stringuRL = new String(bytes, "UTF-8");
    uri = new URI(stringuRL);
pushya

2 Answers


If you're sending special characters in a GET request, you must URL-escape (percent-encode) them. See the thread "HTTP URL Address Encoding in Java" to find out how.

When you receive the request, you must reverse the process to get the original word.
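
As a minimal sketch of that round trip, assuming only the JDK's `URLEncoder` and `URLDecoder` and the parameter names from the question's URL (the class and method names here are made up for illustration), the search term can be escaped before it is concatenated into the query string:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class QueryEncodingSketch {

    // Percent-encodes one query parameter value as UTF-8 (form encoding: a space becomes '+').
    static String encodeValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Reverses encodeValue on the receiving side.
    static String decodeValue(String encoded) {
        try {
            return URLDecoder.decode(encoded, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Builds the search URL; host, path and parameter names are copied from the question.
    static String buildUrl(String searchTerm) {
        return "http://wwwdev.searchbridge.com/abs/JSONControllerServlet.do"
                + "?N=0&Ntk=AllText&Ntt=" + encodeValue(searchTerm)
                + "&Nty=1&Ntx=mode+matchall";
    }

    public static void main(String[] args) {
        String url = buildUrl("H\u00CDBRIDOS"); // "HÍBRIDOS"
        URI uri = URI.create(url);              // parses, because the query is plain ASCII now
        System.out.println(uri);
        System.out.println(decodeValue("H%C3%8DBRIDOS"));
    }
}
```

Only the value is encoded, not the whole URL; encoding the whole string would also escape the `?`, `&` and `=` separators.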

Alfabravo
  • Usually the app server (Tomcat, JBoss, etc.) has a configuration setting for how to decode this. The setting is called `URIEncoding`... – helios Jun 15 '12 at 15:05
  • @helios Indeed. Also, if you need to keep your app decoupled from the container, you'll have to use something unrelated to the container/web server – Alfabravo Jun 15 '12 at 15:24
  • @Alfabravo If I encode the string in JavaScript and send it to the Java backend, do I need to decode it back to the original characters before sending it over the wire? – pushya Jun 15 '12 at 15:31
  • @pushya If I understand correctly, you're sending a GET request using JSON to a Java servlet, right? If so, every HTTP request you send must be well-formed, because (XML)HTTP clients are pretty standard and any of them will fail on illegal characters in a request – Alfabravo Jun 15 '12 at 15:40
  • @Alfabravo: What I mean is: 1) the app server usually decodes this and hands it to the servlet already decoded; 2) the app server decodes it based on a fixed parameter in its configuration, and the client cannot indicate the encoding of the URL, so you must decide which one to use (I would use UTF-8!) – helios Jun 15 '12 at 16:11
  • @helios You're right, sir. And the app-server decides according to request headers! A good post outside SO about that: http://jinpeng09.blogspot.com/2009/03/how-tomcat-decode-input-values-in-http.html – Alfabravo Jun 15 '12 at 16:18
  • @Alfabravo Actually, I grab the search keyword and pass it to the SearchDAO layer, and the SearchDAO layer sends it out to the service as a simple URL/URI request. The response from the service is JSON. My question: if I encode the keyword in JavaScript and send it to the DAO layer, do I need to decode it back to the original and then encode it again in Java before sending it to the service? – pushya Jun 15 '12 at 17:24
  • @Alfabravo: I didn't know that Tomcat tried to use the header information to decode the URL. Note that the URL is the first thing to arrive and the headers come next, so I thought it was too late for the URL to be reinterpreted once the headers were read. – helios Jun 15 '12 at 22:54
  • I stand corrected. Look: --Default encoding for GET-- The character set for HTTP query strings (that's the technical term for 'GET parameters') can be found in sections 2 and 2.1 of the "URI Syntax" specification. The character set is defined to be US-ASCII. Any character that does not map to US-ASCII must be encoded in some way... http://wiki.apache.org/tomcat/FAQ/CharacterEncoding – Alfabravo Jun 15 '12 at 23:23
  • @Alfabravo I encoded HÍBRIDOS in JavaScript, so H%CDBRIDOS is sent out, and I broke the URL into parts and used the multi-argument URI constructors, which did not throw the exception I was seeing before. But in the response from the vendor the searched term comes back as H%CDBRIDOS, and in the UI I need to show it as HÍBRIDOS. How should I do this? – pushya Jun 18 '12 at 17:57
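
On the last comment, a hedged sketch using only `java.net.URI` and `URLDecoder` (the class and helper names are hypothetical, and which JavaScript function produced `%CD` is an assumption). The multi-argument `URI` constructors always quote `%`, so a term that is already percent-encoded gets encoded a second time, which is one plausible reason the service echoes the literal text `H%CDBRIDOS`. Passing the raw term lets the constructor do the escaping. `%CD` is the single ISO-8859-1 byte for Í (what JavaScript's legacy `escape()` emits, whereas `encodeURIComponent()` emits `%C3%8D`), so it only decodes correctly with that charset:

```java
import java.io.UnsupportedEncodingException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URLDecoder;

public class UriPartsSketch {

    // Builds the URI from separate parts; the constructor quotes illegal characters itself.
    // Host and path are copied from the question's stack trace.
    static URI build(String query) {
        try {
            return new URI("http", "wwwdev.searchbridge.com",
                    "/abs/JSONControllerServlet.do", query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Decodes a percent-encoded value with the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // Raw term: toASCIIString() writes the non-ASCII character as UTF-8 escapes.
        System.out.println(build("Ntt=H\u00CDBRIDOS").toASCIIString());
        // Pre-encoded term: '%' is quoted again and becomes %25.
        System.out.println(build("Ntt=H%CDBRIDOS"));
        // A term that was escaped as ISO-8859-1 must be decoded as ISO-8859-1.
        System.out.println(decode("H%CDBRIDOS", "ISO-8859-1"));
    }
}
```

If this matches the setup, the simpler route is to send the keyword to the Java layer, let the container decode it, and encode it exactly once in Java when building the outgoing URI.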

Every parameter value in a GET request needs to be URL-encoded.

If you are using HttpClient 4, you can do this more or less like this:

List<NameValuePair> parameters = new ArrayList<NameValuePair>();
parameters.add(new BasicNameValuePair("parameter_name_Ã", "another value with ~ãé"));
parameters.add(new BasicNameValuePair("second_parameter", "still other ú û"));
String url = "http://foo.bar/?" + URLEncodedUtils.format(parameters, "UTF-8");

The result in this case will be http://foo.bar/?parameter_name_%C3%83=another+value+with+%7E%C3%A3%C3%A9&second_parameter=still+other+%C3%BA+%C3%BB

Francisco Spaeth
  • Usually the app server (Tomcat, JBoss, etc.) has a configuration setting for how to decode this. The setting is called `URIEncoding`... Please note that the server has to use the same character encoding as the client (UTF-8 in the example). – helios Jun 15 '12 at 15:05
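
To illustrate why both sides have to agree, here is a small sketch (plain JDK, hypothetical class name) that decodes the same UTF-8 escaped value with two different charsets. Decoding as ISO-8859-1 turns the two UTF-8 bytes of Í into two separate characters, which resembles the `HÃBRIDOS` form in the question's URL, although the thread does not confirm that this is how it arose:

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

public class CharsetMismatchSketch {

    // Decodes a percent-encoded value with the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        String onTheWire = "H%C3%8DBRIDOS"; // "HÍBRIDOS" escaped as UTF-8

        // Same charset on both sides: the original word comes back.
        System.out.println(decode(onTheWire, "UTF-8"));

        // Mismatched charset: bytes C3 and 8D become 'Ã' plus an invisible control character.
        System.out.println(decode(onTheWire, "ISO-8859-1"));
    }
}
```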