
A certain site (which is not under my control) has an internal search engine that uses GET requests of the form something.com/search?query=%u0001%0101, which I would like to call from my Java code.

To my understanding, this is an uncommon (UTF-16) form of URL encoding. I tried using HttpURLConnection with a URL of the above type, but it throws a java.net.URISyntaxException: Malformed escape pair at index X (X being the position of the %u0001).

What can I do? I'm pretty new to these URL encoding issues, so any advice would be highly appreciated.

DannyA
  • Maybe you must double-encode? First to UTF-16 then URL-encoding? – home Sep 06 '11 at 20:07
  • Not sure what you mean, but perhaps this will clarify: First I encode the unicode chars to ASCII to match the site's syntax (fake e.g. %$# -> %u0000%u0002%u0500), then I create a URL from them, and try to open a connection. So my code is something like: Url("something.com/search?query=%u0000%u0002%u0500").openConnection(); – DannyA Sep 06 '11 at 20:17
  • For my specific case, [the answer here](http://stackoverflow.com/questions/2280863/uri-encoding-in-unicode-for-apache-httpclient-4) solved the problem. Though I have not tried, my searches came to the conclusion that @McDowell has a correct and more general approach. – DannyA Sep 06 '11 at 21:32

2 Answers


The form something.com/search?query=%u0001%0101 violates the URI specification, because the percent character is reserved for percent-encoding. Under this rule, a percent sign must be followed by exactly two hexadecimal digits, so %u0001 is rejected at the "u". This is not a valid UTF-16 encoded URI.

It is not surprising that errors are thrown on these addresses.
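To see the rule in action, here is a minimal sketch that asks java.net.URI to parse both forms. The class name UriEscapeCheck and the helper parses are made up for illustration; the http:// prefix is added only so the string is a complete URI.

```java
import java.net.URI;
import java.net.URISyntaxException;

class UriEscapeCheck {
    /** Returns true if java.net.URI accepts the string, false if parsing fails. */
    static boolean parses(String candidate) {
        try {
            new URI(candidate);
            return true;
        } catch (URISyntaxException e) {
            // A '%' that is not followed by two hex digits ends up here.
            return false;
        }
    }

    public static void main(String[] args) {
        // The %uXXXX form from the question.
        System.out.println(parses("http://something.com/search?query=%u0001%0101"));
        // A standard percent-encoded form, for comparison.
        System.out.println(parses("http://something.com/search?query=%00%01"));
    }
}
```

The first string should be rejected and the second accepted, which matches the exception described in the question.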

You may have to resort to opening a socket and sending your own malformed client request.

GET /search?query=%u0001%0101 HTTP/1.1
Host: something.com
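A rough sketch of that socket approach in Java follows. The class RawGet and its methods are hypothetical names, port 80 and plain HTTP are assumptions about the site, and the Connection: close header is added only so the read loop terminates. The response comes back raw (status line, headers, possibly chunked body), so real code would still have to parse it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

class RawGet {
    /** Builds the request text by hand, so no URI validation is applied. */
    static String buildRequest(String host, String pathAndQuery) {
        return "GET " + pathAndQuery + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    /** Sends the request over a plain socket and returns the raw response text. */
    static String send(String host, int port, String pathAndQuery) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildRequest(host, pathAndQuery).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            // Read until the server closes the connection.
            InputStream in = socket.getInputStream();
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        }
    }

    public static void main(String[] args) {
        // Prints the request only; call send("something.com", 80, ...) to actually transmit it.
        System.out.print(buildRequest("something.com", "/search?query=%u0001%0101"));
    }
}
```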
McDowell
  • Thanks! I guess you are right, and that's what I would have done, had I not found out another way that "accidentally" works in my case... (answered my own question eventually) – DannyA Sep 06 '11 at 21:26

You can use java.net.URLEncoder.encode("your string", "UTF-16");
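For reference, a small sketch of what this call does. URLEncoder emits standard two-digit %XX escapes for the bytes of the chosen charset, so it is worth printing the result and comparing it with what the target site expects. The class name UrlEncoderDemo and the sample string are only for illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class UrlEncoderDemo {
    /** Percent-encodes the text using the bytes of its UTF-16 representation. */
    static String encodeUtf16(String text) {
        try {
            return URLEncoder.encode(text, "UTF-16");
        } catch (UnsupportedEncodingException e) {
            // UTF-16 is a charset every Java platform must support.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Three Hebrew letters, written as escapes so the source stays ASCII.
        System.out.println(encodeUtf16("\u05D3\u05E0\u05D9"));
    }
}
```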

CrackerJack9
  • Thanks. I think this method encodes to a different format from the one I'm looking for. It creates: %00%01 ASCII, instead of the %u0001 format the site I'm trying to use needs. – DannyA Sep 06 '11 at 20:33
  • @DannyA What string are you encoding and what are you expecting it to look like afterwards? – CrackerJack9 Sep 06 '11 at 20:48
  • The string could be anything I would like to search (e.g. "דני") and the result I need is something in the above format (e.g. "%u05D3%u05E0%u05D9" for this example). But the specific ASCII format of the unicode is less of a problem (I could play with the chars a bit). The problem, to my understanding, is that the URL with this (%uXXXX) encoding is considered malformed by java's libs. – DannyA Sep 06 '11 at 20:58
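Producing the %uXXXX text itself is straightforward. Below is a minimal sketch, assuming the site wants one %uXXXX group per UTF-16 code unit and accepts ASCII letters and digits unescaped; both are guesses about the site, and PercentUEncoder is a made-up name. The result is not a valid URI, so it still cannot be passed through java.net.URI.

```java
class PercentUEncoder {
    /** Encodes each UTF-16 code unit outside [A-Za-z0-9] as %uXXXX (uppercase hex). */
    static String encode(String text) {
        StringBuilder sb = new StringBuilder(text.length() * 6);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            boolean asciiAlnum = (c >= 'a' && c <= 'z')
                              || (c >= 'A' && c <= 'Z')
                              || (c >= '0' && c <= '9');
            if (asciiAlnum) {
                sb.append(c);
            } else {
                // "%%" is a literal percent sign in a format string.
                sb.append(String.format("%%u%04X", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Hebrew example from the comment above, as escapes: prints %u05D3%u05E0%u05D9
        System.out.println(encode("\u05D3\u05E0\u05D9"));
    }
}
```

Characters outside the Basic Multilingual Plane come out as two %uXXXX groups (a surrogate pair); whether the site accepts that is unknown.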