Some UTF-8-encoded queries that I send to a server are not returning the expected results.

For example,

http://direct.jthinkws.com?type=release&query=artist%3A%28Dinosaur%7E0.7+AND+Jr.%29++AND++%28%2Btrack%3A%22Forget+The+Swan%22+%2Btrack%3A%22Just+Like+Heaven%22+%29++AND+tracks%3A%5B2+TO+100%5D++AND+src%3A1&limit=20&offset=0

returns only two results (results are returned as XML), both in my application and when I paste it directly into the Firefox address bar.

However, if I put the non-encoded URL into Firefox,

http://direct.jthinkws.com?type=release&query=artist:(Dinosaur~0.7 AND Jr.) AND (+track:"Forget The Swan" +track:"Just Like Heaven" ) AND tracks:[2 TO 100] AND src:1&limit=20&offset=0

it returns 44 results,

and on my server I can see the following request, which I assume is the result of Firefox encoding the URL:

http://direct.jthinkws.com?type=release&query=artist:(Dinosaur~0.7%20AND%20Jr.)%20%20AND%20%20(+track:"Forget%20The%20Swan"%20+track:"Just%20Like%20Heaven"%20)%20%20AND%20tracks:[2%20TO%20100]%20%20AND%20src:1&limit=20&offset=0

As you can see, it is encoded slightly differently: spaces are converted to '%20' rather than '+', and '(' and ')' are not converted.

I don't understand the difference, why one works and the other doesn't, or why the one that doesn't work still returns some results, just not as many.

(I also tried encoding as ISO-8859-1 instead of UTF-8, and that failed completely: the server couldn't decode it, so I'm sure UTF-8 is the correct encoding.)

My code is written in Java and it encodes the value of the query using URLEncoder, i.e.

String query = URLEncoder.encode(queryValue.toString(), StandardCharsets.UTF_8.name());
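
To make the encoding rules concrete, here is a minimal sketch of what `URLEncoder` does with one fragment of the query above. The class name `EncodeDemo` and the helper `encode` are hypothetical names used only for illustration; the only library call is `URLEncoder.encode(String, String)`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original application.
class EncodeDemo {

    // Encodes a query value the same way as the line above.
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // A literal '+' becomes %2B, a space becomes '+',
        // and '(' / ')' become %28 / %29.
        System.out.println(encode("(+track:\"Forget The Swan\" )"));
        // prints %28%2Btrack%3A%22Forget+The+Swan%22+%29
    }
}
```

The output matches the corresponding part of the first URL above, so a literal `+` and a space stay distinguishable after encoding.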
Abhinav Singh Maurya
Paul Taylor
  • This may help you: http://stackoverflow.com/a/10786112/982149 – Fildor Nov 02 '15 at 11:45
  • Ah, thank you, it does indeed. I've changed to Google Guava's UrlEscapers.urlFragmentEscaper().escape(queryValue.toString()) and that works – Paul Taylor Nov 02 '15 at 12:16
  • Hmm, I think the original encoded query that returned just two results was in fact correct. It seems Firefox encodes the URL incorrectly, namely by not encoding the + (in +track:"Forget The Swan"), so the server interprets it as a space (because URLEncoder converts a space to +). Because it is interpreted as a space, the search is no longer restricted to track 1 AND track 2 – Paul Taylor Nov 02 '15 at 16:41
  • The problem was that Firefox did not encode the +'s, treating them as already-encoded spaces – Paul Taylor Nov 02 '15 at 17:43

1 Answer

I had this the wrong way round: my code was actually working fine, and the query should only return two results.

The problem was that Firefox did not encode the +'s, treating them as already-encoded spaces, so the server decoded each one as a space. So you can't always rely on Firefox to correctly encode an unencoded URL that you paste into it.

This problem is not specific to Firefox; it may be that no browser could encode this URL properly. So if you are using a URL that includes '+'s, be careful.
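
As a rough illustration, the sketch below decodes one fragment of each request the way a form-style decoder would. It assumes the server treats '+' as an encoded space, as `java.net.URLDecoder` does; the real server's decoding may differ. `DecodeDemo` and `decode` are hypothetical names.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; assumes form-style decoding on the server.
class DecodeDemo {

    // Decodes a raw query value, turning '+' into a space and %XX into bytes.
    static String decode(String raw) {
        try {
            return URLDecoder.decode(raw, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a conforming JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Fragment as sent by the Java client: the '+' operator arrives as %2B.
        System.out.println(decode("%28%2Btrack%3A%22Forget+The+Swan%22+%29"));
        // prints (+track:"Forget The Swan" )

        // Fragment as sent by Firefox: the literal '+' is left unencoded.
        System.out.println(decode("(+track:\"Forget%20The%20Swan\"%20)"));
        // prints ( track:"Forget The Swan" )
    }
}
```

In the second case the required-term operator `+` is lost, so the track clauses are no longer mandatory and the query matches more releases.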

Paul Taylor