
(Please no lectures on how this is not the right way to construct URLs etc; this is legacy code and I have bigger fish to fry. I just want to know what the browser thinks it's doing.)

Given the following function:

function redirectToSearch(baseURL) {
    var searchString = document.getElementById("searchBox").value;
    document.location = baseURL + "&searchString=" + searchString;
}

where the searchBox element is a text field: if you put in something like

{ a ^ b } " c | d "

in the text field and call this function, the resulting URL, as redirected-to, ends in

searchString={%20a%20^%20b%20}%20%22c%20|%20d%22

-- spaces and quotation marks are escaped, but nothing else, even though {, }, and | should also be invalid characters. This seems to be true in Chrome, Firefox, and IE.
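
The observation above can be reproduced outside a page. This is a minimal sketch, assuming a runtime that provides the standard `URL` class (a current browser or Node.js); the `example.com` base URL and the variable names are made up for illustration, and assigning to `document.location` may not behave identically in every browser or version.

```javascript
// Hypothetical base URL; only the query part matters here.
var raw = '{ a ^ b } " c | d "';
var parsed = new URL("http://example.com/search?x=1&searchString=" + raw);

// The URL parser percent-encodes only some characters of the query:
// spaces and double quotes become %20 and %22, while { } ^ | stay as typed.
var query = parsed.search;
console.log(query);
```

The parser applies its own, deliberately small, set of characters to escape in a query string, which is narrower than the set RFC 3986 disallows.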

Okay, presumably I can fix this by encoding the string myself, but what I want to know is, why just spaces and quotes (and possibly other characters I haven't discovered)? Why not either all invalid characters, or none?

David Moles
  • Not an answer to your question and you probably already know, but the function you are looking for is [`encodeURIComponent()`](https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/encodeURIComponent) – Pekka Nov 17 '11 at 22:43

2 Answers


It's because the characters left unescaped might be intended as special characters in the URL itself, whereas spaces and quotation marks have no special significance in a URL. Since the browser knows you're setting a URL (location is a URL), it turns the string into a URL as best it can.

And, of course, you can solve this using encodeURIComponent.
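
A minimal sketch of that fix, keeping the question's function but moving the string building into a helper. The name `buildSearchURL` is made up for illustration, and the sketch assumes, as in the question, that `baseURL` already contains a query string.

```javascript
// Hypothetical helper: builds the redirect URL with the search text percent-encoded.
function buildSearchURL(baseURL, searchString) {
    // encodeURIComponent escapes {, }, ^, |, quotes and spaces alike.
    return baseURL + "&searchString=" + encodeURIComponent(searchString);
}

function redirectToSearch(baseURL) {
    var searchString = document.getElementById("searchBox").value;
    document.location = buildSearchURL(baseURL, searchString);
}
```

With the input from the question, the value part of the parameter becomes `%7B%20a%20%5E%20b%20%7D%20%22%20c%20%7C%20d%20%22`, so nothing is left for the browser to decide.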

Ry-
  • I'm not sure I agree: Neither `{}` nor `|` are [reserved characters](http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters) – Pekka Nov 17 '11 at 22:45
  • @Pekka: I know - but I suppose the browser (i.e. its programmers) doesn't feel sure it can escape those safely. Maybe you have some special URL requirements. But spaces are not ever allowed in a URL - period. – Ry- Nov 17 '11 at 22:46
  • According to RFC3986, "A URI is composed from a limited set of characters consisting of digits, letters, and a few graphic symbols." That set doesn't include space or quote, but it doesn't include caret or curly brace or pipe either. – David Moles Nov 17 '11 at 23:07

Are you getting an error? You shouldn't. Browsers behave differently and do some encoding for you in an attempt to make it easier to read. However, as you already noted, you should be encoding each component (name and value) yourself separately using encodeURIComponent()... Just because the browser tries to fix your mistakes doesn't mean you should be making them...

Ruan Mendes
  • This doesn't really answer the question though. Why are some types of characters encoded and others not, although they are not [reserved characters?](http://en.wikipedia.org/wiki/Percent-encoding#Percent-encoding_reserved_characters) – Pekka Nov 17 '11 at 22:46
  • True that it doesn't answer the question. But there's no point to finding out why browsers do it except for fun. If you encode it yourself, you'll never have problems. – Ruan Mendes Nov 17 '11 at 22:49
  • I'm not getting an error in the browser. I am getting an error later when this request ends up on my server and I try to parse the request with more standards-compliant code (java.net.URI class). – David Moles Nov 17 '11 at 23:08
  • Sure, again, the point of my answer is that if you encode it properly, you won't run into this kind of problem. – Ruan Mendes Nov 17 '11 at 23:14