
I came across the following URL today:

http://www.sfgate.com/cgi-bin/blogs/inmarin/detail??blogid=122&entry_id=64497

Notice the doubled question mark at the beginning of the query string:

??blogid=122&entry_id=64497

My browser didn't seem to have any trouble with it, and running a quick bookmarklet:

javascript:alert(document.location.search);

just gave me the query string shown above.

Is this a valid URL? The reason I'm being so pedantic (assuming that I am) is that I need to parse URLs like this for query parameters, and supporting doubled question marks would require some changes to my code. Obviously, if they're in the wild, I'll need to support them; I'm mainly curious whether it's my fault for not adhering to URL standards exactly, or whether it's in fact a non-standard URL.

Justin Johnson
Bungle
  • Fortunately, in spite of this, I didn't need to change my code. I was using `indexOf()` to locate the question mark, so it picked up the position of the first occurrence. I then split the query string at each `&`, and split each resulting name/value pair at `=`. – Bungle May 27 '10 at 20:03
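
A minimal sketch of the approach described in the comment above, for readers who want to try it. The function name `parseQuery` is hypothetical, not the asker's actual code. It assumes no fragment (`#...`) is present, does no percent-decoding, and splits each pair at its first `=` only.

```javascript
// Hypothetical helper: locate the FIRST "?" with indexOf(), then split on "&" and "=".
function parseQuery(url) {
  const start = url.indexOf("?"); // first occurrence only
  if (start === -1) return {};
  const query = url.slice(start + 1); // everything after the first "?"
  const params = {};
  for (const pair of query.split("&")) {
    if (pair === "") continue; // skip empty segments such as "a=1&&b=2"
    const eq = pair.indexOf("=");
    const name = eq === -1 ? pair : pair.slice(0, eq);
    const value = eq === -1 ? "" : pair.slice(eq + 1);
    params[name] = value;
  }
  return params;
}

const parsed = parseQuery(
  "http://www.sfgate.com/cgi-bin/blogs/inmarin/detail??blogid=122&entry_id=64497"
);
console.log(parsed);
```

With this sketch the second question mark ends up as part of the first parameter's name, so `entry_id` is still found under its usual key.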

2 Answers


Yes, it is valid. Only the first `?` in a URL has significance; any that follow it are treated as literal question marks:

The query component is indicated by the first question mark ("?") character and terminated by a number sign ("#") character or by the end of the URI.

...

The characters slash ("/") and question mark ("?") may represent data within the query component. Beware that some older, erroneous implementations may not handle such data correctly when it is used as the base URI for relative references (Section 5.1), apparently because they fail to distinguish query data from path data when looking for hierarchical separators. However, as query components are often used to carry identifying information in the form of "key=value" pairs and one frequently used value is a reference to another URI, it is sometimes better for usability to avoid percent-encoding those characters.

https://www.rfc-editor.org/rfc/rfc3986#section-3.4
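
The rule quoted above can be sketched roughly as follows. The function name `queryComponent` is made up for illustration, and the sketch only isolates the query component; it does not validate the rest of the URI.

```javascript
// Hypothetical helper: return the query component of a URI string, or null if none.
// Per the quoted text: the query starts after the first "?" and ends at "#" or end of URI.
function queryComponent(uri) {
  const hash = uri.indexOf("#");
  // Drop the fragment first, so a "?" inside the fragment is not mistaken for a query.
  const withoutFragment = hash === -1 ? uri : uri.slice(0, hash);
  const q = withoutFragment.indexOf("?");
  return q === -1 ? null : withoutFragment.slice(q + 1);
}

const fromQuestion = queryComponent(
  "http://www.sfgate.com/cgi-bin/blogs/inmarin/detail??blogid=122&entry_id=64497"
);
console.log(fromQuestion);
```

Here the second `?` is simply the first character of the query data, which is why the URL in the question is well-formed.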

Community
Amber
  • So does that mean that the first query parameter is named "?blogid" and not "blogid"? That could be fun... – GalacticCowboy May 27 '10 at 19:32
  • @GalacticCowboy - Yeah, the same thing just occurred to me. You are correct - Firebug confirms that the first query parameter is in fact `?blogid`. It actually appears to be a non-essential parameter, i.e. the page is served the same with any number of question marks there, or omitting the parameter entirely. – Bungle May 27 '10 at 19:39
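
One way to check the parameter name without Firebug is the standard `URL` and `URLSearchParams` APIs, which postdate this thread. This is only a sketch of how a current browser or Node.js parses the URL from the question; the variable names are illustrative.

```javascript
// Parse the URL from the question with the built-in WHATWG URL parser.
const pageUrl = new URL(
  "http://www.sfgate.com/cgi-bin/blogs/inmarin/detail??blogid=122&entry_id=64497"
);

// search keeps both question marks, like document.location.search in the question.
const search = pageUrl.search;

// Only the first "?" is treated as the delimiter; the second becomes part of the name.
const names = [...pageUrl.searchParams.keys()];

console.log(search, names);
```

A lookup such as `pageUrl.searchParams.get("blogid")` returns `null` here, because the key actually stored is `?blogid`.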

As a tangentially related answer, `foo?spam=1?&eggs=3` gives the parameter `spam` the value `1?`.

Hilton Shumway
  • Yes, provided there is no `.htaccess` or similar trick in play. If we change `foo` to `script.php` and request `script.php?spam=1?&eggs=3`, then `var_dump($_GET)` shows `array(2) { ["spam"]=> string(2) "1?" ["eggs"]=> string(1) "3" }`. – Hebe Jul 02 '20 at 15:20
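
For comparison, a rough JavaScript counterpart of the PHP check above, using the standard `URL` API. The base `http://example.com/` is a placeholder, needed only because `foo?...` is a relative reference.

```javascript
// Resolve the relative reference against a placeholder base (assumption: example.com).
const relative = new URL("foo?spam=1?&eggs=3", "http://example.com/");

// A "?" after the first one is ordinary data, so it stays inside the value of "spam".
const spam = relative.searchParams.get("spam");
const eggs = relative.searchParams.get("eggs");

console.log({ spam, eggs });
```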