70

Although it is strongly recommended (W3C source, via Wikipedia) for web servers to support semicolon as a separator of URL query items (in addition to ampersand), it does not seem to be generally followed.

For example, compare

        http://www.google.com/search?q=nemo&oe=utf-8

        http://www.google.com/search?q=nemo;oe=utf-8

results. (In the latter case, the semicolon is, or was at the time of writing, treated as an ordinary string character, as if the URL were: http://www.google.com/search?q=nemo%3Boe=utf-8)

The first URL parsing library i tried, however, behaves well:

>>> from urlparse import urlparse, parse_qs
>>> url = 'http://www.google.com/search?q=nemo;oe=utf-8'
>>> parse_qs(urlparse(url).query)
{'q': ['nemo'], 'oe': ['utf-8']}
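
For readers on Python 3, here is a minimal sketch of the same check. It assumes Python 3.9.2 or later, where `urllib.parse.parse_qs` splits on `&` only by default and accepts a `separator` argument to opt in to semicolons; older releases, such as the Python 2 one used above, split on both characters.

```python
from urllib.parse import urlparse, parse_qs

url = 'http://www.google.com/search?q=nemo;oe=utf-8'
query = urlparse(url).query  # 'q=nemo;oe=utf-8'

# Default behaviour: only '&' separates fields, so ';' stays in the value.
by_ampersand = parse_qs(query)

# Opt in to ';' as the (single) separator.
by_semicolon = parse_qs(query, separator=';')

print(by_ampersand)   # {'q': ['nemo;oe=utf-8']}
print(by_semicolon)   # {'q': ['nemo'], 'oe': ['utf-8']}
```

Note that `separator` takes one separator string, so a single call does not accept both `&` and `;` at once.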

What is the current status of accepting semicolon as a separator, and what are potential issues or some interesting notes? (from both server and client point of view)

VxJasonxV
mykhal

4 Answers

38

The W3C Recommendation from 1999 is obsolete. The current status, according to the 2014 W3C Recommendation, is that semicolon is now illegal as a parameter separator:

To decode application/x-www-form-urlencoded payloads, the following algorithm should be used. [...] The output of this algorithm is a sorted list of name-value pairs. [...]

  1. Let strings be the result of strictly splitting the string payload on U+0026 AMPERSAND characters (&).

In other words, ?foo=bar;baz means the parameter foo will have the value bar;baz; whereas ?foo=bar;baz=sna should result in foo being bar;baz=sna (although technically illegal since the second = should be escaped to %3D).
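
A rough sketch of that splitting rule in Python, to make the two examples concrete. The function name `decode_form` is made up for illustration, and the sketch covers only the "split on `&`, then on the first `=`" part of the quoted algorithm, not its other details.

```python
from urllib.parse import unquote_plus

def decode_form(payload):
    """Split strictly on '&', then split each field on its first '='."""
    pairs = []
    for field in payload.split('&'):
        if not field:
            continue  # ignore empty fields such as a trailing '&'
        name, _, value = field.partition('=')
        pairs.append((unquote_plus(name), unquote_plus(value)))
    return pairs

print(decode_form('foo=bar;baz'))      # [('foo', 'bar;baz')]
print(decode_form('foo=bar;baz=sna'))  # [('foo', 'bar;baz=sna')]
```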

geira
    This answer is misleading because it is strictly talking about form encoding, which is not what the OP is asking about, nor what the included example shows. Form URL encoding is very old and is used when sending data through the `<form>` tag, which we are moving away from and now towards AJAX. The use of & as a delimiter was an old unfortunate "mistake" that is now being preserved for backwards compatibility reasons. Using semicolons is the way forward provided that your web server supports it.
    – Zectbumo Nov 14 '17 at 15:46
  • 7
    If you read the HTTP and URL standards you will see they do not define any syntax for the query string, apart from escaping. In fact the two docs mentioned are the only specifications for query params in existence. While you are technically correct that form encoding (which both of the W3C Recommendations describe) relates to POST requests, there is no similar specification for GET and so browser implementations have followed the former. Modern frameworks (e.g. Mojolicious) are also dropping semicolon separator support, and unless all browsers are rewritten ampersands will never disappear. – geira Nov 15 '17 at 19:11
  • 2
    As for moving towards AJAX, take note that the current [Swagger](https://swagger.io/docs/specification/describing-parameters/) (a.k.a. OpenAPI) standard only allows for ampersand-delimited parameters; semicolons are only permitted as path or cookie parameters. If you design an API that contradicts the Swagger spec you have a problem. – geira Nov 15 '17 at 19:22
  • 1
    Of course the specs don't define delimiters. It is up to us to make our own smart decisions to use `;` to separate our parameters so we don't have to escape parameters commonly found in our URLs placed in html attributes. We also can shoot ourselves in the foot and use `&` and be left with escaping in HTML attributes. I don't blame Swagger. After all, they want their service to work across as many servers as possible so they went with the weakest common denominator. So, if your web server supports semicolons and you are writing your own URLs then be smarter than the rest: use semicolons. – Zectbumo Nov 16 '17 at 22:24
  • I am stuck in browser compatibility issue, where my s3 image link requires a parameter `X-Amz-SignedHeaders: content-type;host` and it works on chrome/firefox and latest safari browsers but fails on Microsoft edge and IE 11, any suggestion on how I can fix this – Savitoj Cheema Feb 12 '19 at 08:54
  • I'd like to add that `;` is part of `sub-delims` as per [RFC 3986, section 2.2](https://tools.ietf.org/html/rfc3986#section-2.2) which is part of `query` per RFC 3986, section 3.4. And `query` is how the query-string part is defined in [RFC 7230, section 5.3.1](https://tools.ietf.org/html/rfc7230#section-5.3.1). RFC 3986 says that subdelimiters are to be used by apps for separating parts and when somebody wants to pass these values they must urlencode them. – webknjaz -- Слава Україні Jun 07 '20 at 16:20
19

As long as your HTTP server, and your server-side application, accept semicolons as separators, you should be good to go. I cannot see any drawbacks. As you said, the W3C spec is on your side:

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

Daniel Vassallo
  • 1
    i see at least one drawback - from a client viewpoint, that i can't safely decide to use `;` instead of `&` in the request (ok, i'm adding the mention on the client point of view to the question) – mykhal Aug 14 '10 at 02:07
  • @mykhal: "From a client viewpoint"... you mean when you're exposing an API over a web service, or similar? Because otherwise I think end users using a site through a web browser shouldn't care. Regarding the former, yes, web service consumers might be more used to use an `&` and might feel puzzled by the unusual convention. – Daniel Vassallo Aug 14 '10 at 02:10
  • @[Daniel Vassallo] i mean, generally. btw, i was implicitly addressing exactly the same W3C quotation you are mentioning in your answer, which therefore is not satisfying for me.. never mind :) – mykhal Aug 14 '10 at 02:24
  • @mykhal: Yes, I was aware of that :) ... Nevertheless, I don't think there are any issues on the development side (apart from having to check that your framework and web server support semicolons)... Interesting question, btw :) – Daniel Vassallo Aug 14 '10 at 02:31
  • 19
    There are drawbacks. By giving ";" special additional meaning not originally specified in the RFC, you force ";" to be escaped in both key and value text. For example, `?q='one;two'&x=1`. You'd expect `{"q": "'one;two'", "x": "1"}`, but might very well end up with: `{"q": "'one", "two'": null, "x": "1"}` or some other value. There's a lot of potential ambiguity there. Basically, the W3C is stupid. – Bob Aman Apr 10 '13 at 23:28
  • 1
    [What do you do](http://stackoverflow.com/questions/20910273/is-there-an-alternative-to-parse-qs-that-handles-semi-colons) when testing against [an API that uses semicolons as delimiters like the StackExchange API](http://api.stackexchange.com/docs/search)? – Kyle Kelley Jan 03 '14 at 18:25
  • Except, @BobAman you neglect to consider that your case is equally valid for those who think ; is better (like myself). in normal English, ampersand is more common than ; so using '&' in a querystring value and using &'s as separators is *more* problematic (already) than making the switch to use semicolons instead. – Shawn Kovac Jan 25 '16 at 17:56
  • @BobAman I have to disagree. Forcing ; to be escaped is no worse than the current use of '&'. One must already escape '&' in both key and value text. So why is it a drawback to force the ';' to be escaped if one is separating on semicolons instead? In other words, whether you use one character or another, you will always need to escape the character that you choose to be your separator. I see the difference is that '&' is more common in normal English, so I think '&' is *already* the more problematic character due to its use in English. Basically, the W3c made the right call in this choice. – Edward Jan 25 '16 at 19:08
  • @Edward It's a few years later, so it's always worth revisiting, but nope. W3C was still wrong. I wrote the most popular URL parser for Ruby, so I see a lot of URL parsing edge cases come across my desk, and I can say pretty categorically that in practice, the points at which two different specs interact is the biggest potential source of interop problems. Doubly so when it's two different specs from two different organizations, as is the case here (IETF vs W3C). This one is up there with '+' being a space character (according to W3C) but '%20' also being a space character (according to IETF). – Bob Aman Feb 02 '16 at 20:07
  • When in doubt, the IETF produces smarter, less problematic specs. If two different specs say to do two different conflicting things, choose whatever the IETF spec says to do. – Bob Aman Feb 02 '16 at 20:08
  • @ShawnKovac That's not how specifications work though. The point of a specification is interoperability without requiring both parties to do a lot of out-of-band coordination. If you prefer ';', that's cool, as long as you only need to talk to yourself. But that's boring. Interop is why the web won. Sabotaging it for the sake of a few characters worth of convenience is 100% not worth it, especially when the problem is trivially solved with tooling. – Bob Aman Feb 02 '16 at 20:29
  • @BobAman: A few years later i revisit this thread. I agree that '&' is more popular, as you've expressed that advantage of using '&'. My view is that semicolon is a better pick for _everyone_ to use than ampersand. However, you are correct that everyone isn't using ';' and the usage is divided. I agree that two common alternatives makes another hassle. However, there are people who just want to go with what is most popular in hopes that the most popular one wins. And then there are those who go with what seems best if everyone used it. You and I seem to have different preferences for such. – Shawn Kovac Aug 21 '19 at 12:04
  • @BobAman: It seems that you prefer to use what is most popular. I clearly am a person who chooses to think outside-the-box & use what makes more sense if everyone used my methods. And i differ with the common practice a LOT. I type with a Dvorak keyboard layout. I'm a barefoot runner. I prefer dozenal numbers over decimal numbers, even for keeping time in my calendar. But to each his own. You probly think i'm a bit too altruistic, which is fine. Fortunately, this topic isn't a big deal & we just have to cope & debug regardless of which separator one prefers. Our answer is: don't cut corners. – Shawn Kovac Aug 21 '19 at 12:11
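
The ambiguity raised in the comments above can be reproduced with a small sketch. `split_query` is a hypothetical helper, not a library function; it only shows how the same query string yields different fields depending on which separators the receiver honours.

```python
import re
from urllib.parse import unquote_plus

def split_query(qs, separators):
    """Split qs on any character in `separators`, then on the first '='."""
    pattern = '[' + re.escape(separators) + ']'
    fields = []
    for field in re.split(pattern, qs):
        if not field:
            continue
        name, eq, value = field.partition('=')
        # A field without '=' gets None as its value.
        fields.append((unquote_plus(name), unquote_plus(value) if eq else None))
    return fields

query = "q='one;two'&x=1"
amp_only = split_query(query, '&')       # receiver honours '&' only
amp_and_semi = split_query(query, '&;')  # receiver honours '&' and ';'

print(amp_only)      # [('q', "'one;two'"), ('x', '1')]
print(amp_and_semi)  # [('q', "'one"), ("two'", None), ('x', '1')]
```

A sender that wants the first reading from both kinds of receiver has to send the semicolon as `%3B`.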
10

I agree with Bob Aman. The W3C spec is designed to make it easier to use anchor hyperlinks with URLs that look like form GET requests (e.g., http://www.host.com/?x=1&y=2). In this context, the ampersand conflicts with the system for character entity references, which all start with an ampersand (e.g., &quot;). So W3C recommends that web servers allow a semicolon to be used as a field separator instead of an ampersand, to make it easier to write these URLs. But this solution requires that writers remember that the ampersand must be replaced by something, and that a ; is an equally valid field delimiter, even though web browsers universally use ampersands in the URL when submitting forms. That is arguably more difficult than remembering to replace the ampersand with an &amp; in these links, just as would be done elsewhere in the document.

To make matters worse, until all web servers allow semicolons as field delimiters, URL writers can only use this shortcut for some hosts, and must use &amp; for others. They will also have to change their code later if a given host stops allowing semicolon delimiters. This is certainly harder than simply using &amp;, which will work for every server forever. This in turn removes any incentive for web servers to allow semicolons as field separators. Why bother, when everyone is already changing the ampersand to &amp; instead of ;?
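
For reference, the escaping step discussed here is a one-liner with standard tooling. A minimal sketch using Python's `html.escape` on the example URL from this answer; the surrounding anchor markup is only illustrative.

```python
from html import escape

url = 'http://www.host.com/?x=1&y=2'

# Escape the URL before placing it in an HTML attribute value.
href = escape(url)
link = '<a href="' + href + '">link</a>'

print(href)  # http://www.host.com/?x=1&amp;y=2
print(link)  # <a href="http://www.host.com/?x=1&amp;y=2">link</a>
```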

Matthias Fripp
  • i say it is *harder* to continue to even only use the & without allowing both. i say allowing people who want a simpler life to use the ; will make it that much easier for them that it's worth the relatively little more complication that sometimes some sites need to know both options. – Shawn Kovac Jan 25 '16 at 17:49
  • handling QueryStrings with the & separator is more than twice as complicated as switching to ; to separate QueryString items. Using ; vastly reduces potential bugs for improperly HTML encoded strings for '&' use. – Shawn Kovac Jan 25 '16 at 17:50
  • I think i hear Matthias saying that using '&' as separators is better simply because they are more popular already. And i say, that is a good point. And i'm not speaking against that. What i am trying to communicate is that if we _all_ start using ';' instead, it is easier for _most_ people in the long run. I'm saying that ';' is better for _all_ to use than '&' is. And i'm also saying that until all switch to one or the other, then we are just going to have to deal with one group who does it differently, so if we want robust code, we need to be able to handle both, regardless. – Shawn Kovac Aug 21 '19 at 12:18
4

In short, HTML is a big mess (due to its leniency), and using semicolons helps to simplify this a LOT. I estimate that when i factor in the complications that i've found, using ampersands as a separator makes the whole process about three times as complicated as using semicolons for separators instead!

I'm a .NET programmer and to my knowledge, .NET does not inherently allow ';' separators, so i wrote my own parsing and handling methods because i saw a tremendous value in using semicolons rather than the already problematic system of using ampersands as separators. Unfortunately, very respectable people (like @Bob Aman in another answer) do not see the value in why semicolon usage is far superior and so much simpler than using ampersands. So i now share a few points to perhaps persuade other respectable developers who don't recognize the value yet of using semicolons instead:

Using a querystring like '?a=1&b=2' in an HTML page is improper (without HTML encoding it first), but most of the time it works. This however is only due to most browsers being tolerant, and that tolerance can lead to hard-to-find bugs when, for instance, the value of the key value pair gets posted in an HTML page URL without proper encoding (directly as '?a=1&b=2' in the HTML source). A QueryString like '?who=me+&+you' is problematic too.

We people can have biases and can disagree about our biases all day long, so recognizing our biases is very important. For instance, i agree that i just think separating with ';' looks 'cleaner'. I agree that my 'cleaner' opinion is purely a bias. And another developer can have an equally opposite and equally valid bias. So my bias on this one point is not any more correct than the opposite bias.

But the unbiased case for the semicolon, that it makes everyone's life easier in the long run, cannot be correctly disputed when the whole picture is taken into account. In short, using semicolons does make life simpler for everyone, with one exception: a small hurdle of getting used to something new. That's all. It's always more difficult to make anything change. But the difficulty of making the change pales in comparison to the difficulty of continuing to use &.

Using ; as a QueryString separator makes it MUCH simpler. Ampersand separators are more than twice as difficult to code properly as semicolons would be. (I think) most implementations are not coded properly, so most implementations aren't twice as complicated. But then tracking down and fixing the bugs leads to lost productivity. Here, i point out the 2 separate encoding steps needed to properly encode a QueryString when & is the separator:

  • Step 1: URL encode both the keys and values of the querystring.
  • Step 2: Concatenate the keys and values like 'a=1&b=2' after they are URL encoded from step 1.
  • Step 3: Then HTML encode the whole QueryString in the HTML source of the page.

So special encoding must be done twice for proper (bug free) URL encoding, and not just that, but the encodings are two distinct, different encoding types. The first is a URL encoding and the second is an HTML encoding (for HTML source code). If any of these is incorrect, then i can find you a bug. But step 3 is different for XML. For XML, then XML character entity encoding is needed instead (which is almost identical). My point is that the last encoding is dependent upon the context of the URL, whether that be in an HTML web page, or in XML documentation.
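
The three steps above can be sketched in Python roughly as follows. The parameter names and the `/search` path are placeholders chosen for illustration, and `html.escape` stands in for whatever HTML or XML escaping the surrounding document needs.

```python
from html import escape
from urllib.parse import quote_plus

params = [('who', 'me & you'), ('b', '2')]

# Step 1: URL encode both the keys and the values.
encoded = [(quote_plus(k), quote_plus(v)) for k, v in params]

# Step 2: concatenate the encoded pairs with '&'.
query = '&'.join(k + '=' + v for k, v in encoded)

# Step 3: HTML encode the whole URL for use in the page source.
href = escape('/search?' + query)

print(query)  # who=me+%26+you&b=2
print(href)   # /search?who=me+%26+you&amp;b=2
```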

Now with the much simpler semicolon separators, the process is as one wud expect:

  • 1: URL encode the keys and values,
  • 2: concatenate the values together. (With no encoding for step 3.)
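
A matching sketch of the two-step semicolon variant, again with placeholder parameter names, and assuming the receiving server actually splits on `;`. The final comparison shows that HTML escaping leaves this particular query string unchanged, because the URL-encoded output contains no `&`, `<`, `>` or quote characters.

```python
from html import escape
from urllib.parse import quote_plus

params = [('who', 'me; you'), ('b', '2')]

# Step 1: URL encode the keys and values (a literal ';' becomes %3B).
encoded = [(quote_plus(k), quote_plus(v)) for k, v in params]

# Step 2: concatenate the encoded pairs with ';'.
query = ';'.join(k + '=' + v for k, v in encoded)

print(query)                   # who=me%3B+you;b=2
print(escape(query) == query)  # True: nothing left for HTML escaping to change
```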

I think most web developers skip step 3 because browsers are so lenient. But this leads to bugs and more complications when hunting down those bugs or users not being able to do things if those bugs were not present, or writing bug reports, etc.

Another complication in real use is when writing XML documentation markup in my source code in both C# and VB.NET. Since & must be encoded, it's a real drag, literally, on my productivity. That extra step 3 makes it harder to read the source code too. So this harder-to-read deficit applies not only to HTML and XML, but also to other applications like C# and VB.NET code because their documentation uses XML documentation. So the step #3 encoding complication proliferates to other applications too.

So in summary, using the ; as a separator is simple because the (correct) process when using the semicolon is how one wud normally expect the process to be: only one step of encoding needs to take place.

Perhaps this wasn't too confusing. But all the confusion or difficulty is due to using a separation character that shud be HTML encoded. Thus '&' is the culprit. And semicolon relieves all that complication.

(I will point out that my 3 step vs 2 step process above is usually how many steps it would take for most applications. However, for completely robust code, all 3 steps are needed no matter which separator is used. But in my experience, most implementations are sloppy and not robust. So using semicolon as the querystring separator would make life easier for more people with fewer website and interop bugs, if everyone adopted the semicolon as the default instead of the ampersand.)

Shawn Kovac
  • 1
    So, to a certain extent, the W3C's hands were tied by virtue of the inheritance from SGML entity reference syntax and the fact that URL syntax was similarly already defined elsewhere. However, redefining the behavior of a specification outside of that specification is a worst-practice for effective interop. Let's say I'm a spec implementor. I read through the spec, and implement it precisely and perfectly. Ideally, I ought to be able to interop with anyone else who has also done the same. But as soon as one of us incorporates the additional rules, no more interop. That's why W3C is wrong. – Bob Aman Feb 02 '16 at 20:20
  • Also, FWIW, XML in source code comments is pretty dumb too. That one's not on the W3C though. – Bob Aman Feb 02 '16 at 20:31
  • 1
    @BobAman you claim 'as soon as one of us incorporates the additional rules, no more interop.' But this is not the truth. That's like saying if your server uses POP3 and my server only uses IMAP that there's no more interop, so whoever wrote IMAP was wrong. Dude, it's called adding to technology with a better replacement. The solution to the IMAP issue is the same solution to the ; separator in URLs: be aware of both, and use the one that the server uses. No confusion. You are making it harder than it is. Old technologies get outdated by new standards. This is one of them. – Shawn Kovac Feb 04 '16 at 22:05
  • So Bob, i ask you how is there any lack of interoperability? a person is limited to using *only* the separator that the server itself uses, no matter which character that the webserver uses. The beauty of ; is that there are several advantages over using ampersand: the ampersand needs extra encoding which hardly ever gets done in reality, which i explained in my reply. So i don't see even one way that ; is inferior to using ampersand, except that some servers are lagging in implementation for the newer better option. it never amazes me how so many people reject something only because it's new. – Shawn Kovac Feb 04 '16 at 22:20
  • 1
    You seem to be confused over what interop entails. Standards bodies generally require at least two interoperable implementations written by different parties. It's not interop if the client and the server are written by the same people. "Choosing the same separator character as the server" is not interop at all. The whole point of a specification is that I should know exactly how to interpret a piece of data based on the rules given in the specification. If I need to know that you do or do not support a different separator character, that's 'out-of-band' and it's not really interop anymore. – Bob Aman Feb 09 '16 at 00:06
  • @BobAman, I quite disagree with your conclusion. Following your logic, how about i use this *which has your same logic*. Interop is not interop if it uses hexadecimal colors, because i have to know the other party is using hexadecimal instead of decimal. Bob, *everything*, all the info, is *part* of the specification, whether it uses binary, hexadecimal, decimal, or even better yet, dozenal numbers, whether URL separators use a better choice of semicolons or the archaic ampersands, whether UTF-8 is used or another character set. All these are part of the interop specs. – Shawn Kovac Feb 19 '16 at 11:16
  • But i do see your point, that when you need 2 servers to communicate with one another, we need to know which URL separator character that server is using, and this 'shouldn't need to be done'. But all testing *shouldn't* need to be done in your theoretical sense. The reality of our IT industry is that thoro testing is *always* needed. If we skip it we can count on finding bugs later. Testing this separator between two servers is the practical solution. It's not any failed interop any more than choosing between inches and centimeters in construction in a project. Just different 'languages'. – Shawn Kovac Feb 19 '16 at 11:25
  • My whole point is that in choosing whether & or ; is used, the semicolon has many more advantages. But the & has the popularity, like decimal had the popularity before binary and hex. But there were reasons to now use each. In CSS for instance, we can specify colors using decimal or hex. This is *convenient*. it gives us more options. Ampersand QueryString separator has bugs because *most* implementations are not correctly programmed, and most people don't know it! But it will bite most of them later. Using ampersand, one MUST test thoroughly! :( Semicolon separator needs less testing for all. – Shawn Kovac Feb 19 '16 at 11:32
  • Why not use question mark not only to introduce a query, but also to separate each assignment in the query? That reduces the characters that need to be escaped if they are to appear in an assignment by one, and makes parsing the query, when needed, easier. – David Spector Oct 09 '18 at 12:57
  • @DavidSpector, I can't say for certain, but i would guess that the question mark is used more commonly than the semicolon. One needs a character that is not common. Less commonly used is the key. Between ampersand, semicolon and question mark, i believe semicolon is less used in English text. This is the biggest reason why semicolon would result in less problems. But any of the three may be used without any buggy system, *IF* developers would just test thoroughly. But developers just don't test thoroughly and they are not careful, as a whole. Thus the reason why a more rare char is best. – Shawn Kovac Aug 21 '19 at 11:17
  • I stopped reading when I got to the second occurrence of "wud". Really? – Nicholas Shanks Nov 12 '20 at 12:27