I've found plenty of articles on StackOverflow that mention fast ways to parse query strings in JavaScript, but what I haven't found is how to handle both & and ; as delimiters.

Now, I assume that only one of those gets set as the delimiter for the query string. Take, for instance, the following query string:

index.php?Param1=value&Param2=value;Param3=value

Which is the delimiter?

  • Is it &, or is it ;?
  • Is it the first recognized delimiting character, or does & take precedence over ;?
  • Are both treated as delimiters, and all three parameters from the example parsed?

In JavaScript, I have historically been using the following function:

/**
 * Gets the value of a parameter from a URI.
 * @param href The full URI to search. Must at least start with a ? query string delimiter.
 * @param key  The key of the param for which to retrieve the value.
 * @param def  A default value to return in the case that the key cannot be found in the query string.
 * @return The value of the param if the key is found, otherwise returns def.
 */
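function getURIParam(href, key, def) {
    if (arguments.length == 2) def = null;

    var qs = href.substring(href.indexOf('?') + 1);
    var s = qs.split('&');
    // Indexed loop: for...in on an array would also visit inherited enumerable properties.
    for (var k = 0; k < s.length; k++) {
        var s2 = s[k].split('=');
        if (s2[0] === key)
            // A key without "=" has no value; decode an empty string rather than undefined.
            return decodeURIComponent(s2[1] || '');
    }

    return def;
}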

That just splits out each key/value pair based on &. It works great, so long as the delimiter is always &. As we know, however, that is not always the case, nor should it be enforced, as the RFC allows for ; as well.

So, in order to handle both & and ; as a delimiter, should I first search for an indexOf("&") and if no occurrence is found, set the delimiter to ;?

What is the proper way of parsing a URL based on the rule that the delimiter can be either & or ;?

Here is the W3C Recommendation on servers being able to handle both & and ;.

We recommend that HTTP server implementors, and in particular, CGI implementors support the use of ";" in place of "&" to save authors the trouble of escaping "&" characters in this manner.

Here is RFC 1738, which defines the URL syntax.

crush
  • possible duplicate of [How can I get query string values?](http://stackoverflow.com/questions/901115/how-can-i-get-query-string-values) – VisioN Feb 11 '13 at 20:03
  • @VisioN No, because the accepted answer there does not handle the `;` delimiter as far as I can tell. – crush Feb 11 '13 at 20:06
  • Using semicolon as a delimiter has been obsolete for a long time. Are you still seeing them? – Barmar Feb 11 '13 at 20:06
  • It may be obsolete, but not removed. Yes, I'm still seeing them. For example, `OTRS` uses them. – crush Feb 11 '13 at 20:07
  • The [W3C working draft](http://www.w3.org/TR/url/#collect-url-parameters) states that `&` is the only delimiter for URL params. – Evan Davis Feb 11 '13 at 20:34
  • @Mathletics Thanks for that. So, it would seem there is conflicting information stemming from W3C, as they also recommend handling `;` as a delimiter? – crush Feb 11 '13 at 20:41
  • @crush it would seem that way, yes. The spec for building and collecting URLs uses only one, but implementors are still meant to check for both. Annoying, for sure. – Evan Davis Feb 11 '13 at 20:42
  • According to [RFC 3986](http://www.ietf.org/rfc/rfc3986.txt), there are at least 11 characters that are reserved for use in URI, and appear to be valid in the `searchpart` as well. Now, I'm even more confused! – crush Feb 11 '13 at 20:49

1 Answer

You can use a regular expression to split on both delimiters.

var s = qs.split(/[&;]/)

This will handle strings that contain & or ;, or a mix of both.
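As a minimal sketch of how that split could be dropped into the question's lookup function: the name `getURIParamAny` is hypothetical, and the sketch assumes every unencoded & or ; is a pair delimiter and that only the first = separates a key from its value.

```javascript
/**
 * Hypothetical variant of getURIParam that treats both & and ; as delimiters.
 * Assumes any unencoded & or ; separates key/value pairs.
 */
function getURIParamAny(href, key, def) {
    if (def === undefined) def = null;

    // Everything after the first "?" is treated as the query string.
    var qs = href.substring(href.indexOf('?') + 1);
    var pairs = qs.split(/[&;]/);

    for (var i = 0; i < pairs.length; i++) {
        // Split on the first "=" only, so values may themselves contain "=".
        var eq = pairs[i].indexOf('=');
        var name = eq === -1 ? pairs[i] : pairs[i].substring(0, eq);
        var value = eq === -1 ? '' : pairs[i].substring(eq + 1);
        if (name === key) {
            return decodeURIComponent(value);
        }
    }

    return def;
}
```

With the example from the question, `getURIParamAny('index.php?Param1=value&Param2=value;Param3=value', 'Param3')` finds the third parameter, because all three pairs are split out. Note that `decodeURIComponent` throws on malformed percent-escapes, so callers may want to guard against that.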

Edit:

Here is a post on splitting on one or the other, based on the first occurrence of the character. Hopefully the URL contains the semicolon first, but it might be safer to check for any occurrence of ; and just use that as the delimiter if present.

According to RFC 3986, it is up to the scheme implementor to decide how the URI reserved characters are used. Also, according to RFC 1738, the semicolon is a reserved character in the search part of a URI and, as such, if present, is treated as a delimiter unless encoded. The W3C recommends that server implementers accept both so that developers do not have to escape the ampersand; therefore, one or the other should be used.

I would interpret your example of

index.php?Param1=value&Param2=value;Param3=value

as

["Param1=value&Param2=value","Param3=value"]
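The two single-delimiter readings discussed here can be sketched side by side. Both function names are hypothetical, and neither reading is mandated by the specs quoted above; the first follows the "use ; whenever it is present" interpretation, the second the "first delimiter found wins" alternative.

```javascript
// Hypothetical: if any ";" is present it is the delimiter, otherwise "&".
function splitPreferSemicolon(qs) {
    var delimiter = qs.indexOf(';') !== -1 ? ';' : '&';
    return qs.split(delimiter);
}

// Hypothetical: whichever of "&" or ";" occurs first is the delimiter.
function splitOnFirstDelimiter(qs) {
    var first = qs.search(/[&;]/);
    if (first === -1) return [qs]; // no delimiter: a single pair
    return qs.split(qs.charAt(first));
}

var example = 'Param1=value&Param2=value;Param3=value';

splitPreferSemicolon(example);
// ["Param1=value&Param2=value", "Param3=value"]

splitOnFirstDelimiter(example);
// ["Param1=value", "Param2=value;Param3=value"]
```

The two strategies disagree on mixed input like the example, which is exactly the ambiguity in question; for query strings that use only one of the two characters they give the same result.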
  • Yes. However, is that what is intended by the spec, or is it simply one delimiter or the other is to be used? In my example above, for instance, is it supposed to be parsing Param1, Param2, and Param3, or simply Param1 and Param2 with Param3 interpreted as part of Param2 because the delimiter is set to `&`? – crush Feb 11 '13 at 20:13
  • @crush any time it matches either of those characters, it will split. – Evan Davis Feb 11 '13 at 20:18
  • @Mathletics I understand what his code does. I'm asking what the spec requires. – crush Feb 11 '13 at 20:19
  • @crush the spec for what? URL parameterization, or the `split` method? – Evan Davis Feb 11 '13 at 20:21
  • @Mathletics The specification and recommendation by W3C to handle both `&` and `;`. It is not clear if you are supposed to split on both or treat the first occurrence of one as the delimiter from that point forward. For example, the case: `index.php?Param1=value&Param2=value;Param3=value`. Does the specification mean this is three parameters, or is it only 2 parameters, the second parameter being `Param2=value;Param3=value`? – crush Feb 11 '13 at 20:23
  • I'll accept this as the answer, but please amend it to mention the following: `According to RFC 1738, semi-colon is a reserved character in the search part of a URI, and as such, if present, is treated as a delimiter, unless encoded.` Link to RFC 1738: http://www.ietf.org/rfc/rfc1738.txt – crush Feb 11 '13 at 20:29
  • @Austin Why that instead of `["Param1=value", "Param2=value;Param3=value"]`? – crush Feb 11 '13 at 20:45
  • It's a bit ambiguous on what to do in the presence of multiple ampersands and semicolons. But since RFC 1738 states that ; should be the delimiter when present and the W3C says it's to make it so that developers do not have to escape &, that is how I would interpret it. –  Feb 11 '13 at 20:48
  • @Austin That is a fair point. Thanks, I'll take that into consideration. – crush Feb 11 '13 at 20:50