165

I am trying to parse url-encoded strings that are made up of key=value pairs separated by either & or &amp;.

The following will only match the first occurrence, breaking apart the keys and values into separate result elements:

var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342', 'Adam%20Franco']

Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values:

var result = mystring.match(/(?:&|&amp;)?([^=]+)=([^&]+)/g)

The results for the string '1111342=Adam%20Franco&348572=Bob%20Jones' would be:

['1111342=Adam%20Franco', '&348572=Bob%20Jones']

While I could split the string on & and break apart each key/value pair individually, is there any way using JavaScript's regular expression support to match multiple occurrences of the pattern /(?:&|&amp;)?([^=]+)=([^&]+)/ similar to PHP's preg_match_all() function?

I'm aiming for some way to get results with the sub-matches separated like:

[['1111342', '348572'], ['Adam%20Franco', 'Bob%20Jones']]

or

[['1111342', 'Adam%20Franco'], ['348572', 'Bob%20Jones']]
Adam Franco
  • 81,148
  • 4
  • 36
  • 39
  • 9
    it's a little odd that no one recommended using `replace` here. `var data = {}; mystring.replace(/(?:&|&amp;)?([^=]+)=([^&]+)/g, function(match, key, value) { data[key] = value; });` done. "matchAll" in JavaScript is "replace" with a replacement handler function instead of a string. – Mike 'Pomax' Kamermans Feb 11 '14 at 19:21
  • Note that for those still finding this question in 2020, the answer is "don't use regex, use [URLSearchParams](https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams), which does all of this for you." – Mike 'Pomax' Kamermans Jan 23 '20 at 22:19

15 Answers

170

Hoisted from the comments

2020 comment: rather than using regex, we now have URLSearchParams, which does all of this for us, so no custom code, let alone regex, are necessary anymore.

Mike 'Pomax' Kamermans

Browser support is listed here https://caniuse.com/#feat=urlsearchparams


I would suggest an alternative regex, using sub-groups to capture name and value of the parameters individually and re.exec():

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    params[decode(match[1])] = decode(match[2]);
  }
  return params;
}

var result = getUrlParams("http://maps.google.de/maps?f=q&source=s_q&hl=de&geocode=&q=Frankfurt+am+Main&sll=50.106047,8.679886&sspn=0.370369,0.833588&ie=UTF8&ll=50.116616,8.680573&spn=0.35972,0.833588&z=11&iwloc=addr");

result is an object:

{
  f: "q"
  geocode: ""
  hl: "de"
  ie: "UTF8"
  iwloc: "addr"
  ll: "50.116616,8.680573"
  q: "Frankfurt am Main"
  sll: "50.106047,8.679886"
  source: "s_q"
  spn: "0.35972,0.833588"
  sspn: "0.370369,0.833588"
  z: "11"
}

The regex breaks down as follows:

(?:            # non-capturing group
  \?|&         #   "?" or "&"
  (?:amp;)?    #   (allow "&amp;", for wrongly HTML-encoded URLs)
)              # end non-capturing group
(              # group 1
  [^=&#]+      #   any character except "=", "&" or "#"; at least once
)              # end group 1 - this will be the parameter's name
(?:            # non-capturing group
  =?           #   an "=", optional
  (            #   group 2
    [^&#]*     #     any character except "&" or "#"; any number of times
  )            #   end group 2 - this will be the parameter's value
)              # end non-capturing group
Klesun
  • 12,280
  • 5
  • 59
  • 52
Tomalak
  • 332,285
  • 67
  • 532
  • 628
  • 24
    This is what I was hoping for. What I've never seen in JavaScript documentation is mention that the exec() method will continue to return the next result set if called more than once. Thanks again for the great tip! – Adam Franco Feb 06 '09 at 16:44
  • 1
    It does because of this: http://www.regular-expressions.info/javascript.html (Read through: "How to Use The JavaScript RegExp Object") – Tomalak Feb 06 '09 at 16:54
  • 1
    there is a bug in this code: the semicolon after the "while" should be removed. – Jan Willem B Feb 23 '10 at 16:31
  • why did you need the non-capturing groups? – Jürgen Paul Dec 04 '13 at 12:03
  • 1
    Because I generally only use normal (i.e. capturing) groups if I'm actually interested in their content. – Tomalak Dec 04 '13 at 13:13
  • Something I noticed about the while loop is that it's setting match and not comparing where the condition is suppose to be. Was that intentional? – Knight Yoshi Sep 24 '14 at 12:22
  • 1
    @KnightYoshi Yes. In JavaScript any expression also produces its own result (like `x = y` would assign `y` to `x` and also produce `y`). When we apply that knowledge to `while (match = re.exec(url))`: This A) does the assignment *and* B) returns the result of `re.exec(url)` to the `while`. Now `re.exec` returns `null` if there is no match, which is a falsy value. So in effect the loop will continue as long as there is a match. – Tomalak Sep 24 '14 at 12:34
  • Alright, that makes sense. I ran something I am working on through JSLint with this in and it complained about it. For validation purposes, wouldn't it be better to assign re.exec(url) to the variable just before the while loop and then check it or would that yield different results - where in the while loop if it's true, it would have to re-assign the variable to check again for the next loop? – Knight Yoshi Sep 24 '14 at 16:32
  • I know. jsLint does not like this for its non-obviousness, i.e. for style reasons, not for technical reasons. This behavior is defined in the spec and every JS implementation will do it correctly, arguably it's just not *super* clean. Your observation is correct, if you don't do the assignment in the `while` test, you'd have to do it once before the loop and once inside the loop body. However, since many iterators in JavaScript (and `re.exec()` effectively is an iterator) return falsy values when they reach the end of the list, this particular `while` usage pattern is widely used & recognized. – Tomalak Sep 24 '14 at 16:42
  • Don't forget the `/g` at the end, otherwise it may loop infinitely. – TWiStErRob Sep 24 '16 at 09:12
  • What makes good people use commas instead of semicolons? `var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g, match, params = {}, decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};`. If you want anonymous scopes, why not explicitly create an anonymous scope, instead of creating run-on sentences? – Dmytro Oct 22 '16 at 00:04
  • I have no idea what you are talking about. – Tomalak Oct 22 '16 at 03:42
  • Note that you can achieve the same "match everything one by one" [by using `replace` rather than `exec`](/a/21711130/740553), which can take a handler function that processes the capture groups, *immensely* simplifying this whole process to an easy-to-read three-line solution. – Mike 'Pomax' Kamermans Feb 15 '17 at 20:04
  • 1
    It's important to note that the regular expression _must_ have the global argument (`/g`), otherwise the regex object (`re`) doesn't track where the last match ended. So if you're missing `/g`, you'll have an endless loop. – MatsLindh Oct 12 '17 at 06:39
  • 1
    2020 comment: rather than using regex, we now have [URLSearchParams](https://developer.mozilla.org/en-US/docs/Web/API/URLSearchParams), which does all of this for us, so no custom code, let alone regex, are necessary anymore. – Mike 'Pomax' Kamermans Jan 23 '20 at 22:13
  • 1
    @Mike Thanks for pointing that out, I have put it on top of the answer. – Tomalak Jan 23 '20 at 22:19
68

You need to use the 'g' switch for a global search

var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g)
meouw
  • 41,754
  • 10
  • 52
  • 69
  • 36
    This doesn't actually solve the problem: "Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values." – Adam Franco Apr 22 '15 at 02:21
40

2020 edit

Use URLSearchParams, as this job no longer requires any kind of custom code. Browsers can do this for you with a single constructor:

const str = "1111342=Adam%20Franco&348572=Bob%20Jones";
const data = new URLSearchParams(str);
for (const pair of data) console.log(pair);

yields

Array [ "1111342", "Adam Franco" ]
Array [ "348572", "Bob Jones" ]

So there is no reason to use regex for this anymore.
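As a rough sketch of how the shapes asked for in the question could be produced this way: the helper names `toPairs` and `toMultiMap` below are made up for illustration, and the input is assumed to use plain `&` separators (a string separated by `&amp;` would need to be HTML-decoded first).

```javascript
// Hypothetical helpers built on URLSearchParams; not part of any library.

// Returns one [key, value] array per parameter, already percent-decoded.
function toPairs(queryString) {
  return Array.from(new URLSearchParams(queryString));
}

// Returns an object mapping each key to an array of all its values,
// which also covers repeated keys such as ?my=1&my=2.
function toMultiMap(queryString) {
  const params = new URLSearchParams(queryString);
  const result = {};
  for (const key of new Set(params.keys())) {
    result[key] = params.getAll(key); // every value for this key, in order
  }
  return result;
}

const pairs = toPairs("1111342=Adam%20Franco&348572=Bob%20Jones");
// [["1111342", "Adam Franco"], ["348572", "Bob Jones"]]

const multi = toMultiMap("my=1&my=2&my=things");
// { my: ["1", "2", "things"] }
```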

Original answer

If you don't want to rely on the "blind matching" that comes with running exec style matching, JavaScript does come with match-all functionality built in, but it's part of the replace function call, when using a "what to do with the capture groups" handling function:

var data = {};

var getKeyValue = function(fullMatch, key, value) {
  data[key] = value;
};

mystring.replace(/(?:&|&amp;)?([^=]+)=([^&]+)/g, getKeyValue);

done.

Instead of using the capture group handling function to actually return replacement strings (for replace handling, the first arg is the full pattern match, and subsequent args are the individual capture groups, followed by the match offset and the whole string), we simply take the captures of groups 1 and 2 and cache that pair.

So, rather than writing complicated parsing functions, remember that the "matchAll" function in JavaScript is simply "replace" with a replacement handler function, and much pattern matching efficiency can be had.

Mike 'Pomax' Kamermans
  • 49,297
  • 16
  • 112
  • 153
  • I have a string `something "this one" and "that one"`. I want to place all of the double quoted strings in a list i.e. [this one, that one]. So far `mystring.match(/"(.*?)"/)` works fine at detecting the first one, but I do not know how to adapt your solution for a single capturing group. – nu everest Oct 03 '14 at 01:42
  • 2
    sounds like you should post a question on Stackoverflow for that, rather than trying to solve it in comments. – Mike 'Pomax' Kamermans Oct 03 '14 at 03:07
  • I've created a new question: http://stackoverflow.com/questions/26174122/javascript-match-all-regex-that-returns-an-array-of-all-phrases-wrapped-in-doubl – nu everest Oct 03 '14 at 06:20
  • 1
    Not sure why this answer has so few upvotes but it is the best answer to the question. – Calin Apr 21 '15 at 20:14
  • Hi @Mike'Pomax'Kamermans, the community guide-lines specifically recommend editing entries to improve them, see: http://stackoverflow.com/help/behavior . The core of your answer is exceedingly helpful, but I found the language "remember that matchAll is replace" wasn't clear and wasn't an explanation of why your code (which is non-obvious) works. I thought you should get the well-deserved rep, so I edited your answer rather than duplicating it with improved text. As the original asker of this question, I'm happy to revert the acceptance - of this answer (and the edit) if you still want me to. – Adam Franco Apr 22 '15 at 12:21
  • wow, completely missed you were the OP =P I would like the edit undone though. I'd happily rewrite it but this is basically no longer "my" answer, but "your" answer, which makes any points gotten from it "your" points, not mine, and that bothers me =) (fixing some phrasing or typos is great, but you basically redid *all* phrasing ;) – Mike 'Pomax' Kamermans Apr 22 '15 at 16:22
  • @Mike'Pomax'Kamermans , thanks for your reply. I've reverted [my edits](http://stackoverflow.com/revisions/21711130/2). I invite you to rewrite it with more explanation as to why it works. If you don't feel like doing that maybe I'll put in some time to create a new summary answer that ties together both your `replace` method and the `match`/`exec` method. Best, – Adam Franco Apr 22 '15 at 20:26
  • many thanks, and I updated the answer with a few details and some handy links. – Mike 'Pomax' Kamermans Apr 23 '15 at 06:28
21

For capturing groups, I'm used to using preg_match_all in PHP and I've tried to replicate its functionality here:

<script>

// Return all pattern matches with captured groups
RegExp.prototype.execAll = function(string) {
    var match = null;
    var matches = new Array();
    while (match = this.exec(string)) {
        var matchArray = [];
        for (var i in match) {
            if (parseInt(i) == i) {
                matchArray.push(match[i]);
            }
        }
        matches.push(matchArray);
    }
    return matches;
}

// Example
var someTxt = 'abc123 def456 ghi890';
var results = /[a-z]+(\d+)/g.execAll(someTxt);

// Output
[["abc123", "123"],
 ["def456", "456"],
 ["ghi890", "890"]]

</script>
Aram Kocharyan
  • 20,165
  • 11
  • 81
  • 96
  • 4
    @teh_senaus you need to specify the global modifier with `/g` otherwise running `exec()` won't change the current index and will loop forever. – Aram Kocharyan Jan 16 '14 at 03:14
  • If I first call myRe.test(str) to validate and then run execAll, it starts at the second match and the first match is lost. – fdrv Mar 15 '16 at 04:34
  • @fdrv You have to reset the lastIndex to zero before starting the loop: this.lastIndex = 0; – C-F Jun 15 '17 at 04:13
14

Set the g modifier for a global match:

/…/g
Gumbo
  • 643,351
  • 109
  • 780
  • 844
  • 13
    This doesn't actually solve the problem: "Using the global flag, 'g', will match all occurrences, but only return the fully matched sub-strings, not the separated keys and values." – Adam Franco Apr 22 '15 at 02:21
12

Source:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec

Finding successive matches

If your regular expression uses the "g" flag, you can use the exec() method multiple times to find successive matches in the same string. When you do so, the search starts at the substring of str specified by the regular expression's lastIndex property (test() will also advance the lastIndex property). For example, assume you have this script:

var myRe = /ab*/g;
var str = 'abbcdefabh';
var myArray;
while ((myArray = myRe.exec(str)) !== null) {
  var msg = 'Found ' + myArray[0] + '. ';
  msg += 'Next match starts at ' + myRe.lastIndex;
  console.log(msg);
}

This script displays the following text:

Found abb. Next match starts at 3
Found ab. Next match starts at 9

Note: Do not place the regular expression literal (or RegExp constructor) within the while condition or it will create an infinite loop if there is a match due to the lastIndex property being reset upon each iteration. Also be sure that the global flag is set or a loop will occur here also.
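One way to sidestep both pitfalls in the note (a missing "g" flag, or a `lastIndex` left over from an earlier `test()` or `exec()` call) is to run the loop on a private copy of the regex. The following is only a sketch; `collectMatches` is a hypothetical helper name, not a built-in.

```javascript
// Collects every match, including capture groups, as plain arrays.
function collectMatches(pattern, text) {
  // Work on a copy so the caller's lastIndex is neither read nor modified,
  // and force the "g" flag so exec() advances instead of looping forever.
  var flags = pattern.flags.indexOf("g") === -1 ? pattern.flags + "g" : pattern.flags;
  var re = new RegExp(pattern.source, flags);
  var matches = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    if (match[0] === "") {
      re.lastIndex++; // step past zero-length matches so the loop terminates
    }
    matches.push(Array.from(match)); // keep only the indexed entries
  }
  return matches;
}

var shared = /ab*/g;
shared.test("abbcdefabh"); // advances shared.lastIndex to 3
var found = collectMatches(shared, "abbcdefabh");
// found is [["abb"], ["ab"]]; the first match is not skipped
```

Because the copy starts with `lastIndex` at 0, an earlier `test()` call on the original regex no longer causes the first match to be lost.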

randers
  • 5,031
  • 5
  • 37
  • 64
KIM Taegyoon
  • 1,917
  • 21
  • 18
  • If I first call myRe.test(str) to validate and then run the while loop, it starts at the second match and the first match is lost. – fdrv Mar 15 '16 at 04:33
  • You can also combine `String.prototype.match` with the `g` flag: `'abbcdefabh'.match(/ab*/g)` returns `['abb', 'ab']` – thom_nic Nov 24 '16 at 04:00
4

Hello from 2020. Let me bring String.prototype.matchAll() to your attention:

let regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
let str = '1111342=Adam%20Franco&348572=Bob%20Jones';

for (let match of str.matchAll(regexp)) {
    let [full, key, value] = match;
    console.log(key + ' => ' + value);
}

Outputs:

1111342 => Adam%20Franco
348572 => Bob%20Jones
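The two result layouts asked for in the question can be derived from the same iterator. This is a minimal sketch; the variable names `pairs` and `columns` are chosen here for illustration only.

```javascript
const regexp = /(?:&|&amp;)?([^=]+)=([^&]+)/g;
const str = "1111342=Adam%20Franco&348572=Bob%20Jones";

// One [key, value] pair per match; element 0 (the full match) is dropped.
const pairs = Array.from(str.matchAll(regexp), (m) => [m[1], m[2]]);
// [["1111342", "Adam%20Franco"], ["348572", "Bob%20Jones"]]

// The same data grouped by capture group: [all keys, all values].
const columns = [pairs.map((p) => p[0]), pairs.map((p) => p[1])];
// [["1111342", "348572"], ["Adam%20Franco", "Bob%20Jones"]]
```

Note that the values are still percent-encoded here; `decodeURIComponent` can be applied to each entry if decoded text is wanted.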
Klesun
  • 12,280
  • 5
  • 59
  • 52
  • Finally! A note of caution: ["ECMAScript 2020, the 11th edition, introduces the matchAll method for Strings, to produce an iterator for all match objects generated by a global regular expression"](https://tc39.es/ecma262/#sec-string.prototype.matchall). According to the site linked in the answer, most browsers & nodeJS support it currently, but not IE, Safari, or Samsung Internet. Hopefully support will broaden soon, but YMMV for a while. – Adam Franco May 19 '20 at 13:24
2

If someone (like me) needs Tomalak's method with array support (i.e. multiple select), here it is:

function getUrlParams(url) {
  var re = /(?:\?|&(?:amp;)?)([^=&#]+)(?:=?([^&#]*))/g,
      match, params = {},
      decode = function (s) {return decodeURIComponent(s.replace(/\+/g, " "));};

  if (typeof url == "undefined") url = document.location.href;

  while (match = re.exec(url)) {
    if( params[decode(match[1])] ) {
        if( typeof params[decode(match[1])] != 'object' ) {
            params[decode(match[1])] = new Array( params[decode(match[1])], decode(match[2]) );
        } else {
            params[decode(match[1])].push(decode(match[2]));
        }
    }
    else
        params[decode(match[1])] = decode(match[2]);
  }
  return params;
}
var urlParams = getUrlParams(location.search);

input ?my=1&my=2&my=things

result 1,2,things (earlier returned only: things)

fedu
  • 138
  • 1
  • 8
1

Just to stick with the proposed question as indicated by the title, you can actually iterate over each match in a string using String.prototype.replace(). For example the following does just that to get an array of all words based on a regular expression:

function getWords(str) {
  var arr = [];
  str.replace(/\w+/g, function(m) {
    arr.push(m);
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");
// > ["Where", "in", "the", "world", "is", "Carmen", "Sandiego"]

If I wanted to get capture groups or even the index of each match I could do that too. The following shows how each match is returned with the entire match, the 1st capture group and the index:

function getWords(str) {
  var arr = [];
  str.replace(/\w+(?=(.*))/g, function(m, remaining, index) {
    arr.push({ match: m, remainder: remaining, index: index });
  });
  return arr;
}

var words = getWords("Where in the world is Carmen Sandiego?");

After running the above, words will be as follows:

[
  {
    "match": "Where",
    "remainder": " in the world is Carmen Sandiego?",
    "index": 0
  },
  {
    "match": "in",
    "remainder": " the world is Carmen Sandiego?",
    "index": 6
  },
  {
    "match": "the",
    "remainder": " world is Carmen Sandiego?",
    "index": 9
  },
  {
    "match": "world",
    "remainder": " is Carmen Sandiego?",
    "index": 13
  },
  {
    "match": "is",
    "remainder": " Carmen Sandiego?",
    "index": 19
  },
  {
    "match": "Carmen",
    "remainder": " Sandiego?",
    "index": 22
  },
  {
    "match": "Sandiego",
    "remainder": "?",
    "index": 29
  }
]

In order to match multiple occurrences similar to what is available in PHP with preg_match_all you can use this type of thinking to make your own or use something like YourJS.matchAll(). YourJS more or less defines this function as follows:

function matchAll(str, rgx) {
  var arr, extras, matches = [];
  str.replace(rgx.global ? rgx : new RegExp(rgx.source, (rgx + '').replace(/[\s\S]+\//g , 'g')), function() {
    matches.push(arr = [].slice.call(arguments));
    extras = arr.splice(-2);
    arr.index = extras[0];
    arr.input = extras[1];
  });
  return matches[0] ? matches : null;
}
Chris West
  • 885
  • 8
  • 17
  • Since you want to parse the query string of a URL, you could also use something like `YourJS.parseQS()` (http://yourjs.com/snippets/56), although a lot of other libraries also offer this functionality. – Chris West Dec 05 '15 at 20:43
  • Modifying a variable from an outer scope in a loop that is supposed to return a replacement is kind of bad. You're misusing replace here – Ruan Mendes Jan 15 '20 at 12:54
1

If you can get away with using map this is a four-line-solution:

var mystring = '1111342=Adam%20Franco&348572=Bob%20Jones';

var result = mystring.match(/(&|&amp;)?([^=]+)=([^&]+)/g) || [];
result = result.map(function(i) {
  return i.match(/(&|&amp;)?([^=]+)=([^&]+)/);
});

console.log(result);

Ain't pretty, ain't efficient, but at least it is compact. ;)

fboes
  • 2,149
  • 16
  • 17
1

Use window.URL:

> s = 'http://www.example.com/index.html?1111342=Adam%20Franco&348572=Bob%20Jones'
> u = new URL(s)
> Array.from(u.searchParams.entries())
[["1111342", "Adam Franco"], ["348572", "Bob Jones"]]
jnnnnn
  • 3,889
  • 32
  • 37
0

Well... I had a similar problem: I wanted an incremental, step-by-step search with RegExp (e.g. start the search, do some processing, then continue searching until the last match).

After a lot of searching on the internet, as always (this is turning into a habit now), I ended up on Stack Overflow and found the answer.

What is not mentioned here, and is worth pointing out, is "lastIndex". I now understand why the RegExp object implements the "lastIndex" property.
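A small sketch of that stepwise use of `lastIndex`, with a made-up input string and pattern chosen purely for illustration:

```javascript
var re = /\d+/g;          // the "g" flag is what makes the regex remember its position
var text = "a1 b22 c333";
var steps = [];

var first = re.exec(text);   // matches "1"; re.lastIndex is now 2
steps.push([first[0], re.lastIndex]);

// ... other processing can happen here; the regex keeps its position ...

var second = re.exec(text);  // resumes at index 2 and matches "22"
steps.push([second[0], re.lastIndex]);

re.lastIndex = 0;            // rewind manually to start the search over
var again = re.exec(text);   // matches "1" again
steps.push([again[0], re.lastIndex]);

// steps is [["1", 2], ["22", 6], ["1", 2]]
```

Since `lastIndex` is writable, it can also be set to any other position to skip ahead or to re-scan part of the string.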

p.s.w.g
  • 146,324
  • 30
  • 291
  • 331
ZEE
  • 2,931
  • 5
  • 35
  • 47
0

To capture several parameters using the same name, I modified the while loop in Tomalak's method like this:

  while (match = re.exec(url)) {
    var pName = decode(match[1]);
    var pValue = decode(match[2]);
    params[pName] ? params[pName].push(pValue) : params[pName] = [pValue];
  }

input: ?firstname=george&lastname=bush&firstname=bill&lastname=clinton

returns: {firstname : ["george", "bill"], lastname : ["bush", "clinton"]}

ivar
  • 31
  • 4
  • While I like your idea, it doesn't work nicely with single params, like for `?cinema=1234&film=12&film=34` I'd expect `{cinema: 1234, film: [12, 34]}`. Edited your answer to reflect this. – TWiStErRob Jul 15 '13 at 23:27
0

Splitting it looks like the best option to me:

'1111342=Adam%20Franco&348572=Bob%20Jones'.split('&').map(x => x.match(/(?:&|&amp;)?([^=]+)=([^&]+)/))
pguardiario
  • 53,827
  • 19
  • 119
  • 159
0

To avoid regex hell you could find your first match, chop off a chunk, then attempt to find the next one on the remaining substring. In C# this looks something like the following; sorry, I've not ported it over to JavaScript for you.

        long count = 0;
        var remainder = data;
        Match match = null;
        do
        {
            match = _rgx.Match(remainder);
            if (match.Success)
            {
                count++;
                remainder = remainder.Substring(match.Index + 1, remainder.Length - (match.Index+1));
            }
        } while (match.Success);
        return count;
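A rough JavaScript equivalent of the same idea might look like the sketch below. The function name `countMatches` and its parameters are assumptions made for illustration; the C# snippet above does not show its enclosing method.

```javascript
// Counts matches by repeatedly searching the remainder of the string.
function countMatches(rgx, data) {
  var count = 0;
  var remainder = data;
  var match;
  // Use a copy without "g"/"y" so exec() always searches from the start of `remainder`.
  var re = new RegExp(rgx.source, rgx.flags.replace(/[gy]/g, ""));
  while ((match = re.exec(remainder)) !== null) {
    count++;
    if (match.index >= remainder.length) {
      break; // nothing left to chop off (guards against empty matches)
    }
    // As in the C# version, resume one character after the START of the match,
    // so overlapping matches are counted as well.
    remainder = remainder.substring(match.index + 1);
  }
  return count;
}

var n = countMatches(/aa/, "aaaa"); // 3, because overlapping matches are counted
```

Because each search runs on a chopped-off substring, patterns that depend on what precedes the match (for example `^` or lookbehind) may behave differently than they would on the full string.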
andrew pate
  • 3,833
  • 36
  • 28