238

I'm trying to parse the following kind of string:

[key:"val" key2:"val2"]

where there are arbitrary key:"val" pairs inside. I want to grab the key name and the value. For those curious, I'm trying to parse the database format of Taskwarrior.

Here is my test string:

[description:"aoeu" uuid:"123sth"]

which is meant to highlight that a key or value can contain anything except a space, that there are no spaces around the colons, and that values are always in double quotes.

In node, this is my output:

[deuteronomy][gatlin][~]$ node
> var re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
> re.exec('[description:"aoeu" uuid:"123sth"]');
[ '[description:"aoeu" uuid:"123sth"]',
  'uuid',
  '123sth',
  index: 0,
  input: '[description:"aoeu" uuid:"123sth"]' ]

But description:"aoeu" also matches this pattern. How can I get all matches back?

Wiktor Stribiżew
gatlin
  • 1
    It might be that my regex is wrong and / or that I am simply using the regex facilities in JavaScript incorrectly. This seems to work: > var s = "Fifteen is 15 and eight is 8"; > var re = /\d+/g; > var m = s.match(re); m = [ '15', '8' ] – gatlin Jun 12 '11 at 18:08
  • 8
    Javascript now has a .match() function: https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match Used like this: `"some string".match(/regex/g)` – Stefnotch Mar 05 '16 at 09:37

19 Answers

280

Continue calling re.exec(s) in a loop to obtain all the matches:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var m;

do {
    m = re.exec(s);
    if (m) {
        console.log(m[1], m[2]);
    }
} while (m);

Try it with this JSFiddle: https://jsfiddle.net/7yS2V/
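The comments on this answer point out two ways an exec loop can hang: a regex without the g flag, and a regex that can match the empty string. Below is a minimal sketch of a guarded version. `execAll` is a made-up helper name, not a built-in, and resetting `lastIndex` at the start is my own assumption about what callers usually want.

```javascript
// Hypothetical helper: collect every exec() result for a global regex.
function execAll(re, s) {
    if (!re.global) {
        // Without the g flag, lastIndex never advances and the loop would never end.
        throw new TypeError('execAll needs a regex with the g flag');
    }
    re.lastIndex = 0; // start from the beginning even if the regex was used before
    var results = [];
    var m;
    while ((m = re.exec(s)) !== null) {
        results.push(m);
        // A zero-length match leaves lastIndex where it was, so step past it manually.
        if (m[0].length === 0) {
            re.lastIndex += 1;
        }
    }
    return results;
}

var pairs = execAll(/\s*([^[:]+):"([^"]+)"/g, '[description:"aoeu" uuid:"123sth"]');
pairs.forEach(function (m) {
    console.log(m[1], m[2]);
});
// description aoeu
// uuid 123sth
```

Throwing on a non-global regex is just one option; adding the flag to a copy of the regex would work as well.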

Zach Saucier
lawnsea
  • 10
    Why not `while` instead of `do … while`? – Gumbo Jun 12 '11 at 18:14
  • 16
    Using a while loop makes it slightly awkward to initialize m. You either have to write `while(m = re.exec(s))`, which is an anti-pattern IMO, or you have to write `m = re.exec(s); while (m) { ... m = re.exec(s); }`. I prefer the `do ... if ... while` idiom, but other techniques would work as well. – lawnsea Jun 12 '11 at 18:21
  • 19
    doing this in chromium resulted in my tab crashing. – EdgeCaseBerg Dec 16 '14 at 18:53
  • 52
    @EdgeCaseBerg You need to have the `g` flag set, otherwise the internal pointer is not moved forward. [Docs](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/RegExp/exec). – Tim Jul 25 '15 at 13:45
  • @lawnsea How about a for loop? –  Mar 22 '16 at 08:39
  • 4
    Also, the regex should be in a variable to make the pointer increment its position. Having a runtime regex would result in an infinite loop. – satin Aug 05 '16 at 21:53
  • 1
    Can't you just have the loop be `while(m = re.exec(a)) { console.log(m[1], m[2]); }` ? – Jenna Sloan Dec 28 '16 at 08:24
  • 13
    Another point is that if the regex can match empty string it will be an infinite loop – FabioCosta Jun 08 '17 at 00:57
  • 2
    @lawnsea please update your answer to: `let m; while (m = re.exec(s)) { console.log(m[1], m[2]); }` – Offenso Dec 26 '18 at 15:55
  • 2
    @FabioCosta Just encountered the issue with infinite loop. It's so frustrating that to avoid this bug everyone should manually check the equality of current and previous matches, and, if they are the same, increment `RegExp`s `lastIndex`. The answer should be definitely updated with this information. It's good that the new `matchAll` method for strings does this check by itself, and now we can refuse to use loops at all (thank @iuliu.net for the information). The last unpleasant point is that `RegExp`s contain state and thus should be copied manually to avoid undesirable mutations. – N. Kudryavtsev Nov 07 '19 at 10:41
238

str.match(pattern), if pattern has the global flag g, will return all the matches as an array.

For example:

const str = 'All of us except @Emran, @Raju and @Noman were there';
console.log(
  str.match(/@\w*/g)
);
// Will log ["@Emran", "@Raju", "@Noman"]
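To make the trade-off concrete, here is a small sketch of what match() returns with and without the g flag. The sample sentence and variable names are made up for illustration.

```javascript
const text = 'All of us except @Emran:emran26 and @Raju:raju13 were there';

// With g: an array of the full-match strings; the capture groups are dropped.
const allMatches = text.match(/@(\w+):(\w+)/g);
console.log(allMatches); // [ '@Emran:emran26', '@Raju:raju13' ]

// Without g: only the first match, but with its capture groups.
const firstOnly = text.match(/@(\w+):(\w+)/);
console.log(firstOnly[1], firstOnly[2]); // Emran emran26

// When nothing matches, the result is null rather than an empty array.
const none = text.match(/#\w+/g);
console.log(none); // null
```

So match() with g is the shortest route when you only need the matched text, and the null case needs a guard (for example `|| []`) before you call array methods on the result.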
Lee Goddard
Anis
  • 24
    Beware: the matches aren't match objects, but the matching strings. For example, there is no access to the groups in `"All of us except @Emran:emran26, @Raju:raju13 and @Noman:noman42".match(/@(\w+):(\w+)/g)` (which will return `["@Emran:emran26", "@Raju:raju13", "@Noman:noman42"]`) – madprog Aug 18 '17 at 09:46
  • 4
    @madprog, Right, it's the easiest way but not suitable when the group values are essential. – Anis Sep 13 '17 at 10:02
  • 2
    This isn't working for me. I only get the first match. – Anthony Roberts Dec 31 '18 at 19:38
  • 13
    @AnthonyRoberts you must add the "g" flag. `/@\w/g` or `new RegExp("@\\w", "g")` – Aruna Herath Jan 28 '19 at 07:50
101

To loop through all matches, you can use the replace function:

var re = /\s*([^[:]+):\"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';

s.replace(re, function(match, g1, g2) { console.log(g1, g2); });
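If you want the pairs as data rather than console output, the same callback can fill an object. This is only a sketch; the `task` variable name is my own choice.

```javascript
var re = /\s*([^[:]+):"([^"]+)"/g;
var s = '[description:"aoeu" uuid:"123sth"]';
var task = {};

// The callback runs once per match; the string replace() returns is simply ignored.
s.replace(re, function (match, key, value) {
    task[key] = value;
    return match; // hand the text back unchanged
});

console.log(task); // { description: 'aoeu', uuid: '123sth' }
```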
mik01aj
Christophe
  • I think it is just too complicated. However, it is nice to know about different ways of doing a simple thing (I up-vote your answer). – Arashsoft May 12 '16 at 21:09
  • 27
    It's counterintuitive code. You're not “replacing” anything in any meaningful sense. It's just exploiting the same function for a different purpose. – Luke Maurer Jul 27 '17 at 19:43
  • @LukeMaurer is right. This is using a facility of Javascript intended for _string replacement_ to perform a _string search_. Sure, it's simpler, but did NASA slingshot the New Horizons around Pluto by taking the "simple" route, or by taking the "correct" route? We're engineers and should respect that doing it the easy way is not always right. Shying away from learning regex because it's hard is simply lazy programming. I don't mean to be derisive, rather, honest. This manifests in important ways when other developers waste half an hour trying to figure out what line three of this example does. – dudewad Sep 06 '18 at 18:25
  • 8
    @dudewad if engineers were just following the rules without thinking outside of the box, we would not even be thinking about visiting other planets right now ;-) – Christophe Sep 07 '18 at 22:56
  • Yes but this isn't creative thinking, it's lazy thinking. You can tell when there's a better solution, and this is one of those times. – dudewad Sep 13 '18 at 20:07
  • 2
    @dudewad sorry, I fail to see the lazy part here. If the exact same method was called "process" instead of "replace" you would be ok with it. I am afraid you're just stuck on the terminology. – Christophe Sep 16 '18 at 23:43
  • 4
    @Christophe I am definitely not stuck on terminology. I'm stuck on clean code. Using things that are intended for one purpose for a different purpose is called "hacky" for a reason. It creates confusing code that is difficult to understand and more often than not suffers performance-wise. The fact that you answered this question without a regex in and of itself makes it an invalid answer, since the OP is asking for how to do it with regex. I find it important, however, to hold this community to a high standard, which is why I stand by what I said above. – dudewad Sep 19 '18 at 07:54
  • @dudewad what do you mean, without a regex? – Christophe Oct 04 '18 at 02:27
  • Sry -- to clarify that, I mean by not doing it on a regex object, rather you're using an additional layer instead of using the regex itself. – dudewad Oct 04 '18 at 19:37
  • 1
    @dudewad okay. How does that make it an invalid answer? I am fine with you not agreeing, even downvoting, but let's do it in good faith. – Christophe Dec 26 '18 at 20:18
  • 1
    Some people wouldn't climb on a table to change a light bulb cuz tables are for eating on. They would only use a ladder certified for lightbulb-changing. Anything else is "hacky" and not a clean way to do it. – capr Mar 14 '23 at 22:47
61

This is a solution

var s = '[description:"aoeu" uuid:"123sth"]';

var re = /\s*([^[:]+):\"([^"]+)"/g;
var m;
while (m = re.exec(s)) {
  console.log(m[1], m[2]);
}

This is based on lawnsea's answer, but shorter.

Notice that the `g` flag must be set to move the internal pointer forward across invocations.
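The "internal pointer" is the regex's `lastIndex` property. A quick sketch of how it moves with the g flag and stays put without it (the variable names are mine):

```javascript
var s = '[description:"aoeu" uuid:"123sth"]';

// With g, each exec() call resumes where the previous match ended.
var withG = /\s*([^[:]+):"([^"]+)"/g;
withG.exec(s);
var afterFirst = withG.lastIndex;
withG.exec(s);
var afterSecond = withG.lastIndex;
console.log(afterFirst, afterSecond); // 19 33

// Without g, lastIndex stays at 0 and exec() keeps returning the first match,
// which is why `while (m = re.exec(s))` never terminates in that case.
var withoutG = /\s*([^[:]+):"([^"]+)"/;
var first = withoutG.exec(s);
var again = withoutG.exec(s);
console.log(first[1], again[1], withoutG.lastIndex); // description description 0
```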

Jay Taylor
lovasoa
33
str.match(/regex/g)

returns all matches as an array.

If, for some mysterious reason, you need the additional information that comes with exec, you could, as an alternative to the previous answers, do it with a recursive function instead of a loop as follows (which also looks cooler :).

function findMatches(regex, str, matches = []) {
   const res = regex.exec(str)
   res && matches.push(res) && findMatches(regex, str, matches)
   return matches
}

// Usage
const matches = findMatches(/regex/g, str)

As stated in the comments before, it's important to have g at the end of the regex definition to move the pointer forward in each execution.

koders
24

We are finally beginning to see a built-in matchAll function, see here for the description and compatibility table. It looks like as of May 2020, Chrome, Edge, Firefox, and Node.js (12+) are supported but not IE, Safari, and Opera. Seems like it was drafted in December 2018 so give it some time to reach all browsers, but I trust it will get there.

The built-in matchAll function is nice because it returns an iterable. It also returns capturing groups for every match! So you can do things like

// get the letters before and after "o"
const matches = "stackoverflow".matchAll(/(\w)o(\w)/g);

for (const match of matches) {
    console.log("letter before:" + match[1]);
    console.log("letter after:" + match[2]);
}

// the iterator above is now used up, so call matchAll again to build an array
const arrayOfAllMatches = [..."stackoverflow".matchAll(/(\w)o(\w)/g)];

It also seems like every match object uses the same format as exec() (and match() without the g flag). So each object is an array of the match and capturing groups, along with three additional properties index, input, and groups. So it looks like:

[<match>, <group1>, <group2>, ..., index: <match offset>, input: <original string>, groups: <named capture groups>]

For more information about matchAll there is also a Google developers page. There are also polyfills/shims available.
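Applied to the string from the question, a sketch could look like the following. It reuses the regex from the other answers; `Object.fromEntries` is ES2019, and the `task` name is just for illustration.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';
const re = /\s*([^[:]+):"([^"]+)"/g;

// Array.from accepts any iterable plus a mapping function,
// so each match is turned into a [key, value] pair on the way in.
const task = Object.fromEntries(
    Array.from(s.matchAll(re), (m) => [m[1], m[2]])
);

console.log(task); // { description: 'aoeu', uuid: '123sth' }
```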

woojoo666
  • I really like this, but it hasn't quite landed in Firefox 66.0.3 yet. [Caniuse](https://github.com/Fyrd/caniuse/issues/4845) doesn't have a support list about it yet either. I'm looking forward to this one. I do see it working in Chromium 74.0.3729.108. – Lonnie Best May 07 '19 at 22:30
  • 1
    @LonnieBest yeah you can see the compatibility section of the [MDN page](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll) that I linked. It seems like Firefox started supporting it in version 67. Still would not recommend using it if you're trying to ship a product. There are polyfills/shims available, which I added to my answer – woojoo666 May 08 '19 at 01:55
  • Polyfill has 11m weekly downloads on [npm](https://www.npmjs.com/package/string.prototype.matchall) so it's well used! – Drenai May 27 '22 at 21:12
17

If you have ES2020

(Meaning your environment, such as Chrome, Node.js, or Firefox, supports ECMAScript 2020 or later)

yourString.matchAll( /your-regex/g ) // don't forget the "g"

MDN Documentation

If you use NPM

You can use the official polyfill
npm install string.prototype.matchall

const matchAll = require('string.prototype.matchall')
console.log( [...  matchAll('blah1 blah2',/blah/g)  ] )
//[
//  [ 'blah', index: 0, input: 'blah1 blah2', groups: undefined ],
//  [ 'blah', index: 6, input: 'blah1 blah2', groups: undefined ]
//]

Otherwise

Here are some functionally similar copy-paste versions

// returns an array; written in an old style, but `flags` and RegExp(regex, flags) need ES2015 (ES6)
function findAll(regexPattern, sourceString) {
    var output = []
    var match
    // auto-add global flag while keeping others as-is
    var regexPatternWithGlobal = regexPattern.global ? regexPattern : RegExp(regexPattern, regexPattern.flags+"g")
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // store the match data
        output.push(match)
        // zero-length matches will end up in an infinite loop, so increment by one char after a zero-length match is found
        if (match[0].length == 0) {
            regexPatternWithGlobal.lastIndex += 1
        }
    }
    return output
}

// this version returns an iterator, which is good for large results
// note: iterators require ES6 - 2015 standard
function* findAll(regexPattern, sourceString) {
    var match
    // auto-add global flag while keeping others as-is
    const regexPatternWithGlobal = regexPattern.global ? regexPattern : RegExp(regexPattern, regexPattern.flags+"g")
    while (match = regexPatternWithGlobal.exec(sourceString)) {
        // store the match data
        yield match
        // zero-length matches will end up in an infinite loop, so increment by one char after a zero-length match is found
        if (match[0].length == 0) {
            regexPatternWithGlobal.lastIndex += 1
        }
    }
}

example usage:

console.log(   findAll(/blah/g,'blah1 blah2')   ) 

outputs:

[ [ 'blah', index: 0 ], [ 'blah', index: 6 ] ]
Jeff Hykin
  • With most of browsers supporting `str.matchAll` this answer should be in top list – Amit Jan 03 '22 at 15:30
  • I wouldn't recommend using that `findAll` code - there's an official [polyfill](https://www.npmjs.com/package/string.prototype.matchall) available – Drenai May 27 '22 at 21:15
11

Based on Agus's function, but I prefer to return just the match values:

var bob = "&gt; bob &lt;";
function matchAll(str, regex) {
    var res = [];
    var m;
    if (regex.global) {
        while (m = regex.exec(str)) {
            res.push(m[1]);
        }
    } else {
        if (m = regex.exec(str)) {
            res.push(m[1]);
        }
    }
    return res;
}
var Amatch = matchAll(bob, /(&.*?;)/g);
console.log(Amatch);  // yields: ["&gt;", "&lt;"]
bob
8

Iterables are nicer:

const matches = (text, pattern) => ({
  [Symbol.iterator]: function * () {
    const clone = new RegExp(pattern.source, pattern.flags);
    let match = null;
    do {
      match = clone.exec(text);
      if (match) {
        yield match;
      }
    } while (match);
  }
});

Usage in a loop:

for (const match of matches('abcdefabcdef', /ab/g)) {
  console.log(match);
}

Or if you want an array:

[ ...matches('abcdefabcdef', /ab/g) ]
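One payoff of a lazy iterable is that a caller who only needs the first match never pays for the rest. Here is a sketch as a plain generator; `lazyMatches` is a hypothetical name, and adding the g flag to the clone plus the zero-length guard are my additions, not part of the code above.

```javascript
// Hypothetical generator variant of the idea above.
function* lazyMatches(text, pattern) {
    // Make sure the clone is global, otherwise exec() would return the first match forever.
    const flags = pattern.flags.includes('g') ? pattern.flags : pattern.flags + 'g';
    const clone = new RegExp(pattern.source, flags);
    let match;
    while ((match = clone.exec(text)) !== null) {
        yield match;
        // Step past zero-length matches so the loop always makes progress.
        if (match[0].length === 0) {
            clone.lastIndex += 1;
        }
    }
}

// Destructuring pulls one value and then closes the generator,
// so the second 'ab' is never searched for.
const [first] = lazyMatches('abcdefabcdef', /ab/);
console.log(first[0], first.index); // ab 0
```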
sdgfsdh
  • 1
    Typo: `if (m)` should be `if (match)` – Botje Aug 08 '18 at 09:07
  • Arrays are already iterable, so everyone returning an array of matches are also returning iterables. What's better is if you console log an array the browser can actually print out the contents. But console logging a generic iterable just gets you [object Object] { ... } –  Oct 31 '18 at 12:47
  • All arrays are iterable but not all iterables are arrays. An iterable is superior if you don’t know what the caller will need to do. For example, if you just want the first match an iterable is more efficient. – sdgfsdh Oct 31 '18 at 13:13
  • your dream is becoming a reality, browsers are rolling out support for [a built-in `matchAll` that returns an iterable](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/matchAll) :D – woojoo666 Apr 08 '19 at 12:19
  • 1
    I've come across this answer post-matchAll implementation. I wrote some code for browser JS which supported it, but Node actually did not. This behaves identically to matchAll so I've not had to rewrite stuff - Cheers! – user37309 Apr 12 '19 at 01:13
7

Here is my function to get the matches :

function getAllMatches(regex, text) {
    if (regex.constructor !== RegExp) {
        throw new Error('not RegExp');
    }

    var res = [];
    var match = null;

    if (regex.global) {
        while (match = regex.exec(text)) {
            res.push(match);
        }
    }
    else {
        if (match = regex.exec(text)) {
            res.push(match);
        }
    }

    return res;
}

// Example:

var regex = /abc|def|ghi/g;
var res = getAllMatches(regex, 'abcdefghi');

res.forEach(function (item) {
    console.log(item[0]);
});
Agus Syahputra
7

If you're able to use matchAll here's a trick:

Array.from has a mapping function parameter (its second argument), so instead of ending up with an array of awkward 'match' results you can project each one to what you really need:

Array.from(str.matchAll(regexp), m => m[0]);

If you have named groups, e.g. /(?<firstName>[A-Z][a-z]+)/g, you could do this:

Array.from(str.matchAll(regexp), m => m.groups.firstName);
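For the string in the question, a named-group version might look like this sketch. The group names `key` and `value` and the slightly stricter character class are my own choices, not from the question.

```javascript
const s = '[description:"aoeu" uuid:"123sth"]';

// Named groups: (?<key>...) and (?<value>...) show up on m.groups.
const re = /(?<key>[^\s[:]+):"(?<value>[^"]*)"/g;

const keys = Array.from(s.matchAll(re), (m) => m.groups.key);
const values = Array.from(s.matchAll(re), (m) => m.groups.value);

console.log(keys);   // [ 'description', 'uuid' ]
console.log(values); // [ 'aoeu', '123sth' ]
```

Note that the group name is case-sensitive, so the name used in the pattern and the property read from `m.groups` have to match exactly.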
Simon_Weaver
3

Since ES2020, there's now a simpler, better way of getting all the matches, together with information about the capture groups, and their index:

const string = 'Mice like to dice rice';
const regex = /.ice/gu;
for(const match of string.matchAll(regex)) {
    console.log(match);
}

// ["Mice", index: 0, input: "Mice like to dice rice", groups: undefined]

// ["dice", index: 13, input: "Mice like to dice rice", groups: undefined]

// ["rice", index: 18, input: "Mice like to dice rice", groups: undefined]

It is currently supported in Chrome, Firefox, Opera. Depending on when you read this, check this link to see its current support.
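One thing worth knowing: unlike an exec loop, matchAll refuses a regex without the g flag instead of looping forever. A small sketch (the `message` and `words` names are only for illustration):

```javascript
const string = 'Mice like to dice rice';

// A non-global regex makes matchAll throw a TypeError right away.
let message = null;
try {
    string.matchAll(/.ice/u); // no g flag
} catch (err) {
    message = err instanceof TypeError ? 'TypeError' : 'other error';
}
console.log(message); // TypeError

// With g it works, and Array.from can map each match to just its text.
const words = Array.from(string.matchAll(/.ice/gu), (m) => m[0]);
console.log(words); // [ 'Mice', 'dice', 'rice' ]
```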

Community
iuliu.net
  • 1
    Superb! But it's still important to keep in mind the regex should have a flag `g` and its `lastIndex` should be reset to 0 before the invocation of `matchAll`. – N. Kudryavtsev Nov 07 '19 at 10:54
2

Use this...

var all_matches = your_string.match(re);
console.log(all_matches)

It will return an array of all matches, provided `re` has the g flag. That works just fine, but remember it won't take groups into account; it will just return the full matches.

Subham Debnath
0

I would definitely recommend using the String.match() function, and creating a relevant RegEx for it. My example is with a list of strings, which is often necessary when scanning user inputs for keywords and phrases.

    // 1) Define keywords
    var keywords = ['apple', 'orange', 'banana'];

    // 2) Create regex, pass "i" for case-insensitive and "g" for global search
    var regex = new RegExp("(" + keywords.join('|') + ")", "ig");
    // => /(apple|orange|banana)/gi

    // 3) Match it against any string to get all matches
    "Test string for ORANGE's or apples were mentioned".match(regex);
    // => ["ORANGE", "apple"]
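If the keywords can contain characters that are special in a regex (such as `+` or `.`), joining them raw either breaks the pattern or matches too much. A sketch of escaping them first; `escapeRegExp` is a hypothetical helper you would have to define yourself, and the keyword list is made up.

```javascript
// Hypothetical helper: escape characters that have a special meaning in a regex.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

var keywords = ['c++', 'node.js', 'apple'];

// Escape each keyword before joining, so 'c++' and 'node.js' are matched literally.
var regex = new RegExp('(' + keywords.map(escapeRegExp).join('|') + ')', 'ig');

var found = 'I like C++ and Node.js, not nodexjs'.match(regex);
console.log(found); // [ 'C++', 'Node.js' ]
```

Without the escaping step, `c++` is not even a valid pattern, and `node.js` would also match `nodexjs`.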

Hope this helps!

Sebastian Scholl
0

This isn't really going to help with your more complex issue but I'm posting this anyway because it is a simple solution for people that aren't doing a global search like you are.

I've simplified the regex in the answer to be clearer (this is not a solution to your exact problem).

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

// We only want the group matches in the array
function purify_regex(reResult){

  // Removes the Regex specific values and clones the array to prevent mutation
  let purifiedArray = [...reResult];

  // Removes the full match value at position 0
  purifiedArray.shift();

  // Returns a pure array without mutating the original regex result
  return purifiedArray;
}

// purifiedResult= ["description", "aoeu"]

That looks more verbose than it is because of the comments. This is what it looks like without comments:

var re = /^(.+?):"(.+)"$/
var regExResult = re.exec('description:"aoeu"');
var purifiedResult = purify_regex(regExResult);

function purify_regex(reResult){
  let purifiedArray = [...reResult];
  purifiedArray.shift();
  return purifiedArray;
}

Note that any groups that do not match will be listed in the array as undefined values.

This solution uses the ES6 spread operator to purify the array of regex specific values. You will need to run your code through Babel if you want IE11 support.

Daniel Tonon
0

Here's a one line solution without a while loop.

The order is preserved in the resulting list.

The potential downsides are:

  1. It clones the regex for every match.
  2. The result is in a different form than the other solutions produce, so you'll need to process it one more time.

let re = /\s*([^[:]+):\"([^"]+)"/g
let str = '[description:"aoeu" uuid:"123sth"]'

(str.match(re) || []).map(e => RegExp(re.source, re.flags).exec(e))

[ [ 'description:"aoeu"',
    'description',
    'aoeu',
    index: 0,
    input: 'description:"aoeu"',
    groups: undefined ],
  [ ' uuid:"123sth"',
    'uuid',
    '123sth',
    index: 0,
    input: ' uuid:"123sth"',
    groups: undefined ] ]
0

My guess is that if there are edge cases such as extra or missing spaces, this expression with fewer boundaries might also be an option:

^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$

If you wish to explore, simplify, or modify the expression, regex101.com explains it in its top-right panel and shows how it matches against sample inputs.


Test

const regex = /^\s*\[\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*([^\s\r\n:]+)\s*:\s*"([^"]*)"\s*\]\s*$/gm;
const str = `[description:"aoeu" uuid:"123sth"]
[description : "aoeu" uuid: "123sth"]
[ description : "aoeu" uuid: "123sth" ]
 [ description : "aoeu"   uuid : "123sth" ]
 [ description : "aoeu"uuid  : "123sth" ] `;
let m;

while ((m = regex.exec(str)) !== null) {
    // This is necessary to avoid infinite loops with zero-width matches
    if (m.index === regex.lastIndex) {
        regex.lastIndex++;
    }
    
    // The result can be accessed through the `m`-variable.
    m.forEach((match, groupIndex) => {
        console.log(`Found match, group ${groupIndex}: ${match}`);
    });
}


Emma
0

const re = /^\[(?:(.+?):"(.+?)"\s*)+\]$/g
const matches = [...re.exec('[description:"aoeu" uuid:"123sth"]').entries()]
console.log(matches)
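Basically, this is the ES6 way to spread the iterator returned by entries() into a regular Array of [index, value] pairs (exec itself returns an array, not an iterator). Note that with this pattern the capture groups only hold the last key:"val" pair.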
Ashot
-6

Here is my answer:

var str = '[me nombre es] : My name is. [Yo puedo] is the right word'; 

var reg = /\[(.*?)\]/g;

var a = str.match(reg);

a = a.toString().replace(/[\[\]]/g, "").split(','));
  • 3
    Your input string (`str`) has the wrong format (too many square brackets). You only capture the key, not the value. Your code has a syntax error and does not execute (the last parenthesis). If you answer an "old" question that already has an accepted answer, make sure you add more knowledge and a better answer than the already accepted one. I don't think your answer does that. – Cleared Jul 03 '17 at 06:34