
Does a JavaScript equivalent of Ruby's String#scan exist?

I need to parse a string like:

the dog from the tree

and get something like

[[null, "the dog"], ["from", "the tree"]]

which I can do in Ruby with one RegExp and String#scan.

JavaScript's String#match can't do this: with the g flag it returns only the whole matches, not the capturing groups, so it gives something like

["the dog", "from the tree"]
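To make the difference concrete, here is a minimal sketch. The regexp is a simplified stand-in for the asker's pastebin pattern (an assumption, since the real one is only linked): with the g flag, match() returns only the whole matches, while calling exec() in a loop exposes each match's capture groups.

```javascript
// Simplified stand-in for the asker's regexp (an assumption, not the pastebin pattern)
const re = /(?:(from|in) )?(the \w+)/g;
const str = "the dog from the tree";

// With the g flag, match() returns only the whole matches
const whole = str.match(re);
console.log(whole); // [ 'the dog', 'from the tree' ]

// exec() in a loop returns each match together with its capture groups
const groups = [];
let m;
while ((m = re.exec(str)) !== null) {
  groups.push(m.slice(1)); // drop m[0], the whole match
}
console.log(groups); // [ [ undefined, 'the dog' ], [ 'from', 'the tree' ] ]
```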

Because I use String#scan in many places in my Ruby application, it would be nice to have a quick way to replicate its behavior in my JavaScript port.

EDIT: Here is the RegExp I'm using: http://pastebin.com/bncXtgYA

Jørgen R
itdoesntwork

4 Answers

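String.prototype.scan = function (re) {
    if (!re.global) throw new Error("RegExp should contain the /g modifier at the end");
    var s = String(this);
    var m, r = [];
    re.lastIndex = 0;  // start from the beginning even if re was used before
    while ((m = re.exec(s)) !== null) {
        if (m[0] === "") re.lastIndex++;  // step past empty matches to avoid an infinite loop
        m.shift();  // drop the whole match, keep only the capture groups
        r.push(m);
    }
    return r;
};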
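One caveat with a `while (m = re.exec(s))` loop: when the regexp can match the empty string, exec() leaves lastIndex unchanged, so the same empty match comes back forever. A minimal sketch, using /x*/g purely as an illustrative pattern, of the problem and of the usual guard (bumping lastIndex by hand after an empty match):

```javascript
const re = /x*/g; // illustrative pattern that can match the empty string

// An empty match does not move lastIndex forward
const first = re.exec("abc");
console.log(JSON.stringify(first[0]), re.lastIndex); // "" 0

// Guarded loop: advance lastIndex manually after an empty match
const found = [];
let m;
re.lastIndex = 0;
while ((m = re.exec("abc")) !== null) {
  found.push(m.index);
  if (m[0] === "") re.lastIndex++;
}
console.log(found); // [ 0, 1, 2, 3 ]
```

This yields one empty match per position, including the end of the string, which is also how many results Ruby's `"abc".scan(/x*/)` returns.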
Daniel Garmoshka
melpomene
  • I added this to my code and directly replaced `.match` with `.scan`, but it still doesn't work. Here is the RegExp I'm using: http://pastebin.com/bncXtgYA (test string: `"the dog in the tree"`). I copied that RegExp directly from Ruby and I'm new to JS RegExps, so there may be some problems with it. – itdoesntwork Dec 15 '12 at 20:24
  • "*doesn't work*" is meaningless. I tried `"the dog in the tree".scan(/(?:(in|into|to|at|from) )?((?:(?:the|a|an) )?(?:\d+\.|all\.)?(?:\w+|'[a-zA-Z0-9\s]*?'))/gi)` and got `[[undefined, "the dog"], ["in", "the tree"]]`. – melpomene Dec 15 '12 at 20:29
  • Ah, sorry, I'm a dum-dum and I didn't read the output correctly. Thanks! – itdoesntwork Dec 15 '12 at 20:30

Ruby's scan() method returns a nested array only when the regexp contains capture groups: http://ruby-doc.org/core-2.5.1/String.html#method-i-scan

a = "cruel world"
a.scan(/\w+/)        #=> ["cruel", "world"]
a.scan(/.../)        #=> ["cru", "el ", "wor"]
a.scan(/(...)/)      #=> [["cru"], ["el "], ["wor"]]
a.scan(/(..)(..)/)   #=> [["cr", "ue"], ["l ", "wo"]]
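In JavaScript, the array returned by RegExp.prototype.exec() always has one slot for the whole match plus one per capture group (an unmatched optional group still occupies its slot, as undefined), so its length is enough to tell Ruby's two cases apart. A quick check against the Ruby examples above:

```javascript
const text = "cruel world";

// exec() returns [wholeMatch, ...groups], so length - 1 is the number of capture groups
const counts = [/\w+/, /.../, /(...)/, /(..)(..)/].map(re => re.exec(text).length - 1);
console.log(counts); // [ 0, 0, 1, 2 ]
```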

Below is a modified version of melpomene's answer that returns a flat array when the regexp has no capture groups, as Ruby does.

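function scan(str, regexp) {
    if (!regexp.global) {
        throw new Error("RegExp without global (g) flag is not supported.");
    }
    var result = [];
    var m;
    regexp.lastIndex = 0;  // start at the beginning even if regexp was used before
    while ((m = regexp.exec(str)) !== null) {
        if (m[0] === "") {
            regexp.lastIndex++;  // step past an empty match to avoid an infinite loop
        }
        if (m.length >= 2) {
            result.push(m.slice(1));  // capture groups present: nested array, like Ruby
        } else {
            result.push(m[0]);        // no capture groups: just the whole match
        }
    }
    return result;
}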
ypresto
  • I found the solution useful, but there is no explanation in the original solution from @melpomene. Can you please explain why the global flag is required? – jimjamz Apr 01 '23 at 11:47
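Briefly, on the global flag: these answers drive exec() in a loop and rely on lastIndex to move forward. Without g, exec() ignores lastIndex and always searches from the start, so it never returns null and `while (m = re.exec(s))` would never terminate. A small sketch comparing the two:

```javascript
const text = "cruel world";
const noG = /\w+/;
const withG = /\w+/g;

// Without g, exec() always starts at index 0 and lastIndex stays 0
const a1 = noG.exec(text)[0];
const a2 = noG.exec(text)[0];
console.log(a1, a2, noG.lastIndex); // cruel cruel 0

// With g, exec() resumes at lastIndex, so repeated calls walk through the string
const b1 = withG.exec(text)[0];
const b2 = withG.exec(text)[0];
const b3 = withG.exec(text); // no more matches
console.log(b1, b2, b3); // cruel world null
```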

Here's another implementation using String.replace:

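String.prototype.scan = function (regex) {
    if (!regex.global) throw new Error("regex must have 'global' flag set");
    var r = [];
    this.replace(regex, function () {
        var args = Array.prototype.slice.call(arguments);
        // Regexes with named groups get an extra trailing "groups" object
        if (typeof args[args.length - 1] === "object") args.pop();
        // Drop the whole match (first) plus the offset and input string (last two)
        r.push(args.slice(1, -2));
    });
    return r;
};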

How it works: replace will invoke the callback on every match, passing it the matched substring, the matched groups, the offset, and the full string. We only want the matched groups, so we slice out the other arguments.
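The argument list can be inspected directly. This sketch records what the callback receives; note that a regexp with named groups gets one extra trailing argument (the groups object), in which case a fixed slice(1, -2) would also include the offset.

```javascript
// Record every argument list that replace() passes to the callback
const calls = [];
"12h 37m".replace(/(\d+)(.)/g, function (...args) {
  calls.push(args);
  return args[0]; // return the match unchanged; only the recording matters
});
console.log(JSON.stringify(calls));
// [["12h","12","h",0,"12h 37m"],["37m","37","m",4,"12h 37m"]]

// With named groups, the groups object is appended as one more argument
const namedCalls = [];
"12h".replace(/(?<num>\d+)(?<unit>.)/g, function (...args) {
  namedCalls.push(args.length);
  return args[0];
});
console.log(namedCalls); // [ 6 ]
```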

int3
  • I've seen this mentioned before, but never understood how something like this would be implemented. I'll try it now. Thanks! – itdoesntwork Dec 15 '12 at 20:24
  • And, by the way, here is the RegExp I'm using. Is there anything I'm doing horribly wrong? http://pastebin.com/bncXtgYA – itdoesntwork Dec 15 '12 at 20:28
  • @itdoesntwork it works for me in Chrome. What browser / browser version are you using? – int3 Dec 15 '12 at 20:31
  • Chrome, and it works. Since I directly copied this from Ruby, there might have been something wrong conventionally. Thanks again! – itdoesntwork Dec 15 '12 at 20:33

If you want to match all occurrences of a regexp without capture groups (which the original poster did not, but which visitors arriving from Google may want), use JavaScript's String.prototype.match with the global flag on the RegExp:

ruby -e "p '12h 37m'.scan /\d+/"
#=> ["12", "37"]
node -e "console.log('12h 37m'.match(/\d+/g))"
// [ '12', '37' ]

If capture groups are important, use the String.prototype.matchAll method instead:

console.log(Array.from('12h 37m'.matchAll(/(\d+)(.)/g)))
// [
//   [ '12h', '12', 'h', index: 0, input: '12h 37m', groups: undefined ],
//   [ '37m', '37', 'm', index: 4, input: '12h 37m', groups: undefined ]
// ]

The results can then be transformed functionally by dropping element 0 (the whole match) from each result with slice(1):

Array.from('12h 37m'.matchAll(/(\d+)(.)/g)).flatMap(m => m.slice(1))
// [ '12', 'h', '37', 'm' ]

Wrapping this up as a convenient extension on the prototype, as others have done:

if (!String.prototype.scan) String.prototype.scan = function(re) {
    if (!re.global) {
        throw new Error("String#scan requires RegExp with global (g) flag")
    }
    return Array.from(this.matchAll(re))
                .flatMap(m => m.length > 1 ? m.slice(1) : m[0])
}

const str = '12h 37m'

console.log(str.scan(/\d+/g))
// [ '12', '37' ]

console.log(str.scan(/(\d+)(.)/g))
// [ '12', 'h', '37', 'm' ]
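Note that flatMap merges the groups of every match into one flat list, whereas Ruby's scan keeps one inner array per match (e.g. `[["12", "h"], ["37", "m"]]`), which is the nested shape the question asks for. A sketch of a nested variant using map instead; the name scanNested is hypothetical:

```javascript
// Hypothetical helper: Ruby-like scan that keeps one entry per match
function scanNested(str, re) {
  if (!re.global) {
    throw new Error("scanNested requires a RegExp with the global (g) flag");
  }
  // Capture groups (as an array) if the regexp has any, otherwise the whole match
  return Array.from(str.matchAll(re), m => (m.length > 1 ? m.slice(1) : m[0]));
}

console.log(scanNested("12h 37m", /\d+/g));      // [ '12', '37' ]
console.log(scanNested("12h 37m", /(\d+)(.)/g)); // [ [ '12', 'h' ], [ '37', 'm' ] ]
```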
Phrogz