229

Is there a way to retrieve the (starting) character positions inside a string of the results of a regex match() in Javascript?

stagas

12 Answers

303

exec returns an object with an index property:

var match = /bar/.exec("foobar");
if (match) {
    console.log("match found at " + match.index);
}

And for multiple matches:

var re = /bar/g,
    str = "foobarfoobar",
    match; // declared here so the loop does not create a global
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
}
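The loop above can be wrapped in a small helper that returns every start index. This is only a sketch and not part of the original answer: the name `findAllIndexes` is hypothetical, and it assumes the pattern never produces an empty (zero-width) match.

```javascript
// Hypothetical helper: collects the start index of every match.
// Assumption: `re` never produces a zero-width match.
function findAllIndexes(re, str) {
    if (!re.global) {
        // Without g (or y), exec() restarts at index 0 on every call,
        // so the loop below would never end.
        throw new TypeError("findAllIndexes needs a regex with the g flag");
    }
    re.lastIndex = 0; // start from the beginning even if `re` was used before
    var indexes = [];
    var match; // local, so no global `match` is created
    while ((match = re.exec(str)) !== null) {
        indexes.push(match.index);
    }
    return indexes;
}

console.log(findAllIndexes(/bar/g, "foobarfoobar")); // [ 3, 9 ]
```

Creating the regex once and passing it in keeps the `lastIndex` state on a single object, which is what the loop depends on.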
Gumbo
  • 6
    Thanks for your help! Can you tell me also how do I find the indexes of multiple matches? – stagas Feb 19 '10 at 11:10
  • @stagas: In that case you should better use `exec`. – Gumbo Feb 19 '10 at 11:13
  • `match()` does not have any `index` property. The result is an `Array`. – Onur Yıldırım Aug 20 '14 at 02:28
  • @OnurYıldırım It’s meant to be `exec` as shown in the second example. – Gumbo Aug 20 '14 at 07:04
  • 30
    Note: using the `re` as a variable, and adding the `g` modifier are both crucial! Otherwise you will get an endless loop. – oriadam Dec 24 '15 at 03:05
  • @OnurYıldırım - According to the `String.prototype.match()` docs on mozilla it does. It specifically states that the returned Array has an added `index` property that represents the zero-based match location in the string. – Jimbo Jonny Mar 29 '16 at 22:19
  • @OnurYıldırım - I'm telling you it wasn't wrong in the first place, the return of `match()` DOES have an `index` property. You do not need to use `exec` – Jimbo Jonny Mar 29 '16 at 22:27
  • @OnurYıldırım - It specifically states that the returned `Array` has the `index` property added. Here's the quote from the docs: _In addition, it has an index property, which represents the zero-based index of the match in the string._ You can find it at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/match#Description – Jimbo Jonny Mar 29 '16 at 22:33
  • @JimboJonny, you're right MDN says that. But test it. It does not have any `index` property. – Onur Yıldırım Mar 29 '16 at 22:33
  • 1
    @OnurYıldırım - here's a jsfiddle of it working...I've tested it all the way back to IE5...works great: https://jsfiddle.net/6uwn1vof/ – Jimbo Jonny Mar 29 '16 at 22:34
  • 1
    @JimboJonny, hm well I learned something new. My test case returns `undefined`. https://jsfiddle.net/6uwn1vof/2/ which is not a search-like example like yours. – Onur Yıldırım Mar 29 '16 at 22:37
  • 1
    @OnurYıldırım - Remove the `g` flag and it'll work. Since `match` is a function of the string, not the regex it cannot be stateful like `exec`, so it only treats it like `exec` (i.e. has an index property) if you're not looking for a global match...because then statefulness doesn't matter. – Jimbo Jonny Mar 29 '16 at 22:41
  • In TypeScript it has to be (example) `let match = /(&)\1+/g.exec(this.label); let indexes: number[] = []; while (match) { indexes.push(match.index); this.label = this.label.replace(/(&)\1+/, ''); match = /(&)\1+/g.exec(this.label); }` – Devid May 08 '18 at 08:28
  • Am I the only one? you are setting match inside the while loop. You didn't define it though. Thus, you set `window.match` and after the loop this variable is set to `null`, instead of `undefined`. Generally, stay away from setting variables inside loops, bad practice, and this answer is a good example why it is a bad practice. Just harder to read. – Toskan Apr 17 '20 at 23:05
  • 1
    @JimboJonny Intrigued by your comment that the `g` flag could be removed and it still work, I can report that it does not (or it does, but it creates an infinite loop) -- 0/10 would not recommend (up-to-date Chrome browser: Version 89.0.4389.90) – Steven Mar 30 '21 at 09:37
  • See my issue with matching regexes of width 0 in https://stackoverflow.com/a/67947861/1207489. Feel free to copy the lines about `re.lastIndex++` to your solution. – Claude Jun 12 '21 at 10:53
  • @Steven My comments were about the `match` function, in response to another commenter saying `match` returns an Array and therefore doesn't have an `index` property. In reality `match` does return an Array, but with an `index` property added _specifically_ when you remove the `g` flag. Just tested it today and it's as true as it was a decade ago (2 decades ago if you count that it worked all the way back to IE5). – Jimbo Jonny Jan 04 '22 at 03:58
  • 1
    @JimboJonny my mistake, I must have not noticed you weren't replying directly to the answer posted – Steven Jan 04 '22 at 22:17
  • @Steven I've fallen in the same pitfall. `match`, even when not global, will always start its search from the beginning, because of the lack of statefulness. So the logical conclusion is: If you want to receive the index of all matches, you can only manually iterate using `exec`, because only then you are stateful within the RegExp object, which becomes necessary. You cannot extract all match indices from a string-on-string search. JS at its finest. Who thought that an index property for the first hit would be enough? – Martin Braun Aug 06 '23 at 22:38
78

Here's what I came up with:

// Finds starting and ending positions of quoted text
// in double or single quotes with escape char support like \" \'
var str = "this is a \"quoted\" string as you can 'read'";

var patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/igm;

var match; // declared so the loop does not create a global
while ((match = patt.exec(str)) !== null) {
  console.log(match.index + ' ' + patt.lastIndex);
}
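As a rough sketch of how the two positions relate, the exclusive end can also be computed as `match.index + match[0].length`, and `str.slice(start, end)` then gives back the matched text. The `spans` array is only an illustrative name.

```javascript
var str = "this is a \"quoted\" string as you can 'read'";
var patt = /'((?:\\.|[^'])*)'|"((?:\\.|[^"])*)"/igm;

var spans = []; // illustrative: one { start, end, text } record per match
var match;
while ((match = patt.exec(str)) !== null) {
    var start = match.index;
    var end = start + match[0].length; // exclusive end, same value as patt.lastIndex here
    spans.push({ start: start, end: end, text: str.slice(start, end) });
}

console.log(spans);
```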
Ruslan López
stagas
  • 25
    `match.index + match[0].length` also works for the end position. – Beni Cherniavsky-Paskin Jun 06 '13 at 06:58
  • 1
    @BeniCherniavsky-Paskin, wouldn't the end position be `match.index + match[0].length - 1`? – David May 19 '15 at 16:56
  • 2
    @David, I meant exclusive end position, as taken e.g. by `.slice()` and `.substring()`. Inclusive end would be 1 less as you say. (Be careful that inclusive usually means index of last char inside match, unless it's an empty match where it's 1 *before* match and might be `-1` outside the string entirely for empty match at start...) – Beni Cherniavsky-Paskin May 19 '15 at 18:06
  • With `patt = /.*/` it goes into an infinite loop. How can we prevent that? – abinas patra Apr 25 '21 at 16:44
32

In modern browsers, you can accomplish this with string.matchAll().

The benefit of this approach over RegExp.exec() is that it does not rely on the regex being stateful, unlike the exec loop in @Gumbo's answer.

let regexp = /bar/g;
let str = 'foobarfoobar';

let matches = [...str.matchAll(regexp)];
matches.forEach((match) => {
    console.log("match found at " + match.index);
});
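A short sketch of two details that may be useful here: the iterator can be consumed directly with `for...of`, and the regex passed to `matchAll()` still needs the `g` flag even though its `lastIndex` is left alone.

```javascript
const str = 'foobarfoobar';

// Iterate the matches directly, without building an intermediate array.
const indexes = [];
for (const match of str.matchAll(/bar/g)) {
    indexes.push(match.index);
}
console.log(indexes); // [ 3, 9 ]

// matchAll() rejects a regex that lacks the g flag.
let errorName = null;
try {
    str.matchAll(/bar/);
} catch (e) {
    errorName = e.name;
}
console.log(errorName); // TypeError

// The regex object that was passed in is not advanced by the iteration.
const re = /bar/g;
Array.from(str.matchAll(re));
console.log(re.lastIndex); // 0
```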
brismuth
  • 1
    I had luck using this single-line solution based on `matchAll` ``` let regexp = /bar/g; let str = 'foobarfoobar'; let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index); console.log(matchIndices)``` – Steven Schkolne Jun 15 '22 at 21:42
  • not sure why you say this approach does not rely on the regex being stateful. I try your code without `g` flag and get error – Ooker Jul 10 '23 at 04:25
  • The "g" flag means "global search", i.e. match all occurrences in the string. It doesn't make sense to use str.matchAll() if you're not doing a global search. Hopefully that helps, but I'm not sure what you're trying to do. With my "stateful" comment, I mean that you don't have to use a "while" loop and rely on the internal state of the Regexp object, as in Gumbo's answer, which I linked. Good luck! – brismuth Jul 10 '23 at 16:39
26

From the developer.mozilla.org docs on the String .match() method:

The returned Array has an extra input property, which contains the original string that was parsed. In addition, it has an index property, which represents the zero-based index of the match in the string.

When dealing with a non-global regex (i.e., no g flag on your regex), the value returned by .match() has an index property...all you have to do is access it.

var index = str.match(/regex/).index;

Here is an example showing it working as well:

var str = 'my string here';

var index = str.match(/here/).index;

console.log(index); // <- 10

I have successfully tested this all the way back to IE5.
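A small sketch of the three cases worth keeping apart: a non-global match (has `index`), a global match (plain array, no `index`), and no match at all (`null`). The sample string is made up for illustration.

```javascript
var str = 'my string here, right here';

var first = str.match(/here/);   // no g flag: exec-style result
console.log(first.index);        // 10

var all = str.match(/here/g);    // g flag: plain array of matched strings
console.log(all.length);         // 2
console.log(all.index);          // undefined

var none = str.match(/absent/);  // nothing found
console.log(none);               // null, so check before reading .index
var index = none ? none.index : -1;
console.log(index);              // -1
```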

Scott Weaver
Jimbo Jonny
10

You can use the search method of the String object. This will only work for the first match, but will otherwise do what you describe. For example:

"How are you?".search(/are/);
// 4
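A brief sketch of how `search` behaves at the edges: it returns -1 when nothing matches, and it ignores both the `g` flag and any existing `lastIndex` on the regex. The sample string is made up for illustration.

```javascript
var str = "How are you? How are they?";

console.log(str.search(/are/));  // 4
console.log(str.search(/were/)); // -1, no match

var re = /are/g;
re.lastIndex = 10;               // would make exec() skip the first "are"
var found = str.search(re);
console.log(found);              // 4, search always starts at the beginning
console.log(re.lastIndex);       // 10, the previous value is restored
```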
Jimmy
7

Here is a cool feature I discovered recently. I tried it in the console and it seems to work:

var text = "border-bottom-left-radius";

var newText = text.replace(/-/g,function(match, index){
    return " " + index + " ";
});

Which returned: "border 6 bottom 13 left 18 radius"

So this seems to be what you are looking for.
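When the pattern has capture groups, the position is no longer the second argument of the callback. One possible sketch is to pick the first numeric argument, since the match and the groups are strings (or undefined) while the offset is a number. The pattern `/-(\w)/g` is just an example.

```javascript
var text = "border-bottom-left-radius";
var positions = [];

text.replace(/-(\w)/g, function (...args) {
    // args: full match, group 1, ..., offset, whole string (and possibly a groups object)
    var offset = args.find(function (a) { return typeof a === "number"; });
    positions.push(offset);
    return args[0]; // put the match back unchanged
});

console.log(positions); // [ 6, 13, 18 ]
```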

felipeab
  • 6
    just beware that replacement functions add capture groups as well, so note that it's always the *second-to-last* entry in the replacement function `arguments` that is the position. Not "the second argument". The function arguments are "full match, group1, group2, ...., index of match, full string matched against" – Mike 'Pomax' Kamermans Feb 26 '17 at 00:00
4

I'm afraid the previous answers (based on exec) don't work when your regex produces a zero-width match. For instance (note: /\b/g is a regex that should find all word boundaries):

var re = /\b/g,
    str = "hello world";
var guard = 10, match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected")
      break;
    }
}

One can try to fix this by having the regex match at least 1 character, but this is far from ideal (and means you have to manually add the index at the end of the string)

var re = /\b./g,
    str = "hello world";
var guard = 10, match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);
    if (guard-- < 0) {
      console.error("Infinite loop detected")
      break;
    }
}

A better solution (which only works in newer browsers and needs polyfills for older browsers such as IE) is to use String.prototype.matchAll():

var re = /\b/g,
    str = "hello world";
console.log(Array.from(str.matchAll(re)).map(match => match.index))

Explanation:

String.prototype.matchAll() expects a global regex (one with the g, or global, flag set). It then returns an iterator. In order to loop over and map() the iterator, it has to be turned into an array (which is exactly what Array.from() does). Like the result of RegExp.prototype.exec(), the resulting elements have an .index field according to the specification.

See the String.prototype.matchAll() and the Array.from() MDN pages for browser support and polyfill options.


Edit: digging a little deeper in search for a solution supported on all browsers

The problem with RegExp.prototype.exec() is that it updates the lastIndex pointer on the regex, and next time starts searching from the previously found lastIndex.

var re = /l/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)

This works great as long as the regex match actually has a width. With a 0-width regex, this pointer does not increase, and so you get your infinite loop. (Note: /(?=l)/g is a lookahead for l -- it matches the 0-width string before an l.) So it correctly goes to index 2 on the first call of exec(), and then stays there:

var re = /(?=l)/g,
str = "hello world";
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
re.exec(str)
console.log(re.lastIndex)
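The two traces above can be put side by side with a small helper. This is just a sketch; `traceLastIndex` is a made-up name that records `lastIndex` before the first call and after each `exec()`.

```javascript
// Hypothetical helper: records re.lastIndex before and after each exec() call.
function traceLastIndex(re, str, calls) {
    var seen = [re.lastIndex];
    for (var i = 0; i < calls; i++) {
        re.exec(str);
        seen.push(re.lastIndex);
    }
    return seen;
}

// A match with width: the pointer moves past each "l".
console.log(traceLastIndex(/l/g, "hello world", 3));     // [ 0, 3, 4, 10 ]

// A zero-width match: the pointer reaches 2 and never moves again.
console.log(traceLastIndex(/(?=l)/g, "hello world", 3)); // [ 0, 2, 2, 2 ]
```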

The solution (that is less nice than matchAll(), but should work on all browsers) therefore is to manually increase the lastIndex if the match width is 0 (which may be checked in different ways)

var re = /\b/g,
    str = "hello world",
    match;
while ((match = re.exec(str)) != null) {
    console.log("match found at " + match.index);

    // alternative: if (match.index == re.lastIndex) {
    if (match[0].length == 0) {
      // we need to increase lastIndex -- this location was already matched,
      // we don't want to match it again (and get into an infinite loop)
      re.lastIndex++
    }
}
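As a rough check, the loop with the manual `lastIndex` bump can be compared against `matchAll()` for the same zero-width pattern. `execIndexes` is a hypothetical wrapper around the loop above; the sketch assumes a regex without the `u` flag, where stepping by one code unit is enough.

```javascript
// Hypothetical wrapper around the exec() loop with the zero-width fix.
// Assumption: no u flag, so advancing lastIndex by 1 is sufficient.
function execIndexes(re, str) {
    var indexes = [], match;
    re.lastIndex = 0;
    while ((match = re.exec(str)) !== null) {
        indexes.push(match.index);
        if (match[0].length === 0) {
            re.lastIndex++; // step past the empty match to avoid looping forever
        }
    }
    return indexes;
}

var viaExec = execIndexes(/\b/g, "hello world");
var viaMatchAll = Array.from("hello world".matchAll(/\b/g), function (m) {
    return m.index;
});

console.log(viaExec);     // [ 0, 5, 6, 11 ]
console.log(viaMatchAll); // [ 0, 5, 6, 11 ]
```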
Claude
2

This member function returns an array of the 0-based positions, if any, of the input word inside the String object:

String.prototype.matching_positions = function( _word, _case_sensitive, _whole_words, _multiline )
{
   /*besides '_word' param, others are flags (0|1)*/
   var _match_pattern = "g"+(_case_sensitive?"":"i")+(_multiline?"m":"") ; // "i" is added only for case-insensitive searches
   var _bound = _whole_words ? "\\b" : "" ;
   var _re = new RegExp( _bound+_word+_bound, _match_pattern );
   var _pos = [], _chunk, _index = 0 ;

   while( true )
   {
      _chunk = _re.exec( this ) ;
      if ( _chunk == null ) break ;
      _pos.push( _chunk['index'] ) ;
      _re.lastIndex = _chunk['index']+1 ;
   }

   return _pos ;
}

Now try

var _sentence = "What do doers want ? What do doers need ?" ;
var _word = "do" ;
console.log( _sentence.matching_positions( _word, 1, 0, 0 ) );
console.log( _sentence.matching_positions( _word, 1, 1, 0 ) );

You can also input regular expressions:

var _second = "z^2+2z-1" ;
console.log( _second.matching_positions( "[0-9]z+", 0, 0, 0 ) );

Here one gets the position index of the linear term.
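One effect of resetting `lastIndex` to `index + 1`, as the function above does, is that overlapping matches are reported too. A standalone sketch of the difference follows; `positions` and its `allowOverlap` parameter are hypothetical names, and the pattern is assumed to have a non-zero width.

```javascript
// Hypothetical helper: with allowOverlap, the search resumes one character
// after the start of the previous match instead of after its end.
// Assumption: `re` has the g flag and never matches an empty string.
function positions(re, str, allowOverlap) {
    var found = [], m;
    re.lastIndex = 0;
    while ((m = re.exec(str)) !== null) {
        found.push(m.index);
        if (allowOverlap) {
            re.lastIndex = m.index + 1;
        }
    }
    return found;
}

console.log(positions(/aa/g, "aaaa", false)); // [ 0, 2 ]
console.log(positions(/aa/g, "aaaa", true));  // [ 0, 1, 2 ]
```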

Sandro Rosa
2
var str = "The rain in SPAIN stays mainly in the plain";

function searchIndex(str, searchValue, isCaseSensitive) {
  var modifiers = isCaseSensitive ? 'g' : 'gi'; // 'i' makes the search case-insensitive
  var regExpValue = new RegExp(searchValue, modifiers);
  var matches = [];
  var startIndex = 0;
  var arr = str.match(regExpValue) || []; // match() returns null when nothing matches

  [].forEach.call(arr, function(element) {
    startIndex = str.indexOf(element, startIndex);
    matches.push(startIndex++);
  });

  return matches;
}

console.log(searchIndex(str, 'ain', true));
Yaroslav
  • This is incorrect. `str.indexOf` here just finds the next occurrence of the text captured by the match, which is not necessarily the match. JS regex supports conditions on text outside of the capture with lookahead. For instance `searchIndex("foobarfoobaz", "foo(?=baz)", true)` should give `[6]`, not `[0]`. – rakslice Apr 14 '19 at 21:35
  • Why `[].forEach.call(arr, function(element)`? Why not `arr.forEach` or `arr.map`? – Ankit Kumar Jul 23 '19 at 05:56
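The lookahead case raised in rakslice's comment can be handled by reading the position from each match object instead of searching for the matched text again. A possible sketch, using `matchAll` and a made-up function name `searchIndexByMatch`:

```javascript
// Hypothetical variant: takes the index from each match object,
// so conditions outside the matched text (such as lookaheads) are respected.
function searchIndexByMatch(str, searchValue, isCaseSensitive) {
    var re = new RegExp(searchValue, isCaseSensitive ? 'g' : 'gi');
    return Array.from(str.matchAll(re), function (m) {
        return m.index;
    });
}

console.log(searchIndexByMatch("foobarfoobaz", "foo(?=baz)", true)); // [ 6 ]

var rain = "The rain in SPAIN stays mainly in the plain";
console.log(searchIndexByMatch(rain, "ain", true));  // [ 5, 25, 40 ]
console.log(searchIndexByMatch(rain, "ain", false)); // [ 5, 14, 25, 40 ]
```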
1

I had luck using this single-line solution based on matchAll (my use case needs an array of string positions)

let regexp = /bar/g;
let str = 'foobarfoobar';

let matchIndices = Array.from(str.matchAll(regexp)).map(x => x.index);

console.log(matchIndices)

output: [3, 9]

0
function trimRegex(str, regex){
    return str.substr(str.match(regex).index).split('').reverse().join('').substr(str.match(regex).index).split('').reverse().join('');
}

let test = '||ab||cd||';
test = trimRegex(test, /[^|]/); // strings are immutable, so keep the returned value
console.log(test); //output: ab||cd

or

function trimChar(str, trim, req){
    let regex = new RegExp('[^'+trim+']');
    return str.substr(str.match(regex).index).split('').reverse().join('').substr(str.match(regex).index).split('').reverse().join('');
}

let test = '||ab||cd||';
test = trimChar(test, '|'); // strings are immutable, so keep the returned value
console.log(test); //output: ab||cd
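Both functions above reuse the leading offset for the trailing end, so they appear to assume the same number of trim characters on each side. A sketch that finds the two positions separately, using `search` for the start and the `index` of an end-anchored match for the end, might look like this (`trimCharSketch` is a hypothetical name, and it assumes `ch` is a single character):

```javascript
// Hypothetical helper: trims a single character `ch` from both ends of `str`.
function trimCharSketch(str, ch) {
    // Escape the few characters that are special inside a character class.
    var safe = ch.replace(/[\\\]^-]/g, '\\$&');
    var start = str.search(new RegExp('[^' + safe + ']')); // first character to keep
    if (start === -1) {
        return ''; // the string consists only of trim characters (or is empty)
    }
    var tail = str.match(new RegExp('[' + safe + ']*$')); // trailing run, possibly empty
    return str.slice(start, tail.index);
}

console.log(trimCharSketch('||ab||cd||', '|')); // ab||cd
console.log(trimCharSketch('|ab||', '|'));      // ab
```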
SwiftNinjaPro
-1

var str = 'my string here';

var index = str.match(/hre/).index;

alert(index); // <- 10
  • 3
    So just like in [this answer](https://stackoverflow.com/a/36296438/402037) from 4 years ago (which, unlike yours, works) – Andreas Dec 22 '20 at 13:41