Recursive matching with regular expressions in Javascript

Question

Example string: $${a},{s$${d}$$}$$

I'd like to match $${d}$$ first and replace it with some text so that the string becomes $${a},{sd}$$; then $${a},{sd}$$ should be matched in turn.

Couldn't you just use two separate regular expressions? Match #1 first, replace, and then try to match # 2? — Pandincus, Dec 11 '10 at 00:10
For anyone coming here hoping to solve a recursive problem with regular expressions, something like https://pegjs.org/ may actually be more helpful. For instance, rules like `var = "$${" name "}$$"` would allow you to build a data structure that mimics the AST. At the end of the day, as simple as this is, it's truthfully a programming language, and don't be afraid to use the right tools for the job! — btown, Apr 05 '20 at 22:43

score 41 · Accepted Answer · answered Dec 11 '10 at 00:29

Annoyingly, JavaScript does not support the PCRE recursion construct (?R), so nested structures are far from easy to handle. It can be done, however.

I won't reproduce code, but if you check out Steve Levithan's blog, he has some good articles on the subject. As he should: he is probably the leading authority on RegExp in JS. He wrote XRegExp, which fills in most of the PCRE features that are missing; there is even a Match Recursive plugin!

Orbling

1

I wouldn’t say that XRegExp replaces ‘most of the parts that are missing’, but it ***does*** help. For real regexes, though, you need full property and grapheme support. More than 80% of the web is Unicode now, and it’s a crime that you can’t cope with it in the browser. – tchrist Feb 23 '12 at 02:31
@tchrist: The English-speaking world barely uses it, so it is therefore unimportant to the people who could change it. That added on to the principle of impossibly slow change in the base level of the web makes such things still a way off. Inconvenient to say the least. – Orbling Feb 23 '12 at 14:58
2

@Orbling The English-speaking world very much ***does use Unicode***, and a great deal‼ See [this analysis of one large English corpus](http://stackoverflow.com/questions/5567249/what-are-the-most-common-non-bmp-unicode-characters-in-actual-use). I’ve done others since then. Most web pages are in Unicode; you merely do not realize it. You cannot write English properly without it: no curly quotes, no £10 note, no 5¢ piece, &c&c. The web has seen **a meteoric *800% growth* in Unicode** over the past 5 years. That is fast change, not slow‼ People aren’t paying attention, but Unicode is here nonetheless. – tchrist Feb 23 '12 at 15:30
@tchrist: Yes, people do use Unicode, because they should. It is totally not needed for those examples you give, they are all in most western code spaces. Usually [ISO/IEC 8859-1](http://en.wikipedia.org/wiki/ISO/IEC_8859-1) is used on European websites, which is 8-bit extended ASCII of a sort. The cent symbol is available as 162=¢ and the pound as 163=£ (as a UK resident, the pound is also on my keyboard and has been so for a lot longer than Unicode has been present). All a matter of codepages, which most webpages still support. – Orbling Feb 23 '12 at 16:00
3

@tchrist: UTF8/16 are increasingly output as standard, because the webservers and editors are adopting it as default. Curly quotes are awful things anyhow, anathema to programmers. ;-) – Orbling Feb 23 '12 at 16:01
2

@Orbling: No, UTF-8 only, not UTF-16. Nobody does webpages in UTF-16: that's dumb. UTF-16 has all the disadvantages of both UTF-8 and UTF-32, but enjoys none of the benefits of either. UTF-16 is a sorry legacy. – tchrist Feb 23 '12 at 16:41
@tchrist: Sorry, should have been clearer - UTF8 for webpages, a number of editors still use UTF16 when in a Unicode mode. – Orbling Feb 23 '12 at 17:26

Akash Budhia · Answer 2 · 2018-05-03T05:47:18.960

4

I wrote this myself:

String.prototype.replacerec = function (pattern, what) {
    var newstr = this.replace(pattern, what);
    if (newstr == this)
        return newstr;
    return newstr.replace(pattern, what);
};

Usage:

"My text".replacerec(/pattern/g,"what");

P.S: As suggested by @lededje, when using this function in production it's good to have a limiting counter to avoid stack overflow.
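
As written above, replacerec makes at most two passes, because its last line calls replace rather than itself. A minimal sketch of the idea with the limiting counter from the P.S., written as a loop so that no call stack is involved. The name replaceUntilStable and the maxPasses parameter are assumptions for illustration, not part of the original answer:

```javascript
// Hypothetical helper (not from the original answer): repeat a replacement
// until the string stops changing, with a cap on the number of passes.
function replaceUntilStable(input, pattern, replacement, maxPasses = 100) {
    let current = String(input);
    for (let pass = 0; pass < maxPasses; pass++) {
        const next = current.replace(pattern, replacement);
        if (next === current) {
            return current; // nothing changed, so we are done
        }
        current = next;
    }
    // The cap guards against replacements that keep producing new matches.
    throw new Error("replaceUntilStable: still changing after " + maxPasses + " passes");
}

// Collapses nested parentheses from the inside out.
console.log(replaceUntilStable("((a))", /\(([^()]*)\)/g, "$1")); // prints "a"
```

A standalone function also avoids extending String.prototype, which is a design choice rather than a requirement.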

edited May 03 '18 at 05:47

answered Feb 11 '13 at 06:29

Akash Budhia

2

I used it in production code that ran for over a year. It is rare for a regex to keep matching indefinitely, so no overflow! And it's a quick way to get recursive replacement straight from JavaScript code. – Akash Budhia Jun 04 '13 at 07:22
1

The stack's limit is not infinity. IE6 can only handle 1130 calls. That's not 1130 regexp matches, it's total regexp matches plus whatever else you have going on. Saying this is a good enough answer is not correct because someone could be using it in an already function-intensive environment, and something that shouldn't be adding to the stack could push it to overflow. So -1. – lededje Jun 04 '13 at 10:15
3

This can't recurse infinitely... there's no recursion? – Patrick Roberts May 03 '18 at 06:09
2

I believe the line# 5 {return newstr.replace(pattern, what);} is supposed to be {return newstr.replacerec(pattern, what);} to obtain recursion. (add "rec" at the end of "replace"). Agree? – Marcelo Finki Jul 19 '19 at 10:02
1

There's no need for recursion. String.prototype.replacerec = function (pattern, what) { var prev = null, str = String(this); while (prev !== str) { prev = str; str = str.replace(pattern, what); } return str; }; – Whatabrain Jan 18 '22 at 15:42

score 0 · Answer 3 · answered Dec 11 '10 at 01:05

Since you want to do this recursively, you are probably best off doing multiple matches using a loop.

Regex itself is not well suited for recursive-anything.
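
A rough sketch of such a loop for the string in the question. The pattern, the helper name resolveNested, and the maxPasses guard are assumptions for illustration; the pattern only matches a $${...}$$ pair whose body contains no "$", so each pass resolves an innermost pair:

```javascript
// Matches a $${...}$$ pair whose body contains no "$", i.e. an innermost pair.
// This pattern is an assumption based on the question's example string.
const INNERMOST = /\$\$\{([^$]*)\}\$\$/;

// Hypothetical helper: resolve pairs from the inside out, one per pass.
// `transform` receives the body of each innermost pair and returns its replacement.
function resolveNested(input, transform, maxPasses = 1000) {
    let current = input;
    const steps = []; // the string after each pass, for inspection
    while (INNERMOST.test(current)) {
        if (steps.length >= maxPasses) {
            throw new Error("resolveNested: too many passes");
        }
        // A function replacer is used so "$" in the returned text is taken literally.
        current = current.replace(INNERMOST, (match, body) => transform(body));
        steps.push(current);
    }
    return { result: current, steps };
}

const { result, steps } = resolveNested("$${a},{s$${d}$$}$$", (body) => body);
console.log(steps);  // [ '$${a},{sd}$$', 'a},{sd' ]
console.log(result); // a},{sd
```

The guard matters when transform can return text that itself contains a $${...}$$ pair, since the loop would otherwise never settle.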

BlueRaja - Danny Pflughoeft

1

PCRE regex *is* decently suited for recursive patterns, it's just that Javascript doesn't have that capability, unfortunately. – CertainPerformance Dec 07 '18 at 06:12

score 0 · Answer 4 · answered Dec 12 '17 at 17:11

var content = "your string content";
var found = true;
while (found) {
    found = false;
    content = content.replace(/regex/, () => { found = true; return "new value"; });
}

Burak Büyükatlı

Although the concepts are perhaps there, there's so much that won't work, can go very wrong, and doesn't address the question asked. – Matt Fletcher Dec 12 '17 at 17:52
What can go wrong? This simple pattern can solve the problem in the question with the right regex definition. – Burak Büyükatlı Dec 12 '17 at 18:42
It has no fallback, so if it doesn't match, it will probably exceed memory allowance. Also what is "new value" and where should it come from? And you're not showing how OP's regex could actually work in this code. – Matt Fletcher Dec 12 '17 at 18:44
You are wrong. If there is no more matched value 'found' stays false and the while loop exits. "new value" is the new value for matched string. – Burak Büyükatlı Dec 12 '17 at 18:49

score 0 · Answer 5 · answered Sep 13 '20 at 10:12

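You can try \$\$\{([^$]*)\}\$\$. The [^$]* part only matches when the captured group contains no $, so only an innermost pair matches.

// [^$]* refuses to cross a "$", so the outer pair cannot match on this pass
var re = /\$\$\{([^$]*)\}\$\$/g,
  original = '$${a},{s$${d}$$}$$',
  result = original.replace(re, "$1");

console.log('original: ' + original);
console.log('result: ' + result); // result: $${a},{sd}$$ (one pass unwraps only the innermost pair)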

score -2 · Answer 6 · answered Dec 11 '10 at 00:42

In general, regexps are not well suited to that kind of problem. It's better to use a state machine.
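
A minimal sketch of what such a state machine could look like, assuming "$${" opens a group and "}$$" closes one, as in the question's example. The function name resolveWithStack and the transform callback are hypothetical:

```javascript
// Hypothetical scanner: a small state machine with an explicit stack.
// Assumes "$${" opens a group and "}$$" closes it, as in the question's example.
function resolveWithStack(input, transform) {
    const OPEN = "$${";
    const CLOSE = "}$$";
    const stack = [""]; // stack[0] collects the top-level output
    let i = 0;
    while (i < input.length) {
        if (input.startsWith(OPEN, i)) {
            stack.push(""); // enter a nested group
            i += OPEN.length;
        } else if (input.startsWith(CLOSE, i) && stack.length > 1) {
            const body = stack.pop(); // leave the group; inner groups are already resolved
            stack[stack.length - 1] += transform(body);
            i += CLOSE.length;
        } else {
            stack[stack.length - 1] += input[i]; // ordinary character
            i += 1;
        }
    }
    if (stack.length !== 1) {
        throw new Error("unbalanced input: " + (stack.length - 1) + " unclosed group(s)");
    }
    return stack[0];
}

console.log(resolveWithStack("$${a},{s$${d}$$}$$", (body) => "[" + body + "]"));
// prints "[a},{s[d]]"
```

This makes a single left-to-right pass over the input and reports unbalanced delimiters, which a repeated regex replacement would silently leave in place.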

Vojta

@Pandincus: Nice, yacc for JS. :-) – Orbling Dec 11 '10 at 01:19
Aye, a basic parser would be fine for this application. – Orbling Dec 11 '10 at 21:19
Any links to examples of using a parser to recursively parse strings? I'm struggling to find anything. – GreenImp Jan 30 '18 at 00:05
