2

Im trying to get a javascript regex that matches x opening braces, then x closing braces, while allowing them to be nested in-between each other.

For example, it would match: "{ a { q } }" but not "{ a { q } { }" or "{ } } { } {"

That being said, I have no idea how to do it with regexpes, or if it's even possible.

Not a Name
  • 993
  • 7
  • 18
  • 3
    This is possible in *some* languages such as modern Perl -- but not JavaScript :-) However, [regular expressions](http://en.wikipedia.org/wiki/Regular_expression) (from more of an "academic view") are those which accept regular languages and in short "can't count". –  Feb 24 '11 at 20:26
  • @pst that should be the accepted question. +1 for providing a clear difference between both terms (terms that usually get confused coders). – Oscar Mederos Feb 24 '11 at 20:55

4 Answers4

3

The short answer to this is no. Regular expressions are a non-context-free grammar, so it cannot be done with true regex. You can, however, look for specific (non-arbitrary) nesting patterns.

http://blogs.msdn.com/b/jaredpar/archive/2008/10/15/regular-expression-limitations.aspx

The recursion problem here is, at its heart, the same reason you can't correctly parse HTML with regex. Like XML, the construct you've described is a context-free grammar; note its close similarity with the first example from the Wikipedia article.

I've heard there are engines out there that extend regex to offer support for arbitrarily nested elements, but this would make them something other than true regex. Anyway, I don't know of any such libraries for JavaScript. I think what you want is some kind of string-manipulation-based parser.

Community
  • 1
  • 1
Justin Morgan - On strike
  • 30,035
  • 12
  • 80
  • 104
  • "Regular expressions [in JavaScript] are a ..." :-) –  Feb 24 '11 at 20:28
  • @pst: My understanding is that this applies to *any* "pure" regex, but I always welcome a correction. – Justin Morgan - On strike Feb 24 '11 at 20:31
  • @Justin no matter if it is a *context-free grammar*. The language {a^nb^n} is a context-free language, and can be parsed with *regexes* (of course, the regexes we usually use in some programming languages). It's very important to have in mind that the *regular expressions* we use when programming in some languages like C# **are different** from *theoretical regular expressions*, and that's why people can't just say that something can't be parsed using regex because it isn't a regular language. Coders don't necessary have to know what a *real* regular expression is (I think they must!) – Oscar Mederos Feb 24 '11 at 20:49
  • @Oscar: Well, I think your notion of *theoretical regular expressions* is what I was trying to get at with "pure" or "true" regexes. If the language is context-free and can contain recursively-nested open/close delimiters that must be balanced to each other, then my understanding is it can't be done in pure regex. I'm not saying it can't be done with engines that extend regex, but that depends on the engine, and since it's no longer technically regex in the strictest sense, I'd call it beyond the scope of the question. – Justin Morgan - On strike Feb 24 '11 at 23:09
  • @Justin Okay, got it. I agree with you, but what I was trying to explain is that sometimes people just want to solve an specific problem using RegEx without knowing what the formal definition of Regular Expression or Regular Language is. It is important to take it in count when advising other coders ;) – Oscar Mederos Feb 25 '11 at 01:24
2

That is not possible to do with real regular expression, and even with full-blown PCRE the "counting problem" that you're describing is an example of something that you just can't do.

An old textbook I had in school said, "regular expressions can't count." That's not true of modern "supercharged" regular expression implementations with the "{n,m}" qualifiers, but note that the values in curly braces there are constants.

To do that, you need a more complicated automaton. Context-free grammars can represent languages like you describe, as can parse expression grammars.

Pointy
  • 405,095
  • 59
  • 585
  • 614
2

Yes, it's probably possible with Regexes. No, it isn't possible in Javascript Regexes. Yes, it's probably possible in .NET Regexes for example (Balancing Groups http://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.71).aspx ). No, I don't know how to do them. They give me migraine (and I'm not kidding here). They are quite extreme voodoo.

xanatos
  • 109,618
  • 12
  • 197
  • 280
  • I don't actually think that the regular expressions described by that documentation can do parenthesis-counting. – Pointy Feb 24 '11 at 20:27
  • @Pointy Search S.O. for "tchrist" to see what Perl regular expressions can do (not that I am suggesting they *should*) –  Feb 24 '11 at 20:30
  • @Pointy you can think whatever you want, but they can do it... You can give yourself a good migraine reading this http://www.codeproject.com/KB/recipes/RegEx_Balanced_Grouping.aspx They can even do "multilevel different" balancing groups ([]). And you can read this http://blogs.msdn.com/b/bclteam/archive/2005/03/15/396452.aspx To quote `Generally this is not possible with regular expression` and then `However in .Net this is actually possible with something called Balancing Group Definition.` – xanatos Feb 24 '11 at 20:32
  • Will admitting that I despise regular expressions make me a pariah here? They violate every principle of writing comprehensible code. Yet they still fascinate me... like voodoo. – Jamie Treworgy Feb 24 '11 at 20:34
  • @jamietre I have a love/hate relationship with RE. I only hate the lack of "debugging" of RE. If I was able to put a breakpoint in a RE everything would be easier. – xanatos Feb 24 '11 at 20:36
  • 1
    @xanatos, I love *using* regular expressions. Ones that other people wrote. Yeah, it's really the lack of debugger, it's like writing machine code in hex. Could someone please invent a regex wrapper with actual human-understandable terms and a debugger? – Jamie Treworgy Feb 24 '11 at 20:38
  • @xanatos and @jamietre: Ever use RegexBuddy? It's great for breaking down and illuminating complicated regexes. I love it. – Justin Morgan - On strike Feb 24 '11 at 20:42
  • @Justin, I just looked at the web site and the picture on the home page reminds me of debugging assembly language in 1986. Seriously, it does appear to do a lot of what we're complaining about. But I've now written linear parsers for various reason a million times after failing to make a working regex. While not elegant or efficient, are easy to understand and do the job, so it's hard for me to get excited about becoming a regex guru now. Too old :) – Jamie Treworgy Feb 24 '11 at 20:47
  • @jamietre it's not that hard. Trust me. All you have to do is learn the syntax, and see lots of examples: from begining to master. And of course, **create your own regexes**. Everytime you need to validate/parse something, try to do it by yourself using regex, and if you just can't after trying kind of hard, you can surf the web or ask here on SO. – Oscar Mederos Feb 24 '11 at 20:54
  • @Oscar While simple Regexes are quite easy to do, when you begin to use positive/negative lookahead/lookbehind the troubles begins. Yeah... In Javascript they are simple, because the Regex language of Javascript is weak. – xanatos Feb 24 '11 at 20:57
  • @Oscar - I do believe I'm capable of learning them, I guess it's just one of those things that over the years I never had enough incentive to do. That is, usually I can either find one that does what I need already, or if not, I can write some code to accomplish the same thing in not very much time. But as I said they do still fascinate me, because I hate doing something in a way that I know can be done more concisely or efficiently a different way. It just doesn't come up enough that I feel it's worth my time to figure out. But I greatly respect those who can. – Jamie Treworgy Feb 24 '11 at 21:00
2

AFAIK, uou can’t really do this with regular expressions only.

However, Javascript’s String.replace method does have a nice feature that could allow you some level of recursion. If you pass a function as the second parameter, that function will be called for each match encountered. You could then perform the same replace on that match, passing along the same function, which would be called for each match inside that match, etc.

I’m too tired right now to write up an example that fits what you’re asking for — or even if it’s actually possible, so I’ll leave it at this possible hint, and further working out as an exercise to the reader.

Martijn
  • 13,225
  • 3
  • 48
  • 58