
Possible Duplicates:
RegEx match open tags except XHTML self-contained tags
.NET Regex balancing groups expression - matching when not balanced

For example, if I had the input:

[quote]He said:
    [quote]I have no idea![/quote]
But I disagree![/quote]

And another quote:

[quote]Some other quote here.[/quote]

How can I effectively grab blocks of quotes using regular expressions without grabbing too much or too little? For example, if I use:

\[Quote\](.+)\[/Quote\]

This will grab too much (basically, the entire thing), whereas this:

\[Quote\](.+?)\[/Quote\]

will grab too little (it will only grab [quote]He said:[quote]I have no idea![/quote], with mismatched start and end tags).
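To make the two failure modes concrete, here is a minimal sketch in Python. The language choice and the flags are assumptions, not part of my actual implementation: DOTALL is assumed so that "." can cross line breaks, and IGNORECASE so that [Quote] in the pattern matches [quote] in the input.

```python
import re

text = (
    "[quote]He said:\n"
    "    [quote]I have no idea![/quote]\n"
    "But I disagree![/quote]\n"
    "\n"
    "And another quote:\n"
    "\n"
    "[quote]Some other quote here.[/quote]"
)

# Assumed flags: DOTALL lets "." match newlines, IGNORECASE lets
# [Quote] in the pattern match [quote] in the input.
flags = re.DOTALL | re.IGNORECASE

greedy = re.search(r"\[Quote\](.+)\[/Quote\]", text, flags)
lazy = re.search(r"\[Quote\](.+?)\[/Quote\]", text, flags)

# Greedy: runs from the first [quote] to the very last [/quote].
print(repr(greedy.group(0)))

# Lazy: stops at the first [/quote], so the outer [quote] is left unclosed.
print(repr(lazy.group(0)))
```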

So how can I effectively parse nested blocks of code like this using Regex?

qJake
  • Theoretically speaking, nested patterns are not regular so they can't be handled by regexes. Of course most modern regex implementations can accommodate irregular patterns, but it's still painful to work with them. – NullUserException Aug 22 '11 at 17:59
  • Besides, is that BBCode I am seeing? See: http://stackoverflow.com/questions/3788959/regex-to-split-bbcode-into-pieces/3792262#3792262 – NullUserException Aug 22 '11 at 17:59
  • It's not BBcode, I just used this as a high-level example. The implementation I'm creating is custom, and doesn't look much like BBcode (though this concept is the same, hence, why I used it for simplicity's sake). – qJake Aug 22 '11 at 18:00
  • Please, please, don't. Use a parser or write one. See: http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Robert P Aug 22 '11 at 18:03
  • I'd go for a proper parser, but if you must use regex, see: http://stackoverflow.com/questions/183846/net-regex-balancing-groups-expression-matching-when-not-balanced – Bart Kiers Aug 22 '11 at 18:26
  • That sounds like it could work, but ... man is that confusing. :/ – qJake Aug 22 '11 at 18:33
  • @Bart Kiers If you were to post your comment as an answer, I could accept it, since this did end up working. ;) – qJake Aug 22 '11 at 19:25
  • @SpikeX, _"... man is that confusing. :/"_, that's why many people here are advising you _against_ such an approach: it's tricky to write, and a nightmare to maintain. – Bart Kiers Aug 22 '11 at 19:26
  • @SpikeX, nah, I won't create a true answer of my comment since I don't think it's a proper answer (just a link to another question). If you found the solution to your problem in that previous Q&A, this question might better be closed as a duplicate. – Bart Kiers Aug 22 '11 at 19:30
  • Then it should be closed, none of the answers below are any better. – qJake Aug 23 '11 at 14:08

2 Answers


Regexes and nesting do not work well together. It's possible (but, depending on the regex dialect you're using, potentially very cumbersome) to construct a regex that matches only an innermost pair. However, if you want to match an entire quote with nested quotes inside, then plain regular expressions are simply not a strong enough tool. You'll need to look into context-free parser technology, or do successive replaces that rewrite the nested quotes into something else before matching the outer ones.
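As a rough illustration of both ideas, here is a minimal Python sketch. It is an assumption-laden example, not a full parser: the names INNERMOST, TAG and top_level_quotes are hypothetical, and the tag syntax is taken from the question's sample. The first pattern matches only an innermost pair by refusing to step over any quote tag; the function uses a regex only to find tags and keeps a depth counter to pair them up, which is the simplest form of the parser approach.

```python
import re

# Matches only an innermost pair: the body may not contain any
# opening or closing quote tag.
INNERMOST = re.compile(
    r"\[quote\]((?:(?!\[/?quote\]).)*)\[/quote\]",
    re.IGNORECASE | re.DOTALL,
)

# Finds a single opening or closing tag; group(1) is "/" for a closing tag.
TAG = re.compile(r"\[(/?)quote\]", re.IGNORECASE)


def top_level_quotes(text):
    """Return the full text of each outermost [quote]...[/quote] block."""
    blocks = []
    depth = 0
    start = None
    for m in TAG.finditer(text):
        if m.group(1) == "":
            # Opening tag: remember where an outermost block begins.
            if depth == 0:
                start = m.start()
            depth += 1
        elif depth > 0:
            # Closing tag that pairs with an open one.
            depth -= 1
            if depth == 0:
                blocks.append(text[start:m.end()])
        # A stray closing tag at depth 0 is ignored.
    return blocks
```

On the question's sample input, INNERMOST.findall should yield only the bodies of the two innermost quotes, while top_level_quotes should yield the two complete outer blocks, nested quote included. A block that is never closed is silently dropped in this sketch; real input would probably want an error instead.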

hmakholm left over Monica

Take a look at my XML indenter: it uses one group to match a beginning tag to its last tag, and another group to get the content recursively.

titus