2

I'm trying to write a regex that gives me the following result:

Text:

[Add Months([Actual Date], 5)] - Another Text - [Actual Date] - [Cria ocorrencia padrao.Record.Name] - Another Text - [Add Months([Actual Date], 5, [Actual Date])] - [Add Months(Add Days(AddDays([Actual Date], 5), 7), 5, [Actual Date])] - final text

Desired output:

Match 1: [Add Months([Actual Date], 5)]
Match 2:  - Another Text - 
Match 3: [Actual Date]
Match 4:  - 
Match 5: [Cria ocorrencia padrao.Record.Name]
Match 6:  - Another Text - 
Match 7: [Add Months([Actual Date], 5, [Actual Date])]
Match 8:  - 
Match 9: [Add Months(Add Days(AddDays([Actual Date], 5), 7), 5, [Actual Date])]
Match 10:  - final text

But I haven't had any success, and I need to get this task done.

I'm trying to use nested (balancing group) constructs in .NET with this regex:

string pattern = @"(([^\[\]]*)??)|(\[(?>\[(?<DEPTH>)\](?<-DEPTH>)|.?)*(?(DEPTH)(?!))\])?([^\[\]]*)";

But it's not working. Can someone point me in the right direction?

Thank you.

Prix
Acaz Souza
  • Can you post an example of the text you're trying to match on? – JDiPierro Aug 09 '13 at 18:29
  • Ahh that is the text! I thought that was just a description of it. – JDiPierro Aug 09 '13 at 18:30
  • 2
    I wouldn't be surprised to find that the extended non-regular regular expression flavor supported by .NET can do this, but I wouldn't recommend it. It'd probably take you less time (an hour or two) to just write a parser for this so that mere mortals can understand what it does, and when the data format changes you can modify the parser rather than spend another three days trying to come up with a regular expression that seems to work but can't be tested reliably. – Jim Mischel Aug 09 '13 at 18:38
  • A parser for this would be complex too, and there would be a lot more code to fix if things change. – Acaz Souza Aug 09 '13 at 19:01

2 Answers

3

The usual way to use balancing groups is this:

\G
(?:
  [^\[\]]+
|
  \[
  (?>
    [^\[\]()]
  |
    (?<Depth>[(\[])
  |
    (?<-Depth>[)\]])
  )*
  (?(Depth)(?!))
  \]
)

Working demo.

See this post for a detailed explanation of how I arrived there. Note that in your case, I added the \G anchor to make sure that all matches are adjacent, and the first alternation does not accidentally pick up the contents of brackets.

Sorry, I cannot really decipher your pattern. Free-spacing mode (RegexOptions.IgnorePatternWhitespace in .NET) helps a lot with readability.
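As a rough sketch of how the free-spacing pattern above might be used from C#: the class name BracketTokenizer and the Tokenize helper are made up for illustration, and the # comments inside the pattern are my own additions, which RegexOptions.IgnorePatternWhitespace permits.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class BracketTokenizer
{
    // Same pattern as above; whitespace and # comments are ignored
    // because of RegexOptions.IgnorePatternWhitespace.
    private static readonly Regex Token = new Regex(@"
        \G
        (?:
          [^\[\]]+              # plain text outside any square brackets
        |
          \[
          (?>
            [^\[\]()]           # any character that is not a bracket or parenthesis
          |
            (?<Depth>[(\[])     # opening character: push onto the Depth stack
          |
            (?<-Depth>[)\]])    # closing character: pop from the Depth stack
          )*
          (?(Depth)(?!))        # fail if something is still open
          \]
        )",
        RegexOptions.IgnorePatternWhitespace);

    // Returns the adjacent tokens (bracketed groups and the text between them).
    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(input))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        string text = "[Add Months([Actual Date], 5)] - Another Text - [Actual Date] - final text";
        foreach (string token in Tokenize(text))
        {
            // Quotes make leading and trailing spaces visible.
            Console.WriteLine("'" + token + "'");
        }
    }
}
```

Because of the \G anchor, matching stops at the first position where neither alternative applies, so comparing the combined length of the tokens with the input length is one way to detect input that could not be fully tokenized.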

If you know that your input is always correctly nested, then this is all that's needed. If your input can contain escaped brackets/parentheses or wrongly nested brackets/parentheses, then this pattern will give you some undesired results, and you have to work a bit harder to ensure that you always close the right bracket. (See the second line in the linked demo: the inner parentheses are closed in the wrong order, but they still match.)

What you can do is this: whenever you encounter an opening bracket, push its corresponding closing counterpart onto the capture stack instead of the opening bracket itself. You can do this with a lookahead, so that you don't mess up where you are in the string. Then you only decrease the depth (pop the stack) if the current character matches that corresponding character (using a simple backreference).

\G
(?:
  [^\[\]]+
|
  \[
  (?>
    [^\[\]()]
  |
    [(](?=.*(?<Close>[)]))
  |
    \[(?=.*(?<Close>\]))
  |
    (?<-Close>\k<Close>)
  )*
  (?(Close)(?!))
  \]
)

Working demo

Of course, this still doesn't handle escaping.
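To illustrate the difference, here is a small sketch that runs the stricter pattern on one correctly nested and one wrongly nested input. StrictBracketTokenizer and Tokenize are hypothetical names, the test strings are invented, and the # comments in the pattern are mine.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class StrictBracketTokenizer
{
    private static readonly Regex Token = new Regex(@"
        \G
        (?:
          [^\[\]]+
        |
          \[
          (?>
            [^\[\]()]
          |
            [(](?=.*(?<Close>[)]))   # after '(' remember that ')' must close it
          |
            \[(?=.*(?<Close>\]))     # after '[' remember that ']' must close it
          |
            (?<-Close>\k<Close>)     # pop only if the expected closer comes next
          )*
          (?(Close)(?!))
          \]
        )",
        RegexOptions.IgnorePatternWhitespace);

    public static List<string> Tokenize(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Token.Matches(input))
        {
            tokens.Add(m.Value);
        }
        return tokens;
    }

    public static void Main()
    {
        // Correctly nested: the bracketed group and the trailing text are both found.
        Console.WriteLine(Tokenize("[a([b])] x").Count);
        // Wrongly nested: ')' arrives while ']' is expected, so nothing matches.
        Console.WriteLine(Tokenize("[a([b)]]").Count);
    }
}
```

The first, simpler pattern would accept "[a([b)]]" as a single token, since it only counts openers and closers without checking that they pair up.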

Community
Martin Ender
  • See my own answer. What do you think about my answer compared with yours? – Acaz Souza Aug 09 '13 at 18:49
  • @AcazSouza If it works and you know that your input will always be valid, that's perfectly fine. My answer just shows you how to make it safe with regard to the different kinds of brackets you have. – Martin Ender Aug 09 '13 at 18:51
  • Thank you, you helped a lot. I'll test the patterns and try to find any failing cases. – Acaz Souza Aug 09 '13 at 18:56
0

I found a pattern that solves my problem: \[(?>\[(?<DEPTH>)|\](?<-DEPTH>)|.?)*(?(DEPTH)(?!))\]|([^\[\]]*)

In this text: [Add Months([Actual Date], 5)] - Another Text - [Actual Date] - [Cria ocorrencia padrao.Record.Name] - Another Text - [Add Months([Actual Date], 5, [Actual Date])] - [Add Months(Add Days(AddDays([Actual Date], 5), 7), 5, [Actual Date])] - final text

It returns:

Match 1: [Add Months([Actual Date], 5)]
Match 2:  - Another Text - 
Match 3: [Actual Date]
Match 4:  - 
Match 5: [Cria ocorrencia padrao.Record.Name]
Match 6:  - Another Text - 
Match 7: [Add Months([Actual Date], 5, [Actual Date])]
Match 8:  - 
Match 9: [Add Months(Add Days(AddDays([Actual Date], 5), 7), 5, [Actual Date])]
Match 10:  - final text
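A minimal sketch of running this pattern from C#, assuming a hypothetical SelfAnswerDemo class. Because the second alternative ends in *, the pattern can also match an empty string, so Regex.Matches may report zero-length matches (for example at the end of the input). The helper below simply skips them.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SelfAnswerDemo
{
    // The pattern from this answer, unchanged.
    private static readonly Regex Token = new Regex(
        @"\[(?>\[(?<DEPTH>)|\](?<-DEPTH>)|.?)*(?(DEPTH)(?!))\]|([^\[\]]*)");

    // Returns all matches except zero-length ones.
    public static List<string> NonEmptyMatches(string input)
    {
        var result = new List<string>();
        foreach (Match m in Token.Matches(input))
        {
            if (m.Length > 0)
            {
                result.Add(m.Value);
            }
        }
        return result;
    }

    public static void Main()
    {
        string text = "[Add Months([Actual Date], 5)] - Another Text - [Actual Date] - final text";
        int i = 1;
        foreach (string value in NonEmptyMatches(text))
        {
            Console.WriteLine("Match " + i++ + ": " + value);
        }
    }
}
```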
Acaz Souza