My regex is not really doing what I want it to. It needs to find all matches in my text file where the line starts with:

func anyword() {

My pattern does find this, but I want the match to stop at the next }. However, if one or more lines like this appear:

SomeFunction();  

it should skip the next }.

Here is an example of the text it would scan. Each line is commented to show what it should do there:

override func something() {   // here is a start pattern
                              // still looking for an }
halloworld() { };             // bracket } found but is ignored because line also contains "()"
                              // still looking for an } 
                              // still looking for an }
}                             // found closing bracket } end of match

This is the pattern that I am currently using:

\w+\s+func\s\w+\(\)\{\s+(.*?)\s+\}

Abion47
    Regex is not equipped out of the box to match an arbitrary number of nested constructs, but .NET regex extends it to do just that: https://msdn.microsoft.com/en-us/library/bs2twtah(v=vs.110).aspx#balancing_group_definition – Scott Weaver Jan 14 '17 at 20:36
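A small sketch of why the question's pattern misbehaves, using Python's re module (an assumption; the thread itself targets .NET). SAMPLE is a trimmed copy of the question's example with the // annotations removed. As posted, \(\)\{ requires "(){" with no space, so it finds nothing in the sample; with a hypothetical tweak that allows the space, the lazy (.*?) stops at the first } it can reach, which is the inner one.

```python
import re

# Trimmed copy of the question's sample (the // annotations are dropped).
SAMPLE = (
    "override func something() {\n"
    "    halloworld() { };\n"
    "}\n"
)

# The pattern exactly as posted in the question.
ORIGINAL = r"\w+\s+func\s\w+\(\)\{\s+(.*?)\s+\}"

# Hypothetical adaptation: \s* between ")" and "{" so the sample can match.
ADAPTED = r"\w+\s+func\s\w+\(\)\s*\{\s+(.*?)\s+\}"

# "(){" with no space never occurs in the sample, so there is no match.
print(re.search(ORIGINAL, SAMPLE, re.DOTALL))

# re.DOTALL lets "." cross newlines; the lazy group still stops at the
# first whitespace-then-"}" it meets, i.e. inside "halloworld() { }".
m = re.search(ADAPTED, SAMPLE, re.DOTALL)
print(repr(m.group(1)))
```

A lazy quantifier only knows "shortest so far"; it has no notion of which } belongs to which {, which is why the answers below either count braces or restrict what may sit between them.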

3 Answers

Maybe something that uses .NET balancing groups (to deal with those pesky nested, unknown levels of {}):

\bfunc\s*\w+\([^)]*\)\s*{(?:[^{}]|(?<Open>{)|(?<-Open>}))*(?(Open)(?!))\}

The cool part is (?:[^{}]|(?<Open>{)|(?<-Open>}))*(?(Open)(?!)). Each pass through the three-way alternation eats one of: a character that is neither { nor }; a {, which pushes a capture onto the Open stack; or a }, which pops one capture off that stack (the minus sign in (?<-Open>...) does the popping; the name itself does not matter). A pop fails when the stack is already empty, so the loop stops just before the } that closes the function body, and the final \} matches that brace. The sub-pattern is free to repeat as many times as possible because of the *, but every { has to be matched by the same number of }, or the pattern fails via the last conditional: it checks the Open group and, if that group still holds any captures, executes the empty lookahead (?!), which is guaranteed to fail the whole match.
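Python's built-in re module has no balancing groups, so the sketch below is not the regex itself but a hand-written stand-in for the same idea, assuming Python is acceptable for experimenting: a plain regex finds the function header, and a depth counter plays the role of the Open stack (push on {, pop on }, stop when the count returns to zero, give up if the text ends while the count is still positive). The names find_func_bodies, HEADER and TEXT are hypothetical.

```python
import re

# Header part of the .NET pattern, with the opening brace escaped.
HEADER = re.compile(r"\bfunc\s*\w+\([^)]*\)\s*\{")


def find_func_bodies(text):
    """Return every `func ...(...) { ... }` block whose braces balance.

    A depth counter stands in for the .NET Open capture stack:
    "{" pushes (depth += 1), "}" pops (depth -= 1), and the block ends
    at the "}" that brings the depth back to zero.
    """
    results = []
    pos = 0
    while True:
        m = HEADER.search(text, pos)
        if m is None:
            break
        depth = 1  # the header's own "{" is already open
        i = m.end()
        while i < len(text) and depth > 0:
            if text[i] == "{":
                depth += 1
            elif text[i] == "}":
                depth -= 1
            i += 1
        if depth == 0:
            # Balanced: keep the block and continue after it.
            results.append(text[m.start():i])
            pos = i
        else:
            # Unbalanced, like (?(Open)(?!)) failing: retry one char later.
            pos = m.start() + 1
    return results


# Hypothetical input: the question's sample plus a second function.
TEXT = (
    "override func something() {\n"
    "    halloworld() { };\n"
    "}\n"
    "func other(x: Int) {\n"
    "    SomeFunction();\n"
    "}\n"
)

for body in find_func_bodies(TEXT):
    print(body)
    print("---")
```

Like the balancing-group pattern, this counts every brace it sees, including braces inside string literals or comments, so it assumes those do not occur unbalanced in the scanned code.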

Scott Weaver
  • You specify "^" for `func` being at the start of the string, but in the given sample data, it is *not* at the beginning of the string. – Abion47 Jan 14 '17 at 21:08
  • "line starts with" => `RegexOptions.Multiline` – Scott Weaver Jan 14 '17 at 21:12
  • Then clarification is needed, since the sample data clearly does not match that sentiment. – Abion47 Jan 14 '17 at 21:15
  • you're right, the sample data doesn't match the statement the OP made. perhaps a `\b` would be safer to allow for `override` etc...updated. – Scott Weaver Jan 14 '17 at 21:24
  • This `\bfunc\s*\w+\([^)]*\)\s*{(?:[^{}]|(?<Open>{)|(?<-Open>}))*\}` is incorrect since there is no conditional that checks the *Open* stack for a value, and no failing lookahead. – Wiktor Stribiżew Jan 15 '17 at 21:56

A possible solution is the following. You have to use single-line matching (RegexOptions.Singleline in .NET) so that the dot also matches newlines:

func\s*.*\s*{\n.*?((?!\{.*\}).)*}

Here you match the func keyword and the first opening bracket. Then you use a negative lookahead that does not allow another opening bracket followed by the closing bracket you are looking for.

You can even nest more opening/closing brackets and it will match the last closing bracket.

You can test it here.

For more information, see this question.
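A minimal way to try this pattern in Python (an assumption; the answer is written for .NET), where re.DOTALL plays the role of single-line mode. The pattern is used unchanged, since Python's re treats a { that does not start a {m,n} quantifier as a literal brace. SAMPLE is a trimmed copy of the question's example.

```python
import re

# The answer's pattern, unchanged; re.DOTALL lets "." match newlines.
PATTERN = re.compile(r"func\s*.*\s*{\n.*?((?!\{.*\}).)*}", re.DOTALL)

# Trimmed copy of the question's sample (the // annotations are dropped).
SAMPLE = (
    "override func something() {\n"
    "    halloworld() { };\n"
    "}\n"
)

m = PATTERN.search(SAMPLE)
print(m.group(0))  # the match starts at "func", not at "override"
```

With re.DOTALL the .* right after func can also run across lines, so on files containing several functions it is worth checking where each match actually ends.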

freinn

If there's only going to be a single level of nested brackets, then this pattern should work for you:

\w[\w\s]+func[\w\s()]+({(?:[^{}]*|[\w\s();]*?{[\w\s();]*?}[\w\s();]*?)*})

See it implemented on Regex101.
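The same pattern can also be tried outside Regex101. A minimal Python sketch (Python is an assumption; the thread is about .NET), run on a trimmed copy of the question's sample. Python's re treats a { that does not begin a {m,n} quantifier as a literal brace, so the pattern is used unchanged.

```python
import re

# The answer's pattern, unchanged. Group 1 captures the braced body,
# allowing at most one level of nested {...} inside it.
PATTERN = re.compile(
    r"\w[\w\s]+func[\w\s()]+({(?:[^{}]*|[\w\s();]*?{[\w\s();]*?}[\w\s();]*?)*})"
)

# Trimmed copy of the question's sample (the // annotations are dropped).
SAMPLE = (
    "override func something() {\n"
    "    halloworld() { };\n"
    "}\n"
)

m = PATTERN.search(SAMPLE)
print(m.group(0))  # whole match, beginning with "override"
print(m.group(1))  # just the braced body
```

Because the nested alternative only admits word characters, whitespace, parentheses and semicolons between the inner braces, anything else there (or a second level of nesting) needs the balancing-group approach instead.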

Abion47