
I was playing with PHP today creating a small language (just for fun), but I encountered a problem:

How can I select between matching brackets?

My template string:

for(items as item){ // this bracket
  if(some_condition){
      // do stuff

  } // my regex stops here


} // and this bracket

I used this regex: [\w]+\([ \w]+\){([\s\n\r\t/\w(){}]+?)}, but it stops as soon as it finds the first closing bracket.

How can I make it select everything between the outer pair of matching brackets?

for(items as item){ // this bracket

if(some_condition){
      // do stuff

  } // my regex stops here

} // and this bracket

Then I will compile what's in the for separately.

PS: Please don't post comments like "don't bother doing this" or "don't reinvent the wheel". It is just for learning purposes.

Ionel Lupu

2 Answers

1

You could try the regex below, which allows one additional } bracket to be matched inside the block.

[\w]+\([ \w]+\){([\s\n\r\t\/\w(){}]+?}[\s\n\r\t\/\w(){}]+?)}


Avinash Raj
  • This breaks if there are more instructions inside my brackets. I can't do that for every instruction. – Ionel Lupu Sep 05 '14 at 09:48
  • @boyd How about this: http://regex101.com/r/qW4dI6/2 ? It works for all the cases. – Avinash Raj Sep 05 '14 at 09:58
  • I spent an hour understanding that lookahead thing, and yes, it works the way I describe in my question. Good job. But how can I make it match when I have some other code at the end, like `print variable; exit program;`? It doesn't work if I add an extra space at the end. – Ionel Lupu Sep 05 '14 at 11:10
  • Yep, that's because of the end-of-line `$` anchor. Post your input on the demo site and provide the saved link here. I will try to solve your problem... or get into the chat http://chat.stackoverflow.com/rooms/25767/regex – Avinash Raj Sep 05 '14 at 11:18
  • http://regex101.com/r/mC4iG8/2. How can I select those 'FOR's and also keep the code after them? – Ionel Lupu Sep 05 '14 at 11:26
  • Did you want to match the first two for's? – Avinash Raj Sep 05 '14 at 11:27
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/60691/discussion-between-boyd-and-avinash-raj). – Ionel Lupu Sep 05 '14 at 11:30
1

You can use recursion:

$code = '
for(items as item) {
    if(some_condition) {
        while stuff {
            hi
        }
    }
    done
}
';

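// (?R) recurses into the whole pattern, so a nested { ... } is matched as one unit.
// The x flag allows whitespace inside the pattern for readability.
$re = '/{ ( ( [^{}] | (?R) ) * ) }/x';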

preg_match_all($re, $code, $m);
print_r($m[1][0]);

This prints

if(some_condition) {
    while stuff {
        hi
    }
}
done

that is, the inner block has been detected correctly.
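If you also want to anchor the match to the `for(...)` header, as in the question, one option is to recurse into a named group instead of the whole pattern. This is only a sketch: the group name `body`, the sample input, and the trailing `print variable;` line are assumptions made for illustration, and the header part is borrowed from the regex in the question.

```php
<?php
// Hypothetical input: a `for` block followed by some trailing code.
$code = <<<'SRC'
for(items as item){
  if(some_condition){
      // do stuff
  }
}
print variable;
SRC;

// (?<body> ... ) is a named group; (?&body) recurses into that group only,
// so the `for(...)` header is matched once and is not part of the recursion.
$re = '/\w+\([ \w]+\)\s*(?<body>\{(?:[^{}]++|(?&body))*\})/';

preg_match($re, $code, $m);

// $m['body'] includes the outer braces; strip them to get the block contents.
$inner = substr($m['body'], 1, -1);
echo $inner;
```

Because the pattern has no `$` anchor, code after the closing bracket is simply left out of the match. It still has the limitation described next: a `{` or `}` inside a string literal or comment will confuse it.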

That said, regular expressions are the wrong tool for parsing formal languages (they are fine for tokenizing, though). For example, the above will break hopelessly once you add a string literal containing "{":

for(items as item){
    echo "hi there :{ ";
}

What you actually need is a parser, either crafted manually (good learning exercise!) or generated (see here for options).
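As a first step towards a hand-written parser, here is a rough sketch of a scanner that finds the bracket matching a given opening bracket while ignoring brackets inside double-quoted strings. The function name `findMatchingBrace` is made up for this example, and it assumes that `$open` points at a `{` and that strings use `"` with backslash escapes. Comments and single-quoted strings are not handled and would need the same kind of treatment.

```php
<?php
/**
 * Returns the offset of the '}' that closes the '{' at offset $open,
 * or -1 if the block is never closed. Braces inside double-quoted
 * strings are ignored. (Hypothetical helper, not a library function.)
 */
function findMatchingBrace(string $src, int $open): int
{
    $depth = 0;
    $inString = false;
    $len = strlen($src);

    for ($i = $open; $i < $len; $i++) {
        $ch = $src[$i];

        if ($inString) {
            if ($ch === '\\') {
                $i++;               // skip the escaped character
            } elseif ($ch === '"') {
                $inString = false;  // closing quote
            }
            continue;
        }

        if ($ch === '"') {
            $inString = true;
        } elseif ($ch === '{') {
            $depth++;
        } elseif ($ch === '}') {
            $depth--;
            if ($depth === 0) {
                return $i;          // back at the starting nesting level
            }
        }
    }
    return -1;
}

$code  = 'for(items as item){ echo "hi there :{ "; } print variable;';
$open  = strpos($code, '{');
$close = findMatchingBrace($code, $open);
$body  = substr($code, $open + 1, $close - $open - 1);
echo $body;
```

Tracking a depth counter and an "inside a string" flag is essentially what a tokenizer does, so the same loop can later be extended to emit tokens instead of just returning an offset.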

georg
  • I think you are right. That echo is a very good example of why not to use regex. I'll build my own parser (I don't really like to use 'generated' stuff from the internet. I like to build my own stuff). Thank you very much for your examples and ideas. – Ionel Lupu Sep 05 '14 at 09:53