
I am developing a parser in PHP and need a working regular expression to validate a string that contains function calls.

For context, I plan to use preg_replace_callback and create_function to execute each function and replace its value in the string, recursively.

Example string: 15 + func1 ("gis", 22, func (55), 87) + 95 + func2 () + 35

The regex should match the whole func1 call and, separately, func2. My regex (([^ ()])+([ ]?)*\(.*\))* returns func1 ("gis", 22, func (55), 87) + 95 + func2 () as a single function. This is wrong because the "95" lies outside both functions. The regex must also handle functions nested inside other functions, such as func inside func1.

Appreciate any help.

Smern
guinalz
  • Regexes are not really the correct tool for this, (standard) regexes cannot cope with arbitrary levels of nesting. – Oliver Charlesworth Jun 15 '13 at 19:46
  • The regex (([^ ()])+\(.*\))* doesn't work. – guinalz Jun 15 '13 at 19:50
  • You can use a regex for tokenization, but the PCRE interface in PHP won't give you a parse tree. It's possible to validate the correct structure and possibly traverse it via _callback using a `(?R)` [recursive](http://stackoverflow.com/questions/133601/can-regular-expressions-be-used-to-match-nested-patterns) pattern though. – mario Jun 15 '13 at 20:05

2 Answers


Instead of using a regular expression, consider using PHP's built-in tokenizer:

http://www.php.net/manual/en/function.token-get-all.php

$tokens = token_get_all('<?php 15 + func1 ("gis", 22, func (55), 87) + 95 + func2 () + 35; ');

This returns an array of parser tokens that you can use to match functions to their arguments.
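As a rough sketch of how those tokens could be used, the snippet below scans for identifier tokens (T_STRING) that are followed, after optional whitespace, by an opening parenthesis. The helper name findCalledFunctions is hypothetical, and the sketch only lists function names; it does not collect or evaluate arguments.

```php
<?php
// Hypothetical helper: list the names of the functions called in an expression.
function findCalledFunctions(string $expr): array
{
    // token_get_all() only tokenizes PHP code that follows an opening tag.
    $tokens = token_get_all('<?php ' . $expr . ';');
    $names = [];
    $count = count($tokens);

    for ($i = 0; $i < $count; $i++) {
        // Identifiers are arrays of [id, text, line]; punctuation is a plain string.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_STRING) {
            continue;
        }
        // Skip any whitespace between the name and a possible "(".
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j < $count && $tokens[$j] === '(') {
            $names[] = $tokens[$i][1];
        }
    }

    return $names;
}

print_r(findCalledFunctions('15 + func1 ("gis", 22, func (55), 87) + 95 + func2 () + 35'));
```

To pair each name with its arguments you would additionally track the parenthesis depth while walking the token array.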

leepowers

You can try this:

$subject = '15 + func1 ("gis", 22, func (55), 87) + 95 + func2 () + 35';

$pattern = <<<'LOD'
~
 #definitions:

 (?(DEFINE)(?<int>     [0-9]++        ))
 (?(DEFINE)(?<str>     "[^"]++"       ))

 (?(DEFINE)(?<f_name>  \b[a-z]\w*+\b  ))
 (?(DEFINE)(?<sep>     ,\h            ))

 #pattern:

 (?=
     (
        (?<func>\g<f_name>) \s*+
        \( 
           (?<args>
             (?> (?> \g<int> | \g<str> | (?-3) ) \g<sep>?+ )*
           )
        \) 
     )
 )
~x
LOD;

preg_match_all($pattern, $subject, $matches);

print_r($matches['func']);
print_r($matches['args']);

The idea is to use recursion to match functions nested inside functions, and to wrap the whole pattern in a lookahead so that overlapping matches (a nested call and the call that contains it) are all captured.

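Note that for the recursion I use (?-3), a relative reference to the third capturing group opened to its left. That is the first group of the main pattern, i.e. the outer group inside the lookahead. Its absolute number is 5, because the four named groups in the DEFINE blocks are numbered too, so you could write (?5) instead. The relative reference is useful if you want to embed this pattern as a subpattern of a larger one.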
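To see the recursion on its own, here is a stripped-down sketch. It is not the pattern above: it drops the DEFINE blocks and the lookahead, so it returns only the outermost calls. The helper name topLevelCalls and the simplified argument rule (anything except parentheses and double quotes, or a double-quoted string without escapes) are assumptions made for illustration. In this reduced pattern the parenthesised argument list is capture group 1, so (?1) re-enters it for each nested call.

```php
<?php
// Hypothetical helper: return each outermost function call found in $subject.
function topLevelCalls(string $subject): array
{
    $pattern = <<<'RX'
~
 \b[a-z]\w*+ \s*+          # function name, optional whitespace
 (                         # group 1: a balanced argument list
   \(
     (?: [^()"]++          # plain argument text
       | "[^"]*+"          # a double-quoted string, escapes not handled
       | (?1)              # a nested pair of parentheses: recurse into group 1
     )*+
   \)
 )
~x
RX;
    preg_match_all($pattern, $subject, $matches);

    return $matches[0];
}

print_r(topLevelCalls('15 + func1 ("gis", 22, func (55), 87) + 95 + func2 () + 35'));
// Expected entries: func1 ("gis", 22, func (55), 87) and func2 ()
```

Because nothing here is wrapped in a lookahead, the nested func (55) is consumed as part of func1 and is not reported separately.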

Combining (?(DEFINE)...) with comment mode (the x modifier) keeps the pattern easy to edit: you can add or change data types, or other elements your parser may encounter. For example, to allow single-quoted strings, change the <str> subpattern like this:

(?(DEFINE)(?<str>     "[^"]++" | '[^']++'    ))

or like this to be more permissive (allowing escaped quotes):

(?(DEFINE)(?<str>     "(?>[^"]++|(?<=\\)")++" | '(?>[^']++|(?<=\\)')++'  ))
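A commonly used alternative for quoted strings with backslash escapes is the "unrolled" form, in which a backslash always consumes the character that follows it. The sketch below is an assumption on my part rather than a drop-in piece of the pattern above, and the sample subject is made up for illustration. Both the pattern and the subject are written in nowdoc syntax so that backslashes stay literal.

```php
<?php
// Nowdoc keeps backslashes literal, so the regex engine sees exactly what is written.
$pattern = <<<'RX'
~"[^"\\]*+(?:\\.[^"\\]*+)*+"|'[^'\\]*+(?:\\.[^'\\]*+)*+'~s
RX;

// Made-up sample containing escaped quotes and an empty string.
$subject = <<<'TXT'
func1("a \"quoted\" word", 'it\'s', "")
TXT;

preg_match_all($pattern, $subject, $matches);
print_r($matches[0]);
```

Unlike a lookbehind test for a preceding backslash, this form also accepts empty strings and treats an escaped backslash before the closing quote as a single escaped character.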
Casimir et Hippolyte
  • Cool, everything works EXCEPT the regex for escaped quotes. I tried to figure out the problem and changed it to "[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|'[^'\\\\]*(?:\\\\.[^'\\\\]*)*' but without success. Can anyone see what the problem is? – guinalz Jun 15 '13 at 23:19
  • @user2489518: you don't have to escape any quotes when you use the nowdoc syntax. (i.e: `<<<'LOD'`) – Casimir et Hippolyte Jun 15 '13 at 23:25