
I have a regex for finding all function definitions. What I want to do now is also get the contents of the functions, e.g. as a third field in $matches. Is that possible using regex, or do I need some push-pop machine because of the nesting of {} brackets? What I want to write is a script that analyzes PHP code and figures out which functions have dependencies. If there is already such a script, let me know!

$content = file_get_contents($fileName);
preg_match_all("/(function )(\w+\(.*?\))/", $content, $matches);

I don't want to use the PHP tokenizer because it also picks up some "hidden functions" such as predefined functions, and I only want the functions written in the code.

Anirudha
Karl Adler
  • A regex won't get you very far at all, you need a proper *parser*. Like [NikiC's PHP Parser](https://github.com/nikic/PHP-Parser). – deceze May 28 '13 at 06:37
  • I agree with @deceze, it'd be better to use a parser. With regex it's hard. – Robert May 28 '13 at 06:43
  • Isn't it a bit too much? I just want to analyze simple functions and create a graph of their dependencies using graphviz – Karl Adler May 28 '13 at 06:46
  • Nope, it's not overkill, it's the right tool for the job. PHP is not a *regular language*, therefore *regular expressions* are the *wrong* tool for the job. Come on, you don't even know where to start using regexen, right? :-3 – deceze May 28 '13 at 06:59
  • Maybe this would be helpful? http://forums.phpfreaks.com/topic/225722-regex-to-extract-php-functions-data/ – Anatoliy Gusarov May 28 '13 at 07:09
  • Trying Reflection can help, too. See [ReflectionFunction](http://php.net/manual/en/reflectionfunction.construct.php) – Carlos Campderrós May 28 '13 at 07:11

1 Answer


Even if for better or worse you're not Noam Chomsky, you should understand this:

PHP is not a regular language, so it cannot be expressed or parsed by regular expressions.

To be a regular language, a language needs to be, among other things, context free.

[Figure: language hierarchies]

Loosely speaking, "context free" means that a "word" in the language means the same thing regardless of where it occurs. This is not the case for PHP. In fact, even your simple snippet to find function signatures already crashes and burns here:

// function foo()

The context of a comment voids this function keyword of its usual meaning. Not to mention:

'function foo()';
<<<HERE
    function foo()
HERE;

and a host of similar examples. The function keyword (and everything else too) is dependent on context, making PHP a context-sensitive language, thereby not regular, thereby not feasibly parseable by regular expressions.
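To make the failure concrete, here is a small sketch that runs the pattern from the question over a made-up snippet. The sample source and the names `inComment`, `inString` and `real` are illustrative assumptions, not code from the question.

```php
<?php
// The pattern from the question, unchanged.
$pattern = "/(function )(\w+\(.*?\))/";

// Hypothetical sample source: one real function, plus the keyword
// inside a comment and inside a string literal.
$content = <<<'SRC'
<?php
// function inComment()
$s = 'function inString()';
function real($a) { return $a; }
SRC;

preg_match_all($pattern, $content, $matches);

// The second capture group holds "name(args)" for every hit,
// including the two that are not function definitions at all.
print_r($matches[2]);
```

The regex reports three "functions" although the snippet defines only one.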

Use a parser.
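As a rough illustration of what working on tokens instead of raw text buys you, here is a minimal sketch built on PHP's built-in `token_get_all()`. It is a lexer-level approximation, not a full parser like the one linked in the comments. The helper name `extractFunctionBodies` is hypothetical, methods with the same name in different classes would overwrite each other, and closures are skipped. Note that the tokenizer only sees what is in the source string you hand it, so predefined functions do not show up as definitions.

```php
<?php
/**
 * Hypothetical helper: map each named function found in $source
 * to the text between its outermost braces.
 */
function extractFunctionBodies(string $source): array
{
    $tokens = token_get_all($source);
    $count = count($tokens);
    $functions = [];

    for ($i = 0; $i < $count; $i++) {
        // Comments and string literals are single tokens, so a "function"
        // inside them never shows up as T_FUNCTION.
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_FUNCTION) {
            continue;
        }

        // The name is the first T_STRING before the parameter list.
        $name = null;
        for ($j = $i + 1; $j < $count; $j++) {
            if (is_array($tokens[$j]) && $tokens[$j][0] === T_STRING) {
                $name = $tokens[$j][1];
                break;
            }
            if ($tokens[$j] === '(' || $tokens[$j] === ';') {
                break; // closure or "use function ...;": nothing to record
            }
        }
        if ($name === null) {
            continue;
        }

        // Copy tokens from the opening brace until the depth is back at zero.
        $depth = 0;
        $body = '';
        $closed = false;
        for ($k = $j + 1; $k < $count; $k++) {
            $tok = $tokens[$k];
            // "{$x}" and "${x}" inside strings open a brace that is closed by "}".
            $isOpen = $tok === '{' || (is_array($tok)
                && in_array($tok[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true));
            if ($depth === 0) {
                if ($tok === ';') {
                    break; // declaration without a body (abstract, interface)
                }
                if ($isOpen) {
                    $depth = 1; // body starts; the brace itself is not copied
                }
                continue;
            }
            if ($isOpen) {
                $depth++;
            }
            if ($tok === '}' && --$depth === 0) {
                $closed = true;
                break;
            }
            $body .= is_array($tok) ? $tok[1] : $tok;
        }
        if ($closed) {
            $functions[$name] = $body;
        }
    }
    return $functions;
}

// Made-up input with the traps from above: a comment and a string literal.
$source = <<<'SRC'
<?php
// function fake() { }
function outer($x) {
    if ($x) { return "a {$x} b"; }
    return 'function inner() {}';
}
SRC;

$bodies = extractFunctionBodies($source);
print_r(array_keys($bodies));
```

For a dependency graph you would then scan each body's tokens for calls, which is exactly the kind of work a real parser already does for you.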

deceze
  • true thoughts... sometimes I should use things learned in theoretical informatics ;) – Karl Adler May 28 '13 at 08:39
  • While it is true that regex is not the right tool here, it's worth noting that PCRE regex is able to match words in specific context (via lookarounds and other zero-width assertions, and also capturing alternation groups), and also supports recursion and subroutine calls. PHP regular expressions are not "regular" in Chomsky's terms. – Wiktor Stribiżew Dec 13 '16 at 13:37
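For reference, the PCRE recursion mentioned in the last comment can be sketched like this. The pattern is an illustrative assumption, not something taken from the thread: `(?3)` re-enters capture group 3, which lets the regex balance nested braces. The second input shows why it is still the wrong tool, since a brace inside a string literal ends the match early.

```php
<?php
// Group 1: name, group 2: parameter list, group 3: balanced { ... } block.
// (?3) recurses into group 3 to handle nesting.
$pattern = '/function\s+(\w+)\s*\(([^()]*)\)\s*(\{(?:[^{}]++|(?3))*\})/';

// Nested braces are matched correctly.
$nested = 'function f($a) { if ($a) { return 1; } return 2; }';
preg_match($pattern, $nested, $m);
echo $m[1], "\n"; // name
echo $m[3], "\n"; // full body including the outer braces

// A brace inside a string literal fools the pattern: the body is cut short.
$tricky = 'function g() { return "}"; }';
preg_match($pattern, $tricky, $broken);
echo $broken[3], "\n";
```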