Here is the $source example

/**
 * These functions can be replaced via plugins. If plugins do not redefine these
 * functions, then these will be used instead.
 */

if ( !function_exists('wp_set_current_user') ) :
/**
 * Changes the current user by ID or name.
 *
 */
function wp_set_current_user($id, $name = '') {

Note: some functions are not wrapped in a function_exists check.

For a specific purpose of mine, I'm trying to parse each docblock, together with the function header that follows it, using a regular expression.

Here is the regex

$t = preg_match_all("@(/\*\*.*?\*/\nfunction\s.*?\(.*?\))\s{@mis",$source,$m);

I expect to get:

    /**
     * Changes the current user by ID or name.
     *
     */
    function wp_set_current_user($id, $name = '') {

but instead it returns the whole code example, starting from the first "/**".

Any help would be appreciated.


Some people have asked about my purpose. I don't think it matters much here, but for context:

I'm using Geany, and I found that its existing WordPress code hints are incomplete.

The docblock parsers I found don't extract the function name and the function arguments.

So I'm trying to parse them on my own.

The code hint format of Geany is:

wp_set_current_user|Changes the current user by ID or name.|($id, $name = '')|

However, the point of this question is: how do I make the regex take the second "/**" as its starting point? I'm sorry that my poor English confused you all.

Alan Moore
user353889
  • I can't understand what your goal is. Where is the source for the regex? – Mohammad Ahmad Nov 17 '12 at 08:41
  • A regex always matches left to right. This means the first match of /** is used as the starting point. This might help you a bit: http://myregexp.com/eclipsePlugin.html – Lucas Hoepner Nov 17 '12 at 08:45
  • PHP already has a proper [tokenizer](http://php.net/token_get_all). – Gumbo Nov 17 '12 at 09:00
  • What is the purpose of using such a RegExp from php? Are you trying to find all the wordpress plugin files that match a certain criteria? You could easily find them using grep from the commandline. Dunno what you are trying to achieve… – cristobal Nov 18 '12 at 00:46
  • @cristobal I'm trying to parse all wordpress functions with their docblocks in one file. I think my purpose isn't important. what matters is my question: how to make regex take second /** as starting point. – user353889 Nov 18 '12 at 00:58
  • @user353889 Well, it seems you got your answer below then. Perhaps you should look into another editor, if you ask me. You have [Aptana](http://www.aptana.com/), [Eclipse](http://www.eclipse.org/projects/project.php?id=tools.pdt), [NetBeans](http://netbeans.org/features/php/) and a bunch of other [editors](http://en.wikipedia.org/wiki/List_of_PHP_editors) with far superior PHP support. – cristobal Nov 21 '12 at 13:41

2 Answers

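You can match the comment with a regexp like this (check out a regex lookaround tutorial). The negative lookahead keeps the match from running past the first closing */, and the s flag is needed so that . also matches newlines:

/\*\*(?:(?!\*/).)*\*/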
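To make the effect of the lookahead concrete, here is a minimal sketch. It assumes a tempered pattern of the form `/\*\*(?:(?!\*/).)*\*/` (my own spelling, which may differ from the one shown above) followed by a deliberately loose function header, and it uses a shortened copy of the sample from the question as `$source`.

```php
<?php
// Sample input modelled on the question: a file-level docblock followed by
// the docblock that actually belongs to the function.
$source = <<<'EOT'
/**
 * These functions can be replaced via plugins.
 */

if ( !function_exists('wp_set_current_user') ) :
/**
 * Changes the current user by ID or name.
 *
 */
function wp_set_current_user($id, $name = '') {
EOT;

// "(?:(?!\*/).)*" consumes one character at a time but never steps onto a
// closing "*/", so a docblock can no longer swallow the code that follows it.
// The "s" flag lets "." match newlines.
$pattern = '@(/\*\*(?:(?!\*/).)*\*/\s*function\s+\w+\s*\(.*?\))\s*\{@s';

preg_match_all($pattern, $source, $m);

// The attempt at the first "/**" fails (its "*/" is followed by "if", not
// "function"), so the only match starts at the second "/**".
echo $m[1][0], "\n";
```

The attempt at the first docblock fails outright instead of stretching forward, which is why the engine moves on to the second one.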

Then any amount of whitespace can follow:

\s*

Which keywords can precede function in PHP? static, abstract, final, public, private, protected; correct me if I'm forgetting something. (PHP has no virtual keyword.)

(?:(?:static|abstract|final|public|private|protected)\s+)*

Okay, now the function header and its parentheses:

function\s+(?P<name>[A-Za-z_]\w*)\s*\(...\)

The ... part gets complicated because it can contain a default value, which may be a complicated PHP string ($remove_characters = '\'"\n\r '), so here is how to match a value (double-quoted string, single-quoted string, number, constant):

"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"
\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'
[\d.]+
\w+

This results in one large value regexp:

("[^"\\\\]*(?:\\\\.[^"\\\\]*)*"|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\'|[\d.]+|\w+)
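As a quick check of the value pattern, the sketch below writes the four alternatives inside single-quoted PHP strings (which is why each literal backslash appears as four backslashes) and tries them on a few sample defaults. The helper name `isValue` and the sample strings are my own and purely illustrative.

```php
<?php
// The four alternatives from the text: double-quoted string,
// single-quoted string, number, bare word (constant).
$value = '(?:"[^"\\\\]*(?:\\\\.[^"\\\\]*)*"'
       . '|\'[^\'\\\\]*(?:\\\\.[^\'\\\\]*)*\''
       . '|[\d.]+'
       . '|\w+)';

// Hypothetical helper: true when $text as a whole is one value.
function isValue(string $text, string $value): bool {
    return preg_match('/^' . $value . '$/s', $text) === 1;
}

// The tricky default from the text, taken verbatim via a nowdoc.
$tricky = <<<'EOT'
'\'"\n\r '
EOT;

$samples = [$tricky, '"a \"quoted\" word"', '3.14', 'PHP_EOL', '"unterminated'];

foreach ($samples as $sample) {
    echo str_pad($sample, 22), isValue($sample, $value) ? 'value' : 'no match', "\n";
}
```

Only the last sample, a string without its closing quote, should be rejected.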

Every function argument has the form $var or $var = data (with any amount of whitespace, of course; I'm also omitting array $input = array()), so this is a simplified variable name match:

\\$[\w_][\w\d_]*

Type matching:

([\w_]+\s+)?

So function arguments can be:

\s*([\w_]+\s+)?(\\$[\w_][\w\d_]*|\\$[\w_][\w\d_]*\s*=\s*<value>)

And the complete regexp for the function would look like this (the argument list is optional as a whole):

function\s+(?P<name>[A-Za-z_]\w*)\s*\(\s*(?:<argument>(?:\s*,<argument>)*)?\s*\)

I won't be testing these regexps for you; that's your job at this point. My goal was to show you what you need if you want to do this really correctly (but feel free to edit my answer if you find a mistake). You may also use a really simplified version (for example, a single regexp for the function arguments that eats everything).
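For the simplified route, here is a rough sketch that combines a tempered docblock pattern with an "eat everything" argument group and prints lines in the Geany format from the question. The function name `geanyTags`, the choice to use the first non-empty docblock line as the description, and the one level of nested parentheses (enough for `array()`) are all my assumptions. A default string that itself contains parentheses would break it.

```php
<?php
// Hypothetical helper: returns one "name|summary|(args)|" line per
// documented function found in $source.
function geanyTags(string $source): array {
    $pattern = '@/\*\*((?:(?!\*/).)*)\*/\s*'
             . '(?:(?:static|abstract|final|public|private|protected)\s+)*'
             . 'function\s+(?P<name>[A-Za-z_]\w*)\s*'
             // Everything between the parentheses, one nesting level allowed.
             . '(?P<args>\((?:[^()]|\([^()]*\))*\))@s';

    preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

    $tags = [];
    foreach ($matches as $match) {
        // Use the first non-empty docblock line as the description.
        $summary = '';
        foreach (explode("\n", $match[1]) as $line) {
            $line = trim($line, " \t\r*");
            if ($line !== '') {
                $summary = $line;
                break;
            }
        }
        $tags[] = $match['name'] . '|' . $summary . '|' . $match['args'] . '|';
    }
    return $tags;
}

$source = <<<'EOT'
/**
 * These functions can be replaced via plugins.
 */

if ( !function_exists('wp_set_current_user') ) :
/**
 * Changes the current user by ID or name.
 *
 */
function wp_set_current_user($id, $name = '') {
EOT;

echo implode("\n", geanyTags($source)), "\n";
```

On this sample it prints `wp_set_current_user|Changes the current user by ID or name.|($id, $name = '')|`. The file-level docblock is skipped because it is not followed by a function header.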

Community
Vyktor
  • Thanks for your help. Here is the translation in PHP: preg_match_all("@/\*\*(?<=(?<!\*/)).*?\*/@mis",$source,$m); – user353889 Nov 18 '12 at 01:24

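If you want the easy dirty trick, use a lookbehind assertion:

(?<=if\ \(\ !function_exists\('wp_set_current_user'\)\ \)\ :)

Placing this at the start of your pattern, followed by \s* to skip the line break, should do the trick. Note that the literal parentheses have to be escaped, and that this only helps for functions that are wrapped in that exact function_exists line.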
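A minimal sketch of the lookbehind trick, assuming the guard line is known in advance and is held in a variable I have called `$guard`. `preg_quote` takes care of escaping the parentheses, and the result stays fixed-length, which PCRE requires inside a lookbehind. The rest of the pattern is the one from the question, with `\s*` added to skip the line break after the guard.

```php
<?php
$source = <<<'EOT'
/**
 * These functions can be replaced via plugins.
 */

if ( !function_exists('wp_set_current_user') ) :
/**
 * Changes the current user by ID or name.
 *
 */
function wp_set_current_user($id, $name = '') {
EOT;

// The exact guard line that must precede the docblock (assumed known).
$guard = "if ( !function_exists('wp_set_current_user') ) :";

// The lookbehind only succeeds directly after the guard, so the match
// cannot begin at the file-level docblock.
$pattern = '@(?<=' . preg_quote($guard, '@') . ')\s*'
         . '(/\*\*.*?\*/\nfunction\s.*?\(.*?\))\s\{@s';

preg_match_all($pattern, $source, $m);

echo $m[1][0], "\n";
```

This ties the pattern to one specific function, so it would have to be rebuilt per function and does nothing for functions without a function_exists line.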

Lucas Hoepner