Can anyone help me with a regex problem? I'm writing a script that goes through all my .php files and collects every string passed to a certain function. I need to match these cases:

/* Double quotes */
function("some string"); // Match: some string
function("some \"string\""); // Match: some "string"
function("some 'string'"); // Match: some 'string'

/* Single quotes */
function('some string'); // Match: some string
function('some \'string\''); // Match: some 'string'
function('some "string"'); // Match: some "string"

The function can also accept parameters after the string, so it also needs to match these cases:

/* Additional parameters */
function("some string", "param"); // Match: some string
function("some string", $param); // Match: some string

So essentially, the extra parameter can be either a string (single- or double-quoted) or an unquoted variable. I need only the string from the first parameter of the function, regardless of whether a second parameter exists or how it is quoted.

Thanks in advance...

pajcho

4 Answers

Here's a quick sketch that might help get you started:

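use strict;
use warnings;

# Simplified php_unescape: strip the quotes and turn \X into X.
# Good enough for \" \' \\, but it ignores PHP's \n, \t, \x41, etc.
sub php_unescape {
    my ($literal) = @_;
    my $body = substr $literal, 1, -1;   # drop the surrounding quotes
    $body =~ s/\\(.)/$1/gs;
    return $body;
}

while (my $line = <>) {
    my ($matched) = $line =~ m{
        \b function \s* \( \s*
        (
            " (?: [^"\\] | \\ . )* "
        |
            ' (?: [^'\\] | \\ . )* '
        )
    }sx or next;
    print php_unescape($matched), "\n";
}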
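The php_unescape step is where PHP's quoting rules matter. Below is a minimal Python sketch of it. It assumes the function receives the whole literal including its quotes, as captured above. In single-quoted strings only \\ and \' are escapes. Double-quoted strings also decode \n, \t, octal \NNN and hex \xHH, and they leave unknown escapes such as \q untouched. The sketch does not handle $variable interpolation or \u{...}, and php_unescape is just the placeholder name from the Perl sketch.

```python
import re

# Single-character escapes that PHP decodes inside double-quoted strings.
_DQ_SIMPLE = {
    "n": "\n", "r": "\r", "t": "\t", "v": "\v",
    "e": "\x1b", "f": "\f", "\\": "\\", "$": "$", '"': '"',
}

# \NNN (octal), \xHH (hex), or a backslash followed by any other character.
_DQ_ESCAPE = re.compile(r"\\(?:([0-7]{1,3})|x([0-9A-Fa-f]{1,2})|(.))", re.S)


def _decode_dq(match):
    octal, hexa, other = match.groups()
    if octal is not None:
        return chr(int(octal, 8) & 0xFF)  # PHP wraps values above \377 into one byte
    if hexa is not None:
        return chr(int(hexa, 16))
    # Unknown escapes such as \q are kept as-is, backslash included.
    return _DQ_SIMPLE.get(other, "\\" + other)


def php_unescape(literal):
    """Return the value of a PHP string literal that still has its quotes.

    Sketch only: no $variable interpolation and no \\u{...} escapes.
    """
    quote, body = literal[0], literal[1:-1]
    if quote == "'":
        # Single-quoted: only \\ and \' are escapes; any other backslash is literal.
        return re.sub(r"\\([\\'])", r"\1", body)
    return _DQ_ESCAPE.sub(_decode_dq, body)
```

The main difference from the simple Perl version is that '\n' inside single quotes stays a backslash followed by n. Only inside double quotes does it become a newline.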
melpomene

Your particular examples are processed successfully with:

preg_match_all('#\\(\\s*("((\\\\.|[^"])+)"|\'((\\\\.|[^\'])+)\'),?#s', $test, $matches);

Here's an ideone demo.

Explanation: we try to match an opening parenthesis (thankfully, this is PHP, where function calls always use parentheses; it would be far more difficult in Ruby, where they are optional), followed by any number of whitespace characters, followed by...

  • either "(\\.|[^"])+"
  • or '(\\.|[^'])+'

... followed by an optional comma.

Each of these sequences covers both 'special' characters (escaped with a backslash) and 'normal' ones (anything other than the delimiting quote).
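Here is the same either/or structure sketched in Python, limited to one specific function name and the first argument only. Two small changes from the pattern above are my own assumptions. First, the 'normal' branch also excludes the backslash, so an escape is always consumed as a pair. Second, * replaces + so that empty strings such as foo("") are found too. first_string_args is a hypothetical helper. It returns the body still escaped, so unescaping remains a separate step.

```python
import re


def first_string_args(source, func):
    """Yield the still-escaped body of the first string argument of each func(...) call."""
    pattern = re.compile(
        r"\b" + re.escape(func) + r"""\s*\(\s*     # function name, then ( and optional spaces
        (?:
            "((?:\\.|[^"\\])*)"                   # double-quoted: escape pair or any other non-quote char
          |
            '((?:\\.|[^'\\])*)'                   # single-quoted: same idea with '
        )""",
        re.VERBOSE | re.DOTALL,
    )
    for m in pattern.finditer(source):
        # Exactly one of the two alternatives matched; the other group is None.
        yield m.group(1) if m.group(1) is not None else m.group(2)
```

Because the pattern stops right after the first literal, whatever follows it (", "param"" or ", $param") is simply ignored.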

raina77ow
  • There is no need to double all the backslashes when using single-quoted strings; only the four backslashes for the literal backslash need to stay. `\\(\\s` can become `\(\s`. – Martin Ender Dec 21 '12 at 19:56
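The comment's claim can be checked by decoding the pattern the way PHP decodes single-quoted strings, where only \\ and \' are escapes. A small Python sketch (php_single_quoted_value is a hypothetical helper) shows that \\(\\s* and \(\s* reach the regex engine identically, while \\\\. and \\. do not:

```python
import re


def php_single_quoted_value(body):
    # Inside '...' PHP turns \\ into \ and \' into '; any other backslash stays as-is.
    return re.sub(r"\\([\\'])", r"\1", body)


# Fragments of the pattern as they appear between the single quotes in the answer.
print(php_single_quoted_value(r"\\(\\s*"))  # \(\s*
print(php_single_quoted_value(r"\(\s*"))    # \(\s*

# The literal-backslash part is different: four backslashes reach the regex
# engine as \\ (a literal backslash), while two would reach it as \. (a literal dot).
print(php_single_quoted_value(r"\\\\."))    # \\.
print(php_single_quoted_value(r"\\."))      # \.
```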

Instead of writing a regular expression yourself, you could use a PHP parser that gives you some kind of AST, e.g. the one from the accepted answer to "Generate AST of a PHP source file":

<?php
require 'path/to/PHP-Parser-master/lib/bootstrap.php';

class MyNodeVisitor extends PHPParser_NodeVisitorAbstract
{
    // NodeVisitorAbstract already supplies empty beforeTraverse(), enterNode()
    // and afterTraverse(), so only leaveNode() has to be implemented here.
    public function leaveNode(PHPParser_Node $node) {
        if ($node instanceof PHPParser_Node_Expr_FuncCall
            && $node->name instanceof PHPParser_Node_Name   // skip dynamic calls like $fn()
            && 'foo' === (string)$node->name
            && isset($node->args[0])
        ) {
            // Only the first argument is wanted, and only if it is a string
            // literal; foo($var) has no value that can be known statically.
            $first = $node->args[0]->value;
            if ($first instanceof PHPParser_Node_Scalar_String) {
                echo $first->value, "\n";
            }
        }
    }
}

$parser = new PHPParser_Parser(new PHPParser_Lexer);
$traverser = new PHPParser_NodeTraverser;
$traverser->addVisitor(new MyNodeVisitor);

try {
    $stmts = $parser->parse(data());
    $stmts = $traverser->traverse($stmts);
} catch (PHPParser_Error $e) {
    echo 'Parse Error: ', $e->getMessage();
}

function data() {
    return <<< eot
<?php
/* Double quotes */
foo("some string"); // Match: some string
foo("some \"string\""); // Match: some "string"
foo("some 'string'"); // Match: some 'string'

/* Single quotes */
foo('some string'); // Match: some string
foo('some \'string\''); // Match: some 'string'
foo('some "string"'); // Match: some "string"
eot;
}

prints

some string
some "string"
some 'string'
some string
some 'string'
some "string"
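One practical reason to prefer the parser is that a regex only sees text. It therefore also reports "calls" inside comments and string literals, and method calls that happen to share the name. Here is a short Python demonstration using a simplified pattern of the kind shown in the other answers. The pattern and the sample lines are illustrative assumptions.

```python
import re

# Naive pattern: the name foo, "(", then a single- or double-quoted literal.
NAIVE = re.compile(r"""\bfoo\s*\(\s*("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')""")

php_source = "\n".join([
    'foo("real call");',
    '// foo("commented out");',
    '$obj->foo("method, not the function");',
    'echo "text mentioning foo(\'x\')";',
])

# Every line yields a match, although only the first is a call to the function foo.
for literal in NAIVE.findall(php_source):
    print(literal)
```

The visitor above would report only the first of these. The method call is a different node type, and comments and string contents are not calls at all.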
VolkerK

Here is a script I wrote in sed. Save it in a file named file.sed:

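# Start at label s (this jumps over the gf block).
bs
:gf
# Keep only the first argument, then strip its quotes.
s:,.*$::
s:^.::
s;.$;;
# Unescape \" and \'.
s:[\]\(["']\):\1:g
p;d
:s
# Replace the line by whatever is inside (...); if that worked, go to gf.
/.*(\([^)]*\).*/ s::\1:
tgf
# Lines without (...) are dropped.
d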

Then run it as sed -f file.sed FILE.php, or over all PHP files at once:

find . -name '*.php' -exec sed -f file.sed {} \;

Edit:

One could replace the script with a one-line sed command, but keeping it in a file makes it much easier to see and debug what it does.
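To make the sed commands easier to follow, here is a line-by-line Python mirror of file.sed. It is a sketch: sed_mirror is a hypothetical name, and it relies on sed's rule that an empty pattern (as in s::\1:) reuses the previous regex. It reproduces the expected matches from the question. It also exposes one limitation: because the script cuts at the first comma, foo("a, b") comes out as an empty string.

```python
import re


def sed_mirror(line):
    """Python mirror of file.sed for one input line; None means sed deletes the line."""
    # :s  /.*(\([^)]*\).*/ s::\1:  -> keep only the text inside the parentheses
    m = re.match(r".*\(([^)]*)\).*", line)
    if m is None:
        return None                        # no substitution, so tgf falls through to d
    s = m.group(1)
    # :gf
    s = re.sub(r",.*$", "", s)             # s:,.*$::   drop everything from the first comma
    s = s[1:]                              # s:^.::     drop the opening quote
    s = s[:-1]                             # s;.$;;     drop the closing quote
    s = re.sub(r"""\\(["'])""", r"\1", s)  # s:[\]\(["']\):\1:g   \" -> " and \' -> '
    return s                               # p;d        print, then go to the next line
```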

alinsoar