1

I am writing a small PHP tool to help me manage some other PHP scripts; it is not intended to dynamically generate PHP without human review. I have a string containing a PHP script that was generated by another automated tool, so it will always be consistently formed.

<?php
$scriptString = <<<'EOT'
<?php

namespace Foo;

/**
 * Foo
 */
class Foo extends Bar
{
    /**
     * @var \Doctrine\Common\Collections\Collection
     */
    private $stuff;

    /**
     * Constructor
     */
    public function __construct()
    {
        $this->stuff = new \Doctrine\Common\Collections\ArrayCollection();
    }

    /**
     * Add addBla.
     *
     * @param \Abc\Bla $bla
     *
     * @return Foo
     */
    public function addBla(\Abc\Bla $bla)
    {
        $this->bla[] = $bla;

        return $this;
    }

    /**
     * Remove bla.
     *
     * @param \Abc\Bla $bla
     *
     * @return boolean TRUE if this collection contained the specified element, FALSE otherwise.
     */
    public function removeBBa(\Abc\Bla $bla)
    {
        return $this->bla->removeElement($bla);
    }

   /**
     * Get $hello.
     *
     * @return \Bcd\Hello
     */
    public function getHello()
    {
        return $this->hello;
    }
}
EOT;

I am trying to implement the following two functions, removeMethod() and selectMethod():

$methodTarget='addBla';
$methodTarget="public function $methodTarget(";

//returns a string with the method and its associated comments/annotations removed
$stringWithoutMethod=removeMethod($scriptString, $methodTarget);

//returns the target method and the method's associated comments/annotations
$stringMethod=selectMethod($scriptString, $methodTarget);

How can this best be implemented? If regex is the answer, please recommend an appropriate pattern to target either from {\n or **/ to either \n} or \n * /**

EDIT. Based on Casimir et Hippolyte's comment regarding token_get_all(), I created the following script. While it is intriguing, I am not sure where to go with it. Any thoughts?

<?php
$script=file_get_contents(__DIR__.'/test_token_get_sample.php');

$test1 = debug($script);
$test2 = debug($script, TOKEN_PARSE);
echo('test1 count: '.count($test1).'  test2 count: '.count($test2).PHP_EOL);
$diffs=array_diff($test1, $test2);    //empty array
echo ('differences: '.PHP_EOL.implode(PHP_EOL, $diffs));

echo(PHP_EOL.'test without TOKEN_PARSE: '.PHP_EOL.implode(PHP_EOL, $test1));

// $flags is an int bitmask (0 or TOKEN_PARSE), not a bool.
function debug(string $script, int $flags = 0): array
{
    $tokens = token_get_all($script, $flags);
    $output=[];
    foreach ($tokens as $token) {
        if (is_string($token)) {
            $output[] = 'simple 1-character token: '.$token;
        } else {
            list($id, $text) = $token;
            $name= token_name($id);
            $output[] = "token array: id: $id name: $name text: $text";
        }
    }
    return $output;
}
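
One possible next step from this dump, sketched under assumptions rather than taken from any answer: compare token IDs against the `T_*` constants (the numeric values differ between PHP versions) and use the line number stored in each token array. `listMethods()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: map each named function/method to the line on which
// its "function" keyword appears. Closures have no T_STRING name and are skipped.
function listMethods(string $script): array
{
    $tokens = token_get_all($script);
    $found = [];
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }
        for ($j = $i + 1; isset($tokens[$j]); $j++) {
            if (!is_array($tokens[$j])) {
                break; // e.g. "(" of a closure, or "&"
            }
            if ($tokens[$j][0] === T_WHITESPACE) {
                continue;
            }
            if ($tokens[$j][0] === T_STRING) {
                // $token[2] is the 1-based line number of the "function" keyword.
                $found[$tokens[$j][1]] = $token[2];
            }
            break;
        }
    }
    return $found;
}
```

Calling `listMethods($script)` on the sample class should give one entry per method, which could then drive the "little logic" for removing or extracting a method.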
user1032531
  • The methods will never have a control structure in them? – user3783243 Oct 13 '19 at 13:14
  • @user3783243 "control structure"? Not sure what that means but don't think so. The initial scripts are created using https://www.doctrine-project.org/projects/doctrine-orm/en/2.6/reference/tools.html#entity-generation – user1032531 Oct 13 '19 at 13:19
  • Basically any methods using `{}`s inside, https://www.php.net/manual/en/language.control-structures.php. For the comments/annotations, those are always before the method? – user3783243 Oct 13 '19 at 13:23
  • Try Nette PHP generator. Maybe it helps... – slepic Oct 13 '19 at 13:26
  • @user3783243 Yes, some of those words will be in the string, and `return` was actually shown in my example. But they are just text and it shouldn't matter, no? And yes, the comments/annotations are always above the method. – user1032531 Oct 13 '19 at 13:28
  • The way to go is obviously not to use regex but `token_get_all` and then to build your own parser based on this tokenization for your needs. – Casimir et Hippolyte Oct 13 '19 at 13:38
  • @slepic. Thanks, expect it can be used as https://github.com/nette/php-generator/blob/master/src/PhpGenerator/ClassType.php has `removeMethod()`. Would rather not use but will if I can't find a simple solution (or will likely just use Nette for inspiration). – user1032531 Oct 13 '19 at 13:44
  • @CasimiretHippolyte Never knew that https://www.php.net/manual/en/function.token-get-all.php was available. Haven't yet tried to implement, but highly suspect that this is the "right" way of doing so. – user1032531 Oct 13 '19 at 13:45
  • @CasimiretHippolyte I edited my original post based on your comment. How should `token_get_all` actually be used to accomplish this? EDIT. Actually, I think I know. Iterate and check for ID=346, and if next ID=319 matches, do a little logic and remove it. – user1032531 Oct 13 '19 at 14:38
  • Use `token_name` to obtain the constant names if you want to write something more concrete than IDs in your code: see https://3v4l.org/15BHQ – Casimir et Hippolyte Oct 13 '19 at 15:09

2 Answers

0

I used to do something similar to dynamically create functions in a PHP file. You could split your file into an array of lines and parse each line until you find the position you are looking for. In my case, I had a unique hash in the file which told me where to start adding my functions.

Once you find the place you are looking for, just loop through the lines of the function you wish to add and push them into your original lines array. Then you can join the array using PHP_EOL to rebuild the original file and write it to an actual PHP file.

const INSERT_AFTER_KEY = "SOMETHING NO SANE HUMAN WOULD WRITE";

$fileContent = file_get_contents(OUTPUT_FILE);
// here we are splitting the files into lines.
$fileLines = preg_split('/\r\n|\n|\r/', trim($fileContent));
// we splice a copy of the lines, because the array being iterated by foreach should not be modified.
//https://stackoverflow.com/questions/11587894/php-will-using-array-splice-on-an-array-thats-the-subject-of-a-foreach-cause
$linesToWrite = $fileLines;
$lineTotal = count($fileLines);
// we are parsing each line of the original file.
foreach ($fileLines as $index => $line) {
    // strpos() returns false (not -1) when the text is absent.
    if (strpos($line, "function $functionName(") !== false) {
        // we found the function we need to remove. We count brackets line by line
        // until the body closes (brackets inside strings or comments would miscount).
        $openingBracket = 0;
        $bodyStarted = false;
        $lineToRemove = 0;
        for ($i = $index; $i < $lineTotal; $i++) {
            // opening brackets on this line add to the total.
            $openingBracket += substr_count($fileLines[$i], '{');
            $bodyStarted = $bodyStarted || $openingBracket > 0;
            // closing brackets on this line are removed from the total.
            $openingBracket -= substr_count($fileLines[$i], '}');
            $lineToRemove++;
            if ($bodyStarted && $openingBracket <= 0) {
                break;
            }
        }
        // since openingBracket is back to 0, lineToRemove is the number of lines the function takes; remove them from the copy, which we use to create the final file.
        array_splice($linesToWrite, $index, $lineToRemove);
        break;
    }
}


// we write the lines to the file.
$textToWrite = join(PHP_EOL, $linesToWrite);
$outputFile = fopen(OUTPUT_FILE, 'w') or die("Could not open file " . OUTPUT_FILE);
fwrite($outputFile, $textToWrite);
fclose($outputFile);

Please take note that this code is untested and might need some more work, but this is the general idea.

Nicolas
  • Thanks Nicolas. This will add methods, however, my need is to remove a specific one or extract one. – user1032531 Oct 13 '19 at 13:37
  • The same concept applies: you still need to parse each line, and when you find the name of the method you need, you can remove lines until you find the last `}` char. – Nicolas Oct 13 '19 at 13:39
  • @user1032531 I have edited my answer to remove function instead of adding it. – Nicolas Oct 13 '19 at 13:49
  • Thanks Nicolas. Appreciate your effort, but will look first at https://www.php.net/manual/en/tokenizer.examples.php – user1032531 Oct 13 '19 at 13:54
0

Based on CasimiretHippolyte's comment...

<?php
function extractMethods(string $script, array $methods):array
{
    $methods = getMethods($script, $methods);
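    // preg_split() needs a delimited pattern; \R matches any line break.
    $script = preg_split('/\R/', $script);
    foreach ($methods as $methodName=>$lns) {
        // Token line numbers are 1-based, so shift to a 0-based index and include the last line.
        $methods[$methodName]= implode(PHP_EOL, array_slice($script, $lns[0]-1, $lns[1]+1));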
    }
    return $methods;
}

function removeMethods(string $script, array $methods):string
{
    $methods = getMethods($script, $methods);
    $methods=array_values($methods);
    $script = preg_split('/\R/', $script); // preg_split() needs a delimited pattern
    for ($i=count($methods)-1; $i>=0; $i--) {
        array_splice($script, $methods[$i][0]-1, $methods[$i][1]+1);
    }
    return implode(PHP_EOL, $script);
}

function getMethods(string $script, array $methods):array
{
    $tokens = token_get_all($script);
    if(count($tokens)!==count(token_get_all($script, TOKEN_PARSE))) exit('Investigate TOKEN_PARSE');
    $count=count($tokens);
    $eof=$count - array_search('}',array_reverse($tokens));
    $output=[];
    foreach ($tokens as $index=>$token) {
        if (is_array($token) && $token[0]===T_FUNCTION) { //346
            for ($nameIndex = $index+1; $nameIndex <= $count; $nameIndex++) {
                if (is_array($tokens[$nameIndex]) && $tokens[$nameIndex][0]===T_STRING) { //319
                    if(in_array($tokens[$nameIndex][1], $methods)) {
                        for ($lastIndex = $nameIndex+1; $lastIndex <= $count; $lastIndex++) {
                            if($lastIndex===$eof || (is_array($tokens[$lastIndex]) && $tokens[$lastIndex][0]===T_DOC_COMMENT)) { //378
                                for ($endIndex = $lastIndex-1; $endIndex > 0; $endIndex--) {
                                    if (is_array($tokens[$endIndex])) {
                                        break;
                                    }
                                }
                                for ($startIndex = $index-1; $startIndex > 0; $startIndex--) {
                                    if (is_array($tokens[$startIndex]) && $tokens[$startIndex][0]===T_DOC_COMMENT) { //378
                                        $output[$tokens[$nameIndex][1]]=[$tokens[$startIndex][2], $tokens[$endIndex][2]-$tokens[$startIndex][2]];
                                        break(3);
                                    }
                                }
                                exit('Initial comment not found');
                            }
                        }
                        // Only reached when the loop above found no next comment or closing brace.
                        exit('Next comment or closing tag not found');
                    }
                    break;
                }
            }
        }
    }
    if($error = array_diff($methods, array_keys($output))){
        exit(implode(', ', $error).' not found.');
    }
    return $output;
}
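
A possible alternative to using the next doc comment as the end marker is to match braces on the token stream. What follows is a minimal sketch under stated assumptions, not a tested solution for Doctrine's real output: `findMethodTokens()` and `tokensToString()` are hypothetical helper names, and `selectMethod()`/`removeMethod()` here take the bare method name (e.g. `addBla`) rather than the `public function addBla(` string from the question.

```php
<?php
/** Returns [firstTokenIndex, lastTokenIndex] for the named method, or null. */
function findMethodTokens(array $tokens, string $name): ?array
{
    $modifiers = [T_WHITESPACE, T_PUBLIC, T_PROTECTED, T_PRIVATE, T_STATIC, T_ABSTRACT, T_FINAL];
    $count = count($tokens);
    foreach ($tokens as $i => $token) {
        if (!is_array($token) || $token[0] !== T_FUNCTION) {
            continue;
        }
        // The method name is the first non-whitespace token after "function".
        $j = $i + 1;
        while ($j < $count && is_array($tokens[$j]) && $tokens[$j][0] === T_WHITESPACE) {
            $j++;
        }
        if ($j >= $count || !is_array($tokens[$j])
            || $tokens[$j][0] !== T_STRING || $tokens[$j][1] !== $name) {
            continue;
        }
        // Walk backwards over modifiers, then take a docblock if one sits above.
        $start = $i;
        $k = $i - 1;
        while ($k >= 0 && is_array($tokens[$k]) && in_array($tokens[$k][0], $modifiers, true)) {
            if ($tokens[$k][0] !== T_WHITESPACE) {
                $start = $k;
            }
            $k--;
        }
        if ($k >= 0 && is_array($tokens[$k]) && $tokens[$k][0] === T_DOC_COMMENT) {
            $start = $k;
        }
        // Walk forwards, matching braces until the method body closes.
        $depth = 0;
        for ($m = $j + 1; $m < $count; $m++) {
            $t = $tokens[$m];
            if ($t === '{' || (is_array($t)
                && in_array($t[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true))) {
                $depth++;
            } elseif ($t === '}' && --$depth === 0) {
                return [$start, $m];
            }
        }
    }
    return null;
}

/** Joins tokens back into source text (token_get_all() keeps every character). */
function tokensToString(array $tokens): string
{
    return implode('', array_map(function ($t) {
        return is_array($t) ? $t[1] : $t;
    }, $tokens));
}

/** Returns the method with its docblock, or null when it is not found. */
function selectMethod(string $script, string $name): ?string
{
    $tokens = token_get_all($script);
    $range = findMethodTokens($tokens, $name);
    if ($range === null) {
        return null;
    }
    return tokensToString(array_slice($tokens, $range[0], $range[1] - $range[0] + 1));
}

/** Returns the script without the method; unchanged when it is not found. */
function removeMethod(string $script, string $name): string
{
    $tokens = token_get_all($script);
    $range = findMethodTokens($tokens, $name);
    if ($range === null) {
        return $script;
    }
    [$start, $end] = $range;
    // Also drop the whitespace token before the method to avoid a double gap.
    if ($start > 0 && is_array($tokens[$start - 1]) && $tokens[$start - 1][0] === T_WHITESPACE) {
        $start--;
    }
    array_splice($tokens, $start, $end - $start + 1);
    return tokensToString($tokens);
}
```

With the question's sample, `selectMethod($scriptString, 'addBla')` would return the docblock through the closing brace, and `removeMethod($scriptString, 'addBla')` the rest of the class. Because this works on tokens rather than lines, braces inside strings or comments are not counted, and nested control structures do not end the method early. Attributes and by-reference functions (`function &name`) are not handled.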
user1032531