9

Is there any available solution for (re-)generating PHP code from the Parser Tokens returned by token_get_all? Other solutions for generating PHP code are welcome as well, preferably with the associated lexer/parser (if any).

wen
  • Does anyone see a potential problem if I simply write a large switch statement to convert tokens back to their string representations (e.g. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution? – wen Feb 21 '11 at 16:44
  • If all you want it to do is pretty print, this will sort of work. You'll discover that regenerating floating point numbers and literal strings is more sweat than you expect. But the real question is, where did you get the token string you want to print? Presumably, you are reading some existing program, and making changes to it. In that case you'll find you need lots more machinery to parse, determine symbol tables, do flow analysis, or whatever. – Ira Baxter Mar 11 '11 at 23:06
  • Yes, I realised that rather quickly. Still, it gives me a lexer, which does, well, something... – wen Mar 12 '11 at 13:04
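
On the switch-statement idea discussed in the comments above: every array token returned by token_get_all already carries its original source text at index 1, so a lookup table from T_DO to 'do' may not be needed for a plain round trip. A minimal sketch, assuming only the built-in tokenizer extension; `tokensToSource` is a hypothetical helper name, not part of any library:

```php
<?php
// Hypothetical helper (not part of any library): rebuild source text from tokens.
function tokensToSource(array $tokens): string
{
    $out = '';
    foreach ($tokens as $token) {
        // Array tokens are [id, text, line]; single-character tokens are plain strings.
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

$source = "<?php\ndo { \$x = 1.50 + strlen('a b'); } while (false);\n";
$tokens = token_get_all($source);

// The token text is kept verbatim, so literals such as 1.50 and 'a b' survive unchanged.
var_dump(tokensToSource($tokens) === $source); // bool(true)
```

This only covers tokens that came straight from the lexer. Once tokens are created or modified by hand, their text has to be produced some other way, which is where the floating-point and string-literal trouble mentioned above comes in.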

4 Answers

2

In the category of "other solutions", you could try PHP Parser.

The parser turns PHP source code into an abstract syntax tree. ... Additionally, you can convert a syntax tree back to PHP code.

anthonygore
2

From my comment:

Does anyone see a potential problem if I simply write a large switch statement to convert tokens back to their string representations (e.g. T_DO to 'do'), map that over the tokens, join with spaces, and look for some sort of PHP code pretty-printing solution?

After some looking, I found a homemade PHP solution in this question that actually uses the PHP Tokenizer interface, as well as some PHP code formatting tools that are more configurable (but would require the solution described above).

These could be used to quickly realize a solution. I'll post back here when I find some time to cook this up.


Solution with PHP_Beautifier

This is the quick solution I cooked up; I'll leave it here as part of the question. Note that it requires you to break open the PHP_Beautifier class by changing everything that is private to protected (probably not everything needs to change, but this is easier). Without that change, it is impossible to reuse the internal workings of PHP_Beautifier without reimplementing half of its code.

An example usage of the class would be:

file: main.php

<?php
require_once 'PHP2PHP.php';

// read some PHP code (the file itself will do)
$phpCode = file_get_contents(__FILE__);

// create a new instance of PHP2PHP
$php2php = new PHP2PHP();

// tokenize the code (forwards to token_get_all)
$tokens = $php2php->php2token($phpCode);

// print the token names; single-character tokens are plain strings,
// and whitespace is dropped unless it is exactly one newline
echo join(' ', array_map(function($token) {
  if (!is_array($token)) {
    return $token;
  }
  if ($token[0] === T_WHITESPACE) {
    return ($token[1] === "\n") ? "\n" : '';
  }
  return token_name($token[0]);
}, $tokens));

// transform the tokens back into legible PHP code
$phpCode = $php2php->token2php($tokens);
?>

As PHP2PHP extends PHP_Beautifier, it allows for the same fine-tuning under the same API that PHP_Beautifier uses. The class itself is:

file: PHP2PHP.php

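<?php
// assumes the modified PEAR PHP_Beautifier (private changed to protected) is on the include path
require_once 'PHP/Beautifier.php';

class PHP2PHP extends PHP_Beautifier {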

  function php2token($phpCode) {
    return token_get_all($phpCode);
  }

  function token2php(array $phpToken) {

    // prepare properties
    $this->resetProperties();
    $this->aTokens = $phpToken;
    $iTotal        = count($this->aTokens);
    $iPrevAssoc    = false;

    // send a signal to the filter, announcing the init of the processing of a file
    foreach($this->aFilters as $oFilter)
      $oFilter->preProcess();

    for ($this->iCount = 0;
         $this->iCount < $iTotal;
         $this->iCount++) {
      $aCurrentToken = $this->aTokens[$this->iCount];
      if (is_string($aCurrentToken))
        $aCurrentToken = array(
          0 => $aCurrentToken,
          1 => $aCurrentToken
        );

      // ArrayNested->off();
      $sTextLog = PHP_Beautifier_Common::wsToString($aCurrentToken[1]);

      // ArrayNested->on();
      $sTokenName = (is_numeric($aCurrentToken[0])) ? token_name($aCurrentToken[0]) : '';
      $this->oLog->log("Token:" . $sTokenName . "[" . $sTextLog . "]", PEAR_LOG_DEBUG);
      $this->controlToken($aCurrentToken);
      $iFirstOut           = count($this->aOut); //5
      $bError              = false;
      $this->aCurrentToken = $aCurrentToken;
      if ($this->bBeautify) {
        foreach($this->aFilters as $oFilter) {
          $bError = true;
          if ($oFilter->handleToken($this->aCurrentToken) !== FALSE) {
            $this->oLog->log('Filter:' . $oFilter->getName() , PEAR_LOG_DEBUG);
            $bError = false;
            break;
          }
        }
      } else {
        $this->add($aCurrentToken[1]);
      }
      $this->controlTokenPost($aCurrentToken);
      $iLastOut = count($this->aOut);
      // set the assoc
      if (($iLastOut-$iFirstOut) > 0) {
        $this->aAssocs[$this->iCount] = array(
          'offset' => $iFirstOut
        );
        if ($iPrevAssoc !== FALSE)
          $this->aAssocs[$iPrevAssoc]['length'] = $iFirstOut-$this->aAssocs[$iPrevAssoc]['offset'];
        $iPrevAssoc = $this->iCount;
      }
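      if ($bError)
        // var_export(..., true) returns the dump as a string; var_dump() would print it and return nothing
        throw new Exception("Can't process token: " . var_export($aCurrentToken, true));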
    } // ~for

    // generate the last assoc
    if (count($this->aOut) == 0)
        throw new Exception("Nothing on output!");

    $this->aAssocs[$iPrevAssoc]['length'] = (count($this->aOut) -1) - $this->aAssocs[$iPrevAssoc]['offset'];

    // post-processing
    foreach($this->aFilters as $oFilter)
      $oFilter->postProcess();
    return $this->get();
  }
}
?>
wen
  • @Kirzilla: I'm not entirely sure what you are trying to accomplish, but if you want to work on a PHP AST, have you tried using NikiC's PHP-Parser? It gives you an entire AST (not just tokens) and is maintained by a PHP core developer. See this question: http://stackoverflow.com/questions/5586358/any-decent-php-parser-written-in-php. – wen Apr 18 '13 at 12:25
1

If I'm not mistaken http://pear.php.net/package/PHP_Beautifier uses token_get_all() and then rewrites the stream. It uses heaps of methods like t_else and t_close_brace to output each token. Maybe you can hijack this for simplicity.
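
To illustrate the per-token method idea, here is a minimal sketch of handler dispatch using only the built-in tokenizer. It is not PHP_Beautifier's actual code: the class `TinyPrinter`, its handler names, and the formatting rules are all hypothetical and chosen just for the example.

```php
<?php
// Hypothetical sketch of per-token handler dispatch; not PHP_Beautifier's actual code.
class TinyPrinter
{
    private $out = '';
    private $depth = 0;

    // Handler names for the single-character tokens this sketch treats specially.
    private const CHAR_HANDLERS = [
        '{' => 't_open_brace',
        '}' => 't_close_brace',
        ';' => 't_semicolon',
    ];

    public function render(array $tokens): string
    {
        $this->out = '';
        $this->depth = 0;
        foreach ($tokens as $token) {
            if (is_array($token)) {
                // e.g. T_WHITESPACE is dispatched to t_whitespace()
                $handler = strtolower(token_name($token[0]));
                $text = $token[1];
            } else {
                $handler = self::CHAR_HANDLERS[$token] ?? 'fallback';
                $text = $token;
            }
            if (!method_exists($this, $handler)) {
                $handler = 'fallback'; // no dedicated handler: copy the text through
            }
            $this->$handler($text);
        }
        return $this->out;
    }

    private function indent(): string
    {
        return str_repeat('    ', $this->depth);
    }

    private function fallback(string $text): void
    {
        $this->out .= $text;
    }

    private function t_whitespace(string $text): void
    {
        // Collapse any run of whitespace into a single space.
        if ($this->out !== '' && !ctype_space(substr($this->out, -1))) {
            $this->out .= ' ';
        }
    }

    private function t_comment(string $text): void
    {
        // End the line so following code is not swallowed by a "//" comment.
        $this->out .= rtrim($text) . "\n" . $this->indent();
    }

    private function t_open_brace(string $text): void
    {
        $this->out = rtrim($this->out) . " {\n";
        $this->depth++;
        $this->out .= $this->indent();
    }

    private function t_close_brace(string $text): void
    {
        $this->depth = max(0, $this->depth - 1);
        $this->out = rtrim($this->out) . "\n" . $this->indent() . "}\n" . $this->indent();
    }

    private function t_semicolon(string $text): void
    {
        $this->out .= ";\n" . $this->indent();
    }
}

$printer = new TinyPrinter();
echo $printer->render(token_get_all('<?php if ($a) { echo 1; }'));
```

The sketch ignores many cases a real beautifier has to handle. For example, the `}` that closes a `{$var}` interpolation inside a double-quoted string would be treated as a block end, and the semicolons in a `for (;;)` header would each start a new line.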

mario
  • I ended up doing this for a while, and it worked, though PHP_Beautifier is pretty hard to extend for this purpose, and forced me to break open some methods. – wen Mar 07 '11 at 21:21
-2

See our PHP Front End. It is a full PHP parser, automatically building ASTs, and a matching prettyprinter that regenerates compilable PHP code complete with the original comments. (EDIT 12/2011: See this SO answer for more details on what it takes to prettyprint from ASTs, which are just an organized version of the tokens: https://stackoverflow.com/a/5834775/120163)

The front end is built on top of our DMS Software Reengineering Toolkit, which enables analysis and transformation of PHP ASTs (and then, via the prettyprinter, of the code).

Ira Baxter
  • Is anything equivalent to the DMS toolkit available as open source? This is rather expensive for a toy project. ^^ – wen Mar 11 '11 at 21:03
  • @Pepijn: The closest things to DMS are Stratego/XT and TXL. They both have parsers, build ASTs, and can regenerate code. Stratego/XT *may* have a PHP parser, but I don't know how robust it is, and that matters because PHP is a truly badly documented language (DMS's PHP parser has run across millions of lines of PHP; it's pretty solid). I don't think TXL has a complete PHP parser. ANTLR parses and can build ASTs with additional effort; it doesn't have any specific pretty printer machinery that I know about. ... – Ira Baxter Mar 11 '11 at 22:57
  • @Pepijn: ... You should observe that you might be able to get something that begins to approximate what DMS does, but you'll likely have to replicate the part that is missing (accurate PHP grammar? PrettyPrinter? Analysis support engines? ...). By the time you do that, you'll discover IMHO that the open source versions are more expensive than DMS at least if you think your time isn't zero cost. People will accuse me of beating my own drum here, and *I'll agree with them*. It is hard to replicate 15 years of continuous engineering let alone the 10 years of concepts on which DMS is based. – Ira Baxter Mar 11 '11 at 23:01
  • @Ira: I fully agree with you on this, and should I ever need something like this for a business project (with a clear goal in mind) I will definitely consider DMS; however, as of yet I am still a student, and my goal in mind is learning. Thus implementing this (even a half-assed version of it) will probably bring me more in experience than just buying and toying with the DMS. Still, thank you; looks like it's a great product. (You should note that although my question was PHP-related, my interest was, in fact, cross-language.) – wen Mar 12 '11 at 13:10
  • @Ira: Also, I feel that the accepted answer better reflects the original question as asked, and is thus more in line with the purposes of Stack Overflow. – wen Mar 12 '11 at 13:13
  • @Pepijn: As a student, what I suggest you do is go get TXL and play with it using the Java parser/prettyprinter they have. Parsing and prettyprinting *ARE NOT THE INTERESTING PART OF THE PROBLEM*. Rather, they are like the ante to the pot in poker; you have to do it to play, but anteing is easy; it's playing the hand that is hard and leads to winning results. What you need to do, and can do, with a parsed program is the thing you want to spend energy learning about. TXL plus its Java parser will let you do that. If you want to do that with PHP, you need more mature machinery like DMS. – Ira Baxter Mar 12 '11 at 16:48
  • @Pepijn: ... and yes, I agree, if you restrict the question to the way you asked it, the answer you chose as accepted is best. – Ira Baxter Mar 12 '11 at 16:50
  • @Pepijn: For "prettyprinting" comparative purposes, you might consider comparing your solution with the prettyprinter provided as part of the example for a rather simple language ("algebra") for DMS at http://www.semdesigns.com/products/DMS/SimpleDMSDomainExample.html. The example also shows the rest of the "poker hand", that is, manipulation of the parsed result in a way that gets interesting answers. – Ira Baxter Mar 12 '11 at 17:00