2

PHP has a Bison grammar file; does that mean PHP is a completely context-free language?

Boy Baukema
  • I'm building a parser for PHP. Similar questions, such as http://stackoverflow.com/questions/9652436, led me to believe that SO would be the best place, but suggestions for another Stack Exchange site are welcome! – Boy Baukema May 11 '12 at 20:47

3 Answers

2

If you're building a parser for PHP, please have a look at the existing ones:

https://github.com/nikic/PHP-Parser - a stand-alone PHP parser written in PHP.

https://github.com/svalaskevicius/ionPulse/tree/master/ionParticles/ionPhp/phpParser - part of the PHP-support plugin for the ionPulse IDE, written in C++, with functional tests in <...>/ionTests/phpparsertest.h [still a work in progress]

1

Just figured I'd mention this in case you hadn't seen it; it may save you a lot of time, unless this is purely for learning.

Check out the PHP Tokenizer Functions, which will parse a source file into tokens for you. Then you can step over the tokens to examine the source.

This example, taken from PHP.net, reads a source file into tokens and reproduces it with comments stripped out:

<?php
/*
* T_ML_COMMENT does not exist in PHP 5.
* The following three lines define it in order to
* preserve backwards compatibility.
*
* The next two lines define the PHP 5 only T_DOC_COMMENT,
* which we will mask as T_ML_COMMENT for PHP 4.
*/
if (!defined('T_ML_COMMENT')) {
   define('T_ML_COMMENT', T_COMMENT);
} else {
   define('T_DOC_COMMENT', T_ML_COMMENT);
}

$source = file_get_contents('example.php');
$tokens = token_get_all($source);

foreach ($tokens as $token) {
   if (is_string($token)) {
       // simple 1-character token
       echo $token;
   } else {
       // token array
       list($id, $text) = $token;

       switch ($id) { 
           case T_COMMENT: 
           case T_ML_COMMENT: // we've defined this
           case T_DOC_COMMENT: // and this
               // no action on comments
               break;

           default:
               // anything else -> output "as is"
               echo $text;
               break;
       }
   }
}
?>
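
To make "stepping over the tokens" a bit more concrete, here is a minimal sketch (not part of the PHP.net example) that lists each token's name next to its text. The helper name `describe_tokens` and the `CHAR` label for one-character tokens are made up for illustration; `token_get_all()` and `token_name()` are the real tokenizer functions. It assumes PHP 7.1 or later for the type hints and short list syntax.

```php
<?php
// Hypothetical helper: returns [name, text] pairs for every token in $source.
function describe_tokens(string $source): array
{
    $described = [];
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // One-character tokens such as ';' or '=' arrive as plain strings.
            // 'CHAR' is an illustrative label, not a real token name.
            $described[] = ['CHAR', $token];
        } else {
            // Array tokens hold the token id, its text and the line number;
            // token_name() maps the id to its T_* constant name.
            [$id, $text] = $token;
            $described[] = [token_name($id), $text];
        }
    }
    return $described;
}

// Usage: print one token per line, with the text JSON-encoded so that
// whitespace tokens stay visible.
foreach (describe_tokens('<?php $x = 1; // note') as [$name, $text]) {
    printf("%-14s %s\n", $name, json_encode($text));
}
```

Keep in mind that the set of token names depends on the PHP version running the script, so the output can differ between installations.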
drew010
  • Thanks! It is more of a learning project, and tokenizing with PHP has the drawback that you can't tokenize code written for a higher (and, to a lesser degree, lower) PHP version than the one you are running (like parsing trait code on a CI environment that only has PHP 5.2). – Boy Baukema May 11 '12 at 20:59
-1

I think you are mixing math with interpreted parsing.

Have a look at structures and data, then determine the rationale behind your question.