PHP has a Bison grammar file, so does that mean that PHP is a completely context-free language?
Building a parser for PHP. Similar questions like: http://stackoverflow.com/questions/9652436 lead me to believe that SO would be the best place, but suggestions for another stack exchange are welcome! – Boy Baukema May 11 '12 at 20:47
3 Answers
If you're building a parser for PHP, please have a look at the existing ones:
https://github.com/nikic/PHP-Parser - this is written in PHP and is a stand-alone PHP parser.
https://github.com/svalaskevicius/ionPulse/tree/master/ionParticles/ionPhp/phpParser - this one is part of the PHP-support plugin for the ionPulse IDE, written in C++, with functional tests in <...>/ionTests/phpparsertest.h [still a work in progress]

Just figured I'd mention this in case you hadn't seen it; it may save you a lot of time unless this is purely for learning.
Check out the PHP Tokenizer Functions, which will split a source file into tokens for you. Then you can step over the tokens to examine the source.
This example, taken from PHP.net, reads a source file into tokens and reproduces it with the comments stripped out:
<?php
/*
 * T_ML_COMMENT does not exist in PHP 5.
 * The following three lines define it in order to
 * preserve backwards compatibility.
 *
 * The next two lines define the PHP 5 only T_DOC_COMMENT,
 * which we will mask as T_ML_COMMENT for PHP 4.
 */
if (!defined('T_ML_COMMENT')) {
    define('T_ML_COMMENT', T_COMMENT);
} else {
    define('T_DOC_COMMENT', T_ML_COMMENT);
}

$source = file_get_contents('example.php');
$tokens = token_get_all($source);

foreach ($tokens as $token) {
    if (is_string($token)) {
        // simple 1-character token
        echo $token;
    } else {
        // token array
        list($id, $text) = $token;

        switch ($id) {
            case T_COMMENT:
            case T_ML_COMMENT: // we've defined this
            case T_DOC_COMMENT: // and this
                // no action on comments
                break;

            default:
                // anything else -> output "as is"
                echo $text;
                break;
        }
    }
}
?>
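To "step over the tokens to examine the source", as suggested above, something like the following rough sketch may help. It only uses `token_get_all()` and `token_name()` from the tokenizer extension; the helper name `describe_tokens` and the inline sample source are made up for illustration and are not part of the PHP.net example.

```php
<?php
// Hypothetical helper (not from PHP.net): returns one "NAME: text" entry per token.
function describe_tokens($source)
{
    $lines = array();
    foreach (token_get_all($source) as $token) {
        if (is_string($token)) {
            // Single-character tokens such as ';' or '{' come back as plain strings.
            $lines[] = 'CHAR: ' . $token;
        } else {
            // Array tokens hold the token id, the matched text and the line number.
            list($id, $text) = $token;
            $lines[] = token_name($id) . ': ' . trim($text);
        }
    }
    return $lines;
}

// Assumed sample input; in practice this could come from file_get_contents().
$source = '<?php echo 1; // done';
foreach (describe_tokens($source) as $line) {
    echo $line, "\n";
}
```

Printing the symbolic name next to each token's text makes it easier to see which constants to match on when writing your own `switch` over token ids.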

Thanks! It is more of a learning project, and tokenizing with PHP has the drawback that you can't tokenize code written for a newer (and, to a lesser degree, older) PHP version than the one running the tokenizer (for example, parsing trait code on a CI environment that only has PHP 5.2). – Boy Baukema May 11 '12 at 20:59
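A quick way to see the version dependence mentioned in this comment is to check whether the running PHP defines a given token constant at all. This is only a sketch: the helper name `tokenizer_knows` is hypothetical, and it assumes the tokenizer extension is loaded. Traits, and with them the `T_TRAIT` constant, arrived in PHP 5.4.

```php
<?php
// Hypothetical helper: does the running PHP define this tokenizer constant?
function tokenizer_knows($tokenName)
{
    return defined($tokenName);
}

// T_TRAIT exists only on PHP 5.4 and later.
if (tokenizer_knows('T_TRAIT')) {
    echo "This PHP version has a token for trait declarations.\n";
} else {
    echo "This PHP version predates traits and has no T_TRAIT token.\n";
}
```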
I think you are mixing math with interpreted parsing.
Have a look at structures and data, then determine the rationale behind your question.

"Have a look at structures and data"? Also, I revised the question; hopefully it is more easily answerable now? – Boy Baukema May 11 '12 at 20:52
I agree with Jazz's comment above. Please refine your question so that it is in line with how SO works. – infrared411 May 11 '12 at 21:28