One option would be to remove all <script>...</script> tags before processing, but this assumes that JavaScript only ever appears inside tags written directly in the file. If a function or library generates HTML for you, it can output JavaScript without the <script>...</script> tags appearing explicitly in your PHP document. The underlying issue is that you are using pattern matching, which can produce a range of false positives.
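For reference, the tag-stripping option could look roughly like the sketch below. The helper name stripScriptBlocks and the sample string are made up for illustration, and the caveat above still applies: JavaScript emitted by PHP code without literal <script> tags in the file is not caught.

```php
<?php
// Hypothetical helper (not from the original answer): remove inline
// <script>...</script> blocks before running the function-name pattern.
function stripScriptBlocks(string $source): string
{
    // 'i' makes the tag name case-insensitive, 's' lets '.' span newlines,
    // and '.*?' is non-greedy so each block is removed on its own.
    $stripped = preg_replace('/<script\b[^>]*>.*?<\/script\s*>/is', '', $source);

    // preg_replace() returns null on a PCRE error; fall back to the input.
    return $stripped === null ? $source : $stripped;
}

// Made-up sample mixing a PHP function and an inline JavaScript function.
$sample = "<?php function foo() {} ?>\n<script>function bar() {}</script>\n";
$clean = stripScriptBlocks($sample);
```

After this step only the PHP declaration of foo remains in $clean, so the pattern no longer sees bar.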
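To remove these false positives altogether, you could use the PHP ReflectionFunction class to check which of the matched names are functions that PHP actually knows about. Its constructor throws a ReflectionException for any name that is not a defined function, so the functions in the file must already be loaded (for example, via include) for the check to recognise them. Collect the possible function names with the pattern, then test each one: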
    $data = file_get_contents($file);
    $outputData = array();
    $validMatches = array();

    // Capture the identifier that follows each "function" keyword.
    preg_match_all('/function\s+&?\s*([A-Za-z_]\w*)\s*\(/', $data, $outputData);

    foreach ($outputData[1] as $match) {
        try {
            // Throws if $match is not a function defined in this PHP process.
            new \ReflectionFunction($match);
            $validMatches[] = $match;
        } catch (\ReflectionException $e) {
            // Not a known PHP function (e.g. a JavaScript function); skip it.
        }
    }
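This is more verbose, but the resulting list contains only names that PHP recognises as defined functions. A JavaScript function that happens to share its name with a defined PHP function would still pass the check.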
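If you only need a yes/no answer per name, function_exists() is a lighter alternative to catching ReflectionException. This is a different check from the one above, shown as a sketch: the helper name knownPhpFunctions, the stand-in function demoHelper, and the sample string are all invented for the example.

```php
<?php
// Hypothetical stand-in for a function that the scanned file defines
// and that has already been loaded into the current process.
function demoHelper() {}

// Hypothetical helper: extract candidate names, keep the ones PHP knows.
function knownPhpFunctions(string $source): array
{
    $matches = array();
    preg_match_all('/function\s+&?\s*([A-Za-z_]\w*)\s*\(/', $source, $matches);

    // function_exists() is true for built-in and user-defined functions
    // that are defined in the current process, and false otherwise.
    $known = array_filter($matches[1], 'function_exists');

    // Drop duplicates and reindex from zero.
    return array_values(array_unique($known));
}

// Made-up input: one PHP function and one JavaScript function.
$source = "<?php function demoHelper() {} ?>\n"
        . "<script>function notInPhp() {}</script>\n";
$names = knownPhpFunctions($source);
```

Here $names ends up containing only demoHelper, because notInPhp is not defined anywhere in the PHP process.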