I am working on a very large PHP codebase (mostly procedural) which has been pieced together over the past 10 years by developers who came before me and we are now experiencing heavy database load. A peek into some files show upwards of 40 to 60 MySQL queries being executed. Spending the time to trace out what they are used for in this large obnoxious files often shows they query was made and the results were never used. I feel if I can identify the problem areas in the codebase, I can start to refactor.
I've seen many posts on StackOverflow about various auditing tools:
- https://stackoverflow.com/questions/4273244/auditing-a-php-codebase?lq=1
- Is there a static code analyzer [like Lint] for PHP files?
I have yet to find anything that analyzes query usage in a script though. For example, how many SELECT, INSERT, and UPDATE queries are contained in the file. And more interestingly, how many of those are contained inside of a looping structure? I wrote a quick a dirty recursive file search to identify some of the basic issues:
$it = new RecursiveDirectoryIterator($path);
foreach(new RecursiveIteratorIterator($it) as $file)
{
if(stristr($file, "/.") == false && stristr($file, "/..") == false)
{
$contents = file_get_contents($file);
// Regular MySQL queries...indicative of SQL injection points
$mysql_count = substr_count($contents, "mysql_query(");
$select_count = substr_count($contents, 'SELECT ');
$insert_count = substr_count($contents, 'INSERT INTO');
$update_count = substr_count($contents, 'UPDATE ');
if($mysql_count > 0 || $select_count > 0 || $insert_count > 0 || $update_count > 0)
{
$out[] = array('file' => $file,
'mysql_count' => $mysql_count,
'select_count' => $select_count,
'insert_count' => $insert_count,
'update_count' => $update_count
);
}
$contents = null;
}
}
var_dump($out);
But that doesn't find anything in loops, or even analyze the query to see if maybe it contains a high threshold of JOINS. I can write a tool like this, but why reinvent the wheel if it already exists?
Are there any tools to identify PHP files which may potentially have problematic/high stress/inefficient queries?
My goal is to identify points where refactoring of legacy procedural code will be most beneficial.