
I am working on a very large PHP codebase (mostly procedural) which has been pieced together over the past 10 years by the developers who came before me, and we are now experiencing heavy database load. A peek into some files shows upwards of 40 to 60 MySQL queries being executed. Spending the time to trace out what they are used for in these large, obnoxious files often shows the query was made and the results were never used. I feel that if I can identify the problem areas in the codebase, I can start to refactor.

I've seen many posts on StackOverflow about various auditing tools.

I have yet to find anything that analyzes query usage in a script though. For example: how many SELECT, INSERT, and UPDATE queries are contained in the file? And, more interestingly, how many of those are inside a looping structure? I wrote a quick and dirty recursive file search to identify some of the basic issues:

    $out = array();
    // SKIP_DOTS handles the "." and ".." entries, replacing the stristr() checks
    $it = new RecursiveDirectoryIterator($path, FilesystemIterator::SKIP_DOTS);
    foreach (new RecursiveIteratorIterator($it) as $file)
    {
        $contents = file_get_contents($file);

        // Regular MySQL queries...indicative of SQL injection points
        $mysql_count = substr_count($contents, "mysql_query(");

        $select_count = substr_count($contents, 'SELECT ');
        $insert_count = substr_count($contents, 'INSERT INTO');
        $update_count = substr_count($contents, 'UPDATE ');

        if ($mysql_count > 0 || $select_count > 0 || $insert_count > 0 || $update_count > 0)
        {
            $out[] = array('file' => (string) $file,
                           'mysql_count' => $mysql_count,
                           'select_count' => $select_count,
                           'insert_count' => $insert_count,
                           'update_count' => $update_count
                          );
        }

        $contents = null;
    }

var_dump($out);

But that doesn't find anything inside loops, or even analyze the query to see if it contains an unusually high number of JOINs. I can write a tool like this, but why reinvent the wheel if it already exists?
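For the loop case specifically, PHP's built-in tokenizer gets further than `substr_count()`. A rough sketch of the idea (a heuristic only: it tracks brace depth, so loops without braces, alternative syntax like `endwhile;`, and `{`/`}` inside strings can throw it off):

```php
<?php
// Heuristic: flag mysql_query() calls that appear inside a loop body,
// by tracking brace depth with token_get_all().
function find_queries_in_loops($code)
{
    $tokens = token_get_all($code);
    $depth = 0;             // current brace nesting depth
    $loop_depths = array(); // stack of depths at which a loop was seen
    $hits = array();

    foreach ($tokens as $token) {
        if (is_array($token)) {
            list($id, $text, $line) = $token;
            if ($id == T_FOR || $id == T_FOREACH || $id == T_WHILE) {
                // the next "{" opens this loop's body
                $loop_depths[] = $depth;
            } elseif ($id == T_STRING
                      && strcasecmp($text, 'mysql_query') == 0
                      && count($loop_depths) > 0) {
                $hits[] = array('line' => $line,
                                'loop_depth' => count($loop_depths));
            }
        } elseif ($token == '{') {
            $depth++;
        } elseif ($token == '}') {
            $depth--;
            // closed the innermost loop body? pop it off the stack
            while (count($loop_depths) > 0 && end($loop_depths) >= $depth) {
                array_pop($loop_depths);
            }
        }
    }
    return $hits;
}
```

Feeding each file's contents through this instead of (or alongside) the `substr_count()` checks would at least separate "60 queries in a file" from the far worse "query executed on every loop iteration."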


Are there any tools to identify PHP files which may potentially have problematic/high stress/inefficient queries?

My goal is to identify points where refactoring of legacy procedural code will be most beneficial.

Jeremy Harris
    There should be essentially one file that actually touches the database, and exposes a consistent interface for the rest of the application to consume. If you have `mysql_query...` scattered throughout many PHP files, you're doing it *so* wrong. That said, no. You can't tell what code is doing without executing the code. This is an over-simplification of [the halting problem](http://en.wikipedia.org/wiki/Halting_problem). You need to either refactor your database access into one place that can gather some stats about itself, or analyze things at a lower level than your application. – user229044 Nov 09 '12 at 20:07
  • 1
    I'm quite aware of how an ideal situation would work, as well as being a proponent of PDO and prepared statements. Unfortunately, it is my job to **fix** these problems which are scattered everywhere in literally thousands of files and folders. – Jeremy Harris Nov 09 '12 at 20:09
  • Guessing this is *very* procedural with all the talk about 'files'. If it is really that procedural, with no database interface at all, I suppose you could come up with something yourself leveraging reflection. What are you trying to learn? I would focus on profiling from the database side, then making the connections back to the app. – ficuscr Nov 09 '12 at 20:10
  • @ficuscr Yes, **very** procedural. I will write a tool if need be, just seeing if anyone else had already made something to address this issue. I imagine with PHP being such a popular language and its philosophy changing across the years, there is a lot of legacy code businesses still use which needs to be addressed and optimized or refactored. – Jeremy Harris Nov 09 '12 at 20:13

3 Answers


Worse than lots of queries are slow queries. So I would try to find those.

You could do that using the slow query log feature of MySql: http://dev.mysql.com/doc/refman/5.6/en/slow-query-log.html
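Turning it on doesn't require a restart; it can be done from a client session (paths and thresholds below are just example values):

```sql
-- Enable the slow query log at runtime; settings last until the server restarts.
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- example path
SET GLOBAL long_query_time = 1;   -- seconds; the default of 10 hides a lot
-- Optionally also log queries that scan without using an index:
SET GLOBAL log_queries_not_using_indexes = 'ON';
```

Lowering `long_query_time` aggressively for a day or two of normal traffic tends to surface the worst offenders quickly.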

Niko Sams
  • You could set up an integration test suite that would run a bunch of the questionable methods, and audit the slow log afterward. I have a test listener in phpunit that watches for tests taking longer than 1 second, which can indicate a query gone mad. – willoller Nov 09 '12 at 20:25
  • Yeah, I know what a slow query log is. Thanks. – Jeremy Harris Nov 09 '12 at 20:25
  • @willoller You say "questionable methods" where reality is no functions/encapsulation of code...straight procedural. PHPUnit is, dare I say **difficult** with legacy code. – Jeremy Harris Nov 09 '12 at 20:26
  • Indeed. In that case you could use a test for each file, which you might want to invest in anyway. Getting that stuff under at least basic integration tests will help you big time in the long run. Check out The Feathers Book: http://amzn.to/YZBE8W – willoller Nov 09 '12 at 20:29
  • I'm accepting this answer as it seems to be the popular vote. Thank you everyone for your suggestions. If I end up writing a useful tool to handle this, I will try to open source it on GitHub and provide a link back. – Jeremy Harris Nov 09 '12 at 21:06

There's probably a power-law distribution at play here where a small fraction of queries are causing the bulk of the problem. Using xdebug as @ficuscr recommends will help to identify the biggest offenders here.

Another thing you could do is alter the mysql_* functions at runtime using an extension like APD. In your modified functions, you could make note of the current __FILE__ and __LINE__ as well as other relevant information (state of global variables, etc).
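As a sketch of that approach (this requires the APD PECL extension for `rename_function()`/`override_function()`, and the log format is just illustrative; note that inside the override, `__FILE__`/`__LINE__` would refer to the override itself, so `debug_backtrace()` is used to find the real call site):

```php
<?php
// Requires the APD extension. Wrap mysql_query() so every call is
// logged with the file and line that issued it.
rename_function('mysql_query', 'mysql_query_real');
override_function('mysql_query', '$query', '
    $trace  = debug_backtrace();
    $caller = $trace[0];
    error_log(sprintf("[query-audit] %s:%d %s",
        $caller["file"], $caller["line"], substr($query, 0, 120)));
    return mysql_query_real($query);
');
```

Grepping the resulting log and counting repeated file:line pairs would directly expose queries firing inside loops.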

jimbo

I suppose I would suggest looking at xdebug and what it can accomplish with profiling. Results will of course be more telling with OOP code, but it might be a start. With it you should be able to see resources used and the number of calls to mysql_query.

The articles you linked seem to discuss this. Is this somehow not what you are looking for? Again, if you're looking for SQL bottlenecks specifically, perhaps start with the slow query log on the database server.
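For reference, enabling the profiler is a few php.ini lines (Xdebug 2.x-era settings; the output directory is an example):

```ini
; Xdebug 2.x profiler settings (php.ini)
zend_extension = xdebug.so
xdebug.profiler_enable = 1
xdebug.profiler_output_dir = /tmp/profiles
; name each cachegrind file after the script that produced it
xdebug.profiler_output_name = cachegrind.out.%s
```

The resulting cachegrind files can then be opened in KCachegrind/QCacheGrind, where sorting by calls to mysql_query per script is straightforward.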

ficuscr
  • Yeah, I've taken a look with xdebug and kcachegrind; it's not the sort of thing I'm looking for. I'm talking about code that makes no sense. And when somebody comes to add a new feature, they will make the same queries over again because they don't understand the previous stuff. Looking for metrics on the number of queries per file, number of joins per query, etc. – Jeremy Harris Nov 09 '12 at 20:24
  • I like jimbojw's idea. That should be a real boon in bridging the gap from the slow query logs back to the code. I think your best bet for actual inspection of the SQL will be the database and the tools it already provides. – ficuscr Nov 09 '12 at 20:26