
This is an optimisation question about first principles. Imagine I am doing a big heavy-lifting comparison: 30k files vs 30k database entries. Is it more process-efficient to do one big MySQL query into an array and then loop through the physical files checking against the array, or is it better to loop through the files and do a one-line MySQL call for each one?

Here is some pseudocode to help explain:

 //is this faster?
 foreach($recursiveFileList as $fullpath){
     $Record = $db->queryrow("SELECT * FROM files WHERE fullpath='".$fullpath."'");
     //do some $Record logic           
 }


 //or is this faster
 $BigList = array();
 $db->query("SELECT * FROM files");
 while($Record = $db->rows()){
     $BigList[$Record['fullpath']] = $Record;
 }

 foreach($recursiveFileList as $fullpath){
     if (isset($BigList[$fullpath])){
         $Record = $BigList[$fullpath];
         //do some $Record logic
     }   
 }
user26676
  • Is your SQL server different from your file server? If not, it won't ever matter, tbh. MySQL is still using files to store this data on your server; it just provides you with an interface to extract said data. This is mostly just a micro-optimization. – Ohgodwhy May 21 '14 at 16:11
  • it's a localhost db - a LAMP server, also no files stored in MySQL - purely pathnames – user26676 May 21 '14 at 16:12
  • Then it really doesn't make much of a difference. If you wanted to, you could hook up a 2nd computer and make that your SQL server and then you could offload a lot of processing onto the SQL server, which will help keep the overhead down. But, as it stands, it's going to be a minimal performance gain. – Ohgodwhy May 21 '14 at 16:13
  • Why don't you try? http://stackoverflow.com/questions/6245971/accurate-way-to-measure-execution-times-of-php-scripts – Joren May 21 '14 at 16:26
  • What interface are you using here to connect to MySQL? You're not using placeholders which is usually a sign you're doing something very wrong and expose yourself to the risk of severe [SQL injection bugs](http://bobby-tables.com/). – tadman May 21 '14 at 17:08

1 Answer


Update: if you know in advance that your $recursiveFileList covers 100% of the table, then doing one query per row is needless overhead. In that case, just fetch the whole table with SELECT * FROM files, as sketched below.
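A minimal sketch of that full-table fetch, keyed by fullpath (assuming a PDO connection in $pdo in place of the question's $db wrapper):

// Hedged sketch: one query loads all 30k rows; index them by fullpath
// so each file lookup afterwards is a cheap hash check.
$BigList = array();
foreach ($pdo->query("SELECT * FROM files") as $Record) {
    $BigList[$Record['fullpath']] = $Record;
}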

I wouldn't use either of the two styles you show.

The first style runs one separate SQL query for each individual fullpath. This incurs overhead for SQL parsing, optimization, etc. on every call. Keep in mind that MySQL does not remember the query plan from one invocation of a similar query to the next; it parses and optimizes the query every time. The overhead is relatively small, but it adds up over 30k iterations.
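If you do stay with per-row lookups, preparing the statement once and re-executing it is one way to trim the repeated parse cost (a sketch assuming a PDO connection in $pdo with emulated prepares turned off; each execution still costs a network round trip, and MySQL still re-optimizes):

// Prepare once, execute once per file; the statement text is sent
// and parsed a single time when emulation is disabled.
$pdo->setAttribute(PDO::ATTR_EMULATE_PREPARES, false);
$stmt = $pdo->prepare("SELECT * FROM files WHERE fullpath = ?");
foreach ($recursiveFileList as $fullpath) {
    $stmt->execute(array($fullpath));
    $Record = $stmt->fetch(PDO::FETCH_ASSOC);
    //do some $Record logic
}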

The second style fetches all rows from the table and filters them in the application layer. This has a lot of overhead, because typically your $recursiveFileList might match only 1% or 0.1% or an even smaller portion of the rows in the table. I have seen cases where transferring excessive amounts of data over the network literally saturated a 1Gbps network switch, and this put a ceiling on the requests per second for the application.

Use query conditions and indexes wisely to let the RDBMS examine and return only the matching rows.
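For instance, assuming the fullpath column is not yet indexed, a prefix index keeps long utf8mb4 paths under InnoDB's key-length limit (the index name and prefix length here are illustrative, not prescriptive):

// One-time DDL; run this from a migration or a MySQL client.
$pdo->exec("ALTER TABLE files ADD INDEX idx_fullpath (fullpath(191))");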

The two styles you show are not the only options. What I would suggest is an IN() predicate to match many fullpath values in a single query:

$placeholders = implode(",", array_fill(0, count($recursiveFileList), "?"));
$sql = "SELECT * FROM files WHERE fullpath IN ($placeholders)";
$stmt = $pdo->prepare($sql);
$stmt->execute($recursiveFileList);
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    //do some $row logic
}

Note I also use a prepared query with ? parameter placeholders, and then pass the array of fullpath values separately when I call execute(). PDO is nice for this, because you can just pass an array, and the array elements get matched up to the parameter placeholders.

This also solves the risk of SQL injection in this case.
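If the full 30k-element list pushes a single statement toward MySQL's max_allowed_packet limit, one workaround is to batch the lookups (a sketch; the 1000-path chunk size is an arbitrary choice):

// Issue the IN() query in batches so each statement stays small.
foreach (array_chunk($recursiveFileList, 1000) as $chunk) {
    $placeholders = implode(",", array_fill(0, count($chunk), "?"));
    $stmt = $pdo->prepare("SELECT * FROM files WHERE fullpath IN ($placeholders)");
    $stmt->execute($chunk);
    while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
        //do some $row logic
    }
}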

Bill Karwin
  • Thanks, I fully understand the role of `indexes`, but wouldn't that 3rd way (which had occurred to me) fall foul of `max_allowed_packet`? Similarly, discussing injection risks is OT. – user26676 May 22 '14 at 10:38
  • Also, the matches are ~100% likely; it is a `$recursiveFileList` of everything that would be in the MySQL `files` table. The OP tries to make that clear: 30k vs 30k. – user26676 May 22 '14 at 10:44
  • Okay, I'll add something about that above. – Bill Karwin May 22 '14 at 16:34