4

Unfortunately I can't show you the actual code, but I can give you an idea of what it looks like, what it does, and what problem I'm having...

<?php
   include('db.php');
   include('tools.php');
   $c = new GetDB(); // Connection to DB
   $t = new Tools(); // Classes to clean, prevent XSS and others
   if(isset($_POST['var'])){
       $nv = json_decode($_POST['var']);

       foreach($nv as $k) {
          $id = $t->clean($k->id);
          // ... goes on for about 10 keys
          // this might seem redundant or insufficient
          $id = $c->real_escape_string($id);
          // ... goes on for the rest of keys...

          $q = $c->query("SELECT * FROM table WHERE id = '$id'");
          $r = $q->fetch_row();
          if ($r[1] > 0) {
          // Item exist in DB then just UPDATE
             $q1 = $c->query(UPDATE TABLE1);
             $q4 = $c->query(UPDATE TABLE2);
             if ($x == 1) {
                 $q2 = $c->query(SELECT);
                 $rq = $q2->fetch_row();
                 if ($rq[0] > 0) {
                     // Item already in table just update
                     $q3 = $c->query(UPDATE TABLE3);
                 } else  {
                     // Item not in table then INSERT
                     $q3 = $c->query(INSERT TABLE3);
                 }
             }
          } else {
            // Item not in DB then Insert
             $q1 = $c->query(INSERT TABLE1);
             $q4 = $c->query(INSERT TABLE2);
             $q3 = $c->query(INSERT TABLE4);
             if($x == 1) {
                $q5 = $c->query(INSERT TABLE3);
             }
          }
       }
   }

As you can see, it's a very basic INSERT/UPDATE script, so before releasing it to full production we ran some tests to check that the script was working as it should, and the "results" were excellent...
So, we ran this code against 100 requests and everything went just fine... less than 1.7 seconds for the 100 requests... but then we saw the amount of data that actually needs to be sent/POSTed, and it was a jaw drop for me... over 20K items. It takes about 3 to 5 minutes just to send the POST, but the script always crashes. The "data" is an array in JSON:

array (
   [0] => array (
             [id] => 1,
             [val2] => 1,
             [val3] => 1,
             [val4] => 1,
             [val5] => 1,
             [val6] => 1,
             [val7] => 1,
             [val8] => 1,
             [val9] => 1,
             [val10] => 1
         ),
   [1] => array (
             [id] => 2,
             [val2] => 2,
             [val3] => 2,
             [val4] => 2,
             [val5] => 2,
             [val6] => 2,
             [val7] => 2,
             [val8] => 2,
             [val9] => 2,
             [val10] => 2
         ),
//... about 10 to 20K depend on the day and time
)

but in JSON... Anyway, sending this information is not the problem; like I said, it can take about 3 to 5 minutes. The problem is the code that does the job of receiving the data and running the queries... On normal shared hosting we get a 503 error, which after some debugging turned out to be a timeout. On our VPS we can increase max_execution_time to whatever we need, and processing 10K+ items takes about 1 hour there, but on shared hosting we can't touch max_execution_time... So I asked the other developer, the one sending the information, to send batches of 1K instead of 10K+ in one blow, let it rest for a second, then send another batch, and so on... so far I haven't gotten an answer... so I was thinking of doing the "pause" on my end instead (see the sketch below for what I mean): say, after processing 1K items, wait for a second and then continue, but I don't see that being as efficient as receiving the data in batches... how would you solve this?
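The "pause" on my end would be roughly along these lines; the chunk size, the sleep length and process_item() are just placeholders for the real clean/query logic above, not actual code:

    // Rough sketch of the "pause on my end" idea only.
    $chunks = array_chunk($nv, 1000);      // split the decoded items into blocks of 1K
    foreach ($chunks as $chunk) {
        foreach ($chunk as $k) {
            process_item($c, $t, $k);      // placeholder for the clean + query work shown above
        }
        sleep(1);                          // rest for a second between blocks
    }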

Tanker
  • 1,178
  • 3
  • 16
  • 49
  • Can you receive zip files? If so, client can upload zip file containing data. You can receive and send acknowledgement. Process the zip in batch. Report errors, if any – zedfoxus Jun 18 '15 at 03:06
  • That is actually not a bad idea. They would have to zip the information and send it to my script, and I can handle the upload, unzip, read the file, and when finished delete the extracted file and the zip... uhh, not a bad idea, thanks. – Tanker Jun 18 '15 at 03:14
  • 2
    That entire code can be rewritten in max 5 lines of code if you used the database properly. Using it properly means this: you don't check if the item exists in PHP. You perform `INSERT INTO ... ON DUPLICATE KEY UPDATE`. Wrapping several hundred or thousand inserts/updates in a transaction makes this execute quickly. Best part: you never end up with duplicates. Ever. – N.B. Jun 18 '15 at 06:03

2 Answers

1

Sorry, I don't have enough reputation to comment everywhere yet, so I have to write this as an answer. I would recommend zedfoxus' method of batch processing above. In addition, I would highly recommend figuring out a way to process those queries faster. Keep in mind that every single PHP function call, etc. gets multiplied by every row of data. Here are just a couple of ways you might be able to get better performance:

  1. Use prepared statements. Preparing the statement once lets MySQL parse and plan the query a single time and reuse that work for every subsequent execution. This is really important (see the sketch after this list).
  2. If you use prepared statements, then you can drop the $c->real_escape_string() calls. I would also scratch my head to see what you can safely leave out of the $t->clean() method.
  3. Next, I would reconsider the cost of evaluating every single row individually. I'd have to benchmark it to be sure, but I think running a few PHP statements beforehand will be faster than making umpteen unnecessary MySQL SELECT and UPDATE calls. MySQL is much faster when inserting multiple rows at a time. If you expect multiple rows of your input to be changing the same row in the database, then you might want to consider the following:

    a. Think about creating a temporary, precompiled array (depending on memory usage involved) that stores the unique rows of data. I would also consider doing the same for the secondary TABLE3. This would eliminate needless "update" queries, and make part b possible.

    b. Consider a single query that selects every id from the database that's in the array. This will be the list of items to use an UPDATE query for. Update each of these rows, removing them from the temporary array as you go. Then, you can create a single, multi-row insert statement (prepared, of course), that does all of the inserts at a single time.

  4. Take a look at optimizing your MySQL server parameters to better handle the load.

  5. I don't know if this would speed up a prepared INSERT statement at all, but it might be worth a try. You can wrap the INSERT statements within a transaction as detailed in an answer here: MySQL multiple insert performance. The sketch below combines this with point 1.
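As a rough illustration of points 1 and 5 together, a statement prepared once and re-executed inside a single transaction might look something like the following. This assumes a PDO connection ($pdo); TABLE1 and the id/val2/val3 columns are placeholder names, not your real schema:

    /* Hypothetical sketch only: $pdo is assumed to be a PDO connection to the
     * same database, and TABLE1 / id / val2 / val3 are placeholder names.
     * The statement is parsed once and re-executed per row, and the whole
     * batch is committed as a single transaction.
     */
    $pdo->beginTransaction();
    $stmt = $pdo->prepare("INSERT INTO TABLE1 (id, val2, val3) VALUES (?, ?, ?)");
    foreach ($rowsToInsert as $row) {   // $rowsToInsert: validated rows left over after the UPDATE pass
        $stmt->execute(array($row['id'], $row['val2'], $row['val3']));
    }
    $pdo->commit();

The same pattern works for the UPDATE pass in the loop logic sketched below.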

I hope that helps, and if anyone else has some suggestions, just post them in the comments and I'll try to include them.

Here's a look at the original code with just a few suggestions for changes:

<?php
   /* You can make sure that the connection type is persistent and 
    * I personally prefer using the PDO driver.
    */
   include('db.php');
   /* Definitely think twice about each tool that is included.
    * Only include what you need to evaluate the submitted data.
    */
   include('tools.php');
   $c = new GetDB(); // Connection to DB
   /* Take a look at optimizing the code in the Tools class. 
    * Avoid any and all kinds of loops; this code is going to be used in
    * a loop and could easily turn into an O(n^2) performance drain.
    * Minimize the amount of string manipulation requests. 
    * Optimize regular expressions.
    */
   $t = new Tools(); // Classes to clean, prevent XSS and others
   if(isset($_POST['var'])){ // !empty() catches more cases than isset()
       $nv = json_decode($_POST['var']);

       /* LOOP LOGIC
        * Definitely test my hypothesis yourself, but this is similar 
        * to what I would try first.
        */

       //Row in database query
       $inTableSQL = "SELECT id FROM TABLE1 WHERE id IN("; //keep adding to it

       foreach ($nv as $k) {
           /* I would personally use specific methods per data type.
            * Here, I might use a type cast, plus valid int range check.
            */
           $id = $t->cleanId($k->id); //I would include a type cast: (int)
           // Similarly for other values
           //etc.
           // Then save validated data to the array(s)
           $data[$id] = array($values...);
           /* Now would also be a good time to add the id to the SELECT
            * statement
            */
           $inTableSQL .= "$id,";
       }

       $inTableSQL = rtrim($inTableSQL, ",") . ");"; // trim the trailing comma before closing the IN()

       // Execute query here

       // Then step through the query ids returned, perform UPDATEs,
       // remove the array element once UPDATE is done (use prepared statements)

       foreach (.....

       /* Then, insert the remaining rows all at once...
        * You'll have to step through the remaining array elements to
        * prepare the statement.
        */
       foreach(.....

   } //end initial POST data if


       /* Everything below here becomes irrelevant */

       foreach($nv as $k) {
          $id = $t->clean($k->id);
          // ... goes on for about 10 keys
          // this might seem redundant or insufficient
          $id = $c->real_escape_string($id);
          // ... goes on for the rest of keys...

          $q = $c->query("SELECT * FROM table WHERE id = '$id'");
          $r = $q->fetch_row();
          if ($r[1] > 0) {
          // Item exist in DB then just UPDATE
             $q1 = $c->query(UPDATE TABLE1);
             $q4 = $c->query(UPDATE TABLE2);
             if ($x == 1) {
                 $q2 = $c->query(SELECT);
                 $rq = $q2->fetch_row();
                 if ($rq[0] > 0) {
                     // Item already in table just update
                     $q3 = $c->query(UPDATE TABLE3);
                 } else  {
                     // Item not in table then INSERT
                     $q3 = $c->query(INSERT TABLE3);
                 }
             }
          } else {
            // Item not in DB then Insert
             $q1 = $c->query(INSERT TABLE1);
             $q4 = $c->query(INSERT TABLE2);
             $q3 = $c->query(INSERT TABLE4);
             if($x == 1) {
                $q5 = $c->query(INSERT TABLE3);
             }
          }
       }
   }
Ben T
  • 202
  • 1
  • 4
  • That is a nice approach, I'll give it a go and see what happens. About my tools.php: I have some "basic" functions, several actually, to sanitize values like strings and ints, remove HTML tags, allow a few HTML tags, get only numbers and such... it's just a file I use for all my projects, but you are correct, it's just way too much. I'll exclude that file and go with just htmlentities() or filter_var()... Since the information is coming from a known source I can even go plain; I can just check where the post is coming from, like if the post/request is from IP 1, process it, otherwise ignore it... thanks – Tanker Jun 18 '15 at 07:27
  • Thanks @Tanker! Feel free to reach out with any more questions. Up-votes and a selected answer are also much appreciated. (I'm new here!) – Ben T Jun 18 '15 at 07:32
0

The key is to minimize queries. Often, where you are looping over data doing one or more queries per iteration, you can replace it with a constant number of queries. In your case, you'll want to rewrite it into something like this:

include('db.php');
include('tools.php');
$c = new GetDB(); // Connection to DB
$t = new Tools(); // Classes to clean, prevent XSS and others
if(isset($_POST['var'])){
    $nv = json_decode($_POST['var']);

    $table1_data = array();
    $table2_data = array();
    $table3_data = array();
    $table4_data = array();

    foreach($nv as $k) {
        $id = $t->clean($k->id);
        // ... goes on for about 10 keys
        // this might seem redundant or insufficient
        $id = $c->real_escape_string($id);
        // ... goes on for the rest of keys...

        $table1_data[] = array( ... );
        $table2_data[] = array( ... );
        $table4_data[] = array( ... );
        if ($x == 1) {
            $table3_data[] = array( ... );
        }
    }

    $values = array_to_sql($table1_data);
    $c->query("INSERT INTO TABLE1 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table2_data);
    $c->query("INSERT INTO TABLE2 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table3_data);
    $c->query("INSERT INTO TABLE3 (...) VALUES $values ON DUPLICATE KEY UPDATE ...");
    $values = array_to_sql($table4_data);
    $c->query("INSERT IGNORE INTO TABLE4 (...) VALUES $values");
}

While your original code executed between 3 and 5 queries per row of your input data, the above code only executes 4 queries in total.

I leave the implementation of array_to_sql to the reader, but hopefully this should explain the idea. TABLE4 is an INSERT IGNORE here since you didn't have an UPDATE in the "found" clause of your original loop.
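For what it's worth, since the values are already escaped inside the loop above, a minimal sketch of such a helper could look roughly like this (just one possible shape, assuming every row has the same columns in the same order):

    function array_to_sql($rows) {
        $groups = array();
        foreach ($rows as $row) {
            // Values are assumed to be escaped already (see the loop above);
            // here they are only quoted and joined into (...),(...),... groups.
            $groups[] = "('" . implode("','", $row) . "')";
        }
        return implode(',', $groups);
    }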

Keiji
  • 880
  • 5
  • 12
  • The thing is that I do need to "duplicate" the information for "Table 4". I know it's weird; it's a multi-language system, and the way they developed it, language 1 (ES) and language 2 (EN) are saved in the same table. Say product 1 has ES and EN and the ID is 2; in table 4 I have to "duplicate" that item, one row for ES and one row for EN, and the column "ID" will hold the ID 2 even though that column is set to auto-increment. They have 10K+ products at all times, and they use 6 languages, so that table has 10K ES, 10K EN, 10K JP, etc., each product * language, with only 1 column changing... – Tanker Jun 18 '15 at 07:39
  • I'm not entirely sure what you're trying to do, but looking at this, your original code will insert at most one row in TABLE4 for each corresponding row that does not exist in TABLE1. My code will insert the same number or more rows into TABLE4 since it doesn't check for non-existence in TABLE1. If there was a row that would not be inserted by my code due to the IGNORE, then it would *error* in your code (duplicate key). You seem to be saying my code would insert too *few* rows, but I can only see a possible issue with it inserting too *many* rows. – Keiji Jun 18 '15 at 17:55