
I have the following rough code (the full script is 146 lines, 90 of which are string parsing; I can add it if needed):

```php
ini_set('memory_limit', '7G');
$db = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => true));
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
$db_ub = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => true));
$db_ub->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
$stmt = $db->prepare('select columns from stats where timestamp between ? and ?');
$stmt->execute(array('2020-04-25', '2020-05-25'));
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo memory_get_usage() . PHP_EOL;
    echo $row['id'] . PHP_EOL;
    $stmt2 = $db_ub->prepare('select somedata from users limit 1');
    $stmt2->execute();
    $row2 = $stmt2->fetch(PDO::FETCH_ASSOC);
    $type = !empty($row2['somedata']) ? 5 : 4;
    $result = $db_ub->prepare('insert ignore into newtable (old, type) values (?, ?)');
    $result->execute(array($row['id'], $type));
}
```

During `$stmt->execute(array('2020-04-25', '2020-05-25'));` memory consumption sits at about 0.34 GB (monitored with `ps aux | grep 'php ' | awk '{$5=int(100 * $5/1024/1024)/100"GB";}{ print;}'`, and verified on the SQL side with `show full processlist`). Once the script enters the `while` loop it jumps to over 5 GB.

Testing the `setAttribute` call,

```php
var_dump($db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false));
```

it appears to have taken effect:

```
bool(true)
```

but the behavior doesn't change when I switch between buffered and unbuffered:

```php
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false)
```

and

```php
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true)
```

Using `echo $db->getAttribute(constant('PDO::MYSQL_ATTR_USE_BUFFERED_QUERY'));` also shows the setting changes.

Moving the setting to the statement rather than the connection, as https://www.php.net/manual/en/ref.pdo-mysql.php suggests, also didn't work:

```php
$stmt = $db->prepare('select columns from stats where timestamp between ? and ?', array(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false));
```

I've also tried moving the buffer setting into the connection options, with no effect:

```php
$db = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => true, PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false));
```

Taking out the second connection allows the unbuffered query to function as intended:

```php
ini_set('memory_limit', '1G');
$db = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => true, PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false));
$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
//$db_ub = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => true));
//$db_ub->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);
$stmt = $db->prepare('select columns from stats where timestamp between ? and ?');
$stmt->execute(array('2019-01-25', '2019-11-25'));
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    echo memory_get_usage() . PHP_EOL;
    echo $row['id'] . PHP_EOL;
    /*
    $stmt2 = $db_ub->prepare('select somedata from users limit 1');
    $stmt2->execute();
    $row2 = $stmt2->fetch(PDO::FETCH_ASSOC);
    $type = !empty($row2['somedata']) ? 5 : 4;
    $result = $db_ub->prepare('insert ignore into newtable (old, type) values (?, ?)');
    $result->execute(array($row['id'], $type));
    */
}
```

With this version, `memory_get_usage()` never exceeds 379,999 bytes.

If I uncomment the second connection and make it unbuffered as well, I receive:

```
Cannot execute queries while other unbuffered queries are active. Consider using PDOStatement::fetchAll(). Alternatively, if your code is only ever going to run against mysql, you may enable query buffering by setting the PDO::MYSQL_ATTR_USE_BUFFERED_QUERY attribute.
```

With the second connection buffered, the behavior is as initially described: large memory consumption on execute. If `memory_limit` is set high it works; if set low it errors. Using a large `memory_limit` isn't a feasible solution.
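For context, a minimal sketch of the pattern being attempted: one unbuffered handle streams the large result while a second, genuinely separate buffered handle does the per-row work. Table and column names are the placeholders from the question; non-persistent connections are assumed here so that the two PDO objects really do map to two distinct MySQL connections.

```php
<?php
// Sketch only: assumes $dbhost, $dbname, $dbuser, $dbpass are defined and
// a MySQL server with the question's placeholder tables is reachable.
$read = new PDO("mysql:host=$dbhost;dbname=$dbname", $dbuser, $dbpass);
$read->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); // stream rows

$write = new PDO("mysql:host=$dbhost;dbname=$dbname", $dbuser, $dbpass);
$write->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, true);  // the default

$stmt = $read->prepare('select id from stats where timestamp between ? and ?');
$stmt->execute(array('2020-04-25', '2020-05-25'));

// Prepare once, outside the loop, instead of re-preparing per row.
$insert = $write->prepare('insert ignore into newtable (old, type) values (?, ?)');
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // Work done on the buffered $write handle does not disturb the open
    // unbuffered result set on $read.
    $insert->execute(array($row['id'], 4));
}
```

Preparing the insert statement once outside the loop is also a small saving over the original per-row `prepare()`.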

Was using (Red Hat Enterprise Linux Server release 7.3 (Maipo)):

```
php71u-pdo.x86_64                  7.1.19-1.ius.centos7
```

Moved the script to a newer machine (Amazon Linux release 2 (Karoo)):

```
php73-pdo.x86_64                   7.3.17-1.el7.ius
```

and see the same behavior.

user3783243

3 Answers


The `PDO::ATTR_PERSISTENT` value doesn't have to be boolean. A string value identifies the persistent connection pool being used, so pass unique values to get genuinely separate connections. In my case:

```php
$db = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => 'unbuff', PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => false));
$db_ub = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => 'buff', PDO::MYSQL_ATTR_USE_BUFFERED_QUERY => true));
```
user3783243
  • God, what a big fool I am; I didn't notice such an obvious issue. Just get rid of the persistent connection altogether: you need anything but a persistent connection here. – Your Common Sense Jun 01 '20 at 15:50

Can't you get rid of most of the code by simply running a single query:

```sql
INSERT IGNORE INTO newtable
    SELECT  ...,
            IF(..., 5, 4)
        FROM oldtable WHERE ...;
```

With that, you could get rid of the 7G memory issue.

If that turns out to be doing too much at once, then break it into chunks. See the discussion here: http://mysql.rjweb.org/doc.php/deletebig#deleting_in_chunks (That talks about DELETEs, but the technique can be adapted to other operations, such as your SELECT.)
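A chunked variant could be sketched like this, assuming `stats` has an indexed auto-increment `id` column and `$db` is an ordinary buffered PDO handle (both assumptions, not stated in the question):

```php
<?php
// Walk the primary key in batches, as the linked article suggests for
// DELETEs. Each chunk is small enough to buffer fully, so there is no
// 7 GB memory requirement and no unbuffered-query restriction.
$batch  = 1000;
$lastId = 0;
$stmt = $db->prepare('select id from stats where id > ? order by id limit ' . $batch);
do {
    $stmt->execute(array($lastId));
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC); // small, fully buffered chunk
    foreach ($rows as $row) {
        // ... per-row processing / insert ignore into newtable ...
        $lastId = $row['id'];
    }
} while (count($rows) === $batch);
```

Seeking by key (`where id > ?`) rather than by OFFSET keeps each chunk query fast even deep into a 135-million-row table.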

On another topic: why is `select somedata from users limit 1` executed inside the loop? It seems to fetch the same data every time. Also, without an ORDER BY, you can't predict which row `LIMIT 1` will return.

Rick James
  • The parsing I need to do is beyond the capabilities of the MySQL version (5.6) I'm using. With the unbuffered query I'm down to 0.34 GB. The `select` was just an example to show I'm using a second connection; it is required because the unbuffered connection can't be used while it executes. – user3783243 Jun 01 '20 at 20:47
  • @user3783243 - That makes sense. Still, I recommend chunking. (This may require turning your one loop into a nested pair of loops.) 1000 rows won't need unbuffered queries, nor 7 GB. (Unless the rows are huge. I would really limit the number of rows so that only a few MB are fetched at a time.) – Rick James Jun 01 '20 at 21:25

You are actually making 135,000,000 queries instead of iterating over 135,000,000 rows.

Change the code to run only one query, ordering the rows as they would be processed inside your loop.

```php
$db = new PDO("mysql:host=".$dbhost.";dbname=".$dbname, $dbuser, $dbpass, array(PDO::ATTR_PERSISTENT => true));
$stmt = $db->prepare('SELECT * FROM stats ORDER BY id ASC');
$stmt->execute();
while ($row = $stmt->fetch(PDO::FETCH_ASSOC)) {
    // ...
}
```

You don't even need this `if`; the logic can be handled faster by the DB itself:

```php
if (!empty($row['id'])) {
```

Instead:

```sql
SELECT * FROM stats WHERE id IS NOT NULL ORDER BY id ASC
```

I haven't looked into PDO/MySQL for some time, but I was assuming unbuffered mode allows you to use the cursor:

```php
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);
```

Take into account that there can only be a single active query per connection; you are basically using the connection's buffer.

The better option would be to load only small chunks in a map-reduce kind of way:

```sql
SELECT * FROM stats ORDER BY id LIMIT 0, 100
```

use the results, then

```sql
SELECT * FROM stats ORDER BY id LIMIT 100, 100
```

and so on. (Note MySQL's `LIMIT offset, row_count` order: the offset comes first, so `LIMIT 100, 0` would return zero rows.)
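The stepping described above might be sketched as follows (assuming `$db` is a connected PDO handle; note that OFFSET paging needs a stable `ORDER BY` to avoid skipping or repeating rows, and that large offsets get slower because MySQL still reads past all the skipped rows):

```php
<?php
// LIMIT/OFFSET chunking as described above. $size rows are buffered per
// round trip, so memory stays bounded regardless of the table size.
$size   = 100;
$offset = 0;
do {
    $stmt = $db->prepare("select * from stats order by id limit $offset, $size");
    $stmt->execute();
    $rows = $stmt->fetchAll(PDO::FETCH_ASSOC);
    foreach ($rows as $row) {
        // ... use the row ...
    }
    $offset += $size;
} while (count($rows) === $size);
```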

Daniel W.
  • This hits the memory limit. These were the first two attempts, with buffered and unbuffered. `buffered` requires committing 135 million rows to memory; unbuffered can't be interacted with while it is in progress. – user3783243 May 30 '20 at 02:10
  • the point of `!empty($row['id'])` is not to skip rows where id is null, which is simply impossible, but to skip non-existent ids – Your Common Sense May 30 '20 at 02:28
  • @YourCommonSense You are right. Regarding copying a table, I think there is no need for PHP at all. Like https://stackoverflow.com/a/35805097/1948292 – Daniel W. May 30 '20 at 02:31
  • @DanielW. I've gone further into this and it seems the buffer setting I'm using is not taking effect. I've made a few edits to the question so hopefully it is clearer now. Thanks for attempting to assist me so far. – user3783243 May 30 '20 at 04:22