26

I'm having a strange time dealing with selecting from a table with about 30,000 rows.

It seems my script is using an outrageous amount of memory for what is a simple, forward-only walk over a query result.

Please note that this is a somewhat contrived, absolute bare-minimum example that bears very little resemblance to the real code, and it cannot be replaced with a simple database aggregation. It is intended to illustrate the point that each row does not need to be retained from one iteration to the next.

<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
    PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$stmt = $pdo->prepare('SELECT * FROM round');
$stmt->execute();

function do_stuff($row) {}

$c = 0;
while ($row = $stmt->fetch()) {
    // do something with the object that doesn't involve keeping 
    // it around and can't be done in SQL
    do_stuff($row);
    $row = null;
    ++$c;
}

var_dump($c);
var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());

This outputs:

int(39508)
int(43005064)
int(43018120)

I don't understand why roughly 40MB of memory is used when hardly any data needs to be held at any one time. I have already worked out that I can reduce the memory by a factor of about 6 by replacing `SELECT *` with `SELECT home, away`; however, I consider even this usage to be insanely high, and the table is only going to get bigger.

Is there a setting I'm missing, or is there some limitation in PDO that I should be aware of? I'm happy to get rid of PDO in favour of mysqli if PDO cannot support this, so if that's my only option, how would I perform this using mysqli instead?

Shabbyrobe
  • This is exactly a use case for NoSQL and MapReduce, since map and reduce functions are executed "near" your data. You could give MongoDB or Couchbase a try (or Hadoop, why not; it's quite easy actually). – Thomas Decaux Aug 19 '14 at 15:56

5 Answers

62

After creating the connection (and before executing the query), you need to set PDO::MYSQL_ATTR_USE_BUFFERED_QUERY to false:

<?php
$pdo = new PDO('mysql:host=127.0.0.1', 'foo', 'bar', array(
    PDO::ATTR_ERRMODE=>PDO::ERRMODE_EXCEPTION,
));
$pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

// snip

var_dump(memory_get_usage());
var_dump(memory_get_peak_usage());

This outputs:

int(39508)
int(653920)
int(668136)

Regardless of the result size, the memory usage remains pretty much static.
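The question also asked about mysqli: the equivalent there is an unbuffered result set via MYSQLI_USE_RESULT. A minimal, untested sketch, assuming the same credentials and round table as the question:

<?php
$mysqli = new mysqli('127.0.0.1', 'foo', 'bar');

// MYSQLI_USE_RESULT streams rows from the server instead of buffering
// the whole result set in client memory (the default MYSQLI_STORE_RESULT).
$result = $mysqli->query('SELECT * FROM round', MYSQLI_USE_RESULT);

$c = 0;
while ($row = $result->fetch_assoc()) {
    do_stuff($row);
    ++$c;
}
$result->free();

Note that with an unbuffered result (in either PDO or mysqli), you cannot issue another query on the same connection until the current result has been fully fetched or freed.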

Alix Axel
2

Another option would be to do something like:

$i = $c = 0;
$query = 'SELECT home, away FROM round LIMIT 2048 OFFSET %u;';

while (count($rows = codeThatFetches(sprintf($query, $i++ * 2048))) > 0)
{
    $c += count($rows);

    foreach ($rows as $row)
    {
        do_stuff($row);
    }
}
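
One caveat with this approach: if the table can change between the chunked queries, rows may be skipped or read twice, and large OFFSET values get progressively slower. A keyset-pagination variant avoids both; a sketch, assuming round has an auto-increment id column (an assumption, since the question doesn't show the schema):

$c = 0;
$lastId = 0;

do {
    // codeThatFetches() as above; WHERE id > ? keeps each chunk stable
    // and lets MySQL seek straight to the next chunk via the PK index.
    $rows = codeThatFetches(sprintf(
        'SELECT id, home, away FROM round WHERE id > %u ORDER BY id LIMIT 2048',
        $lastId
    ));

    foreach ($rows as $row)
    {
        do_stuff($row);
        $lastId = $row['id'];
        ++$c;
    }
} while (count($rows) > 0);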
Alix Axel
  • Note that using multiple queries can cause serious problems if the table you're reading from can change between each query. (This can sometimes be mitigated with ascending order by an auto-increment key or creation date.) – mindplay.dk Jun 12 '17 at 07:29
  • You do not need to use `$c += count` but `$c = count` as it needs to be evaluated each loop. I cannot edit your answer as changes need to be at least 6 characters.... – Matthijs Feb 14 '18 at 09:20
1

The whole result set (all 30,000 rows) is buffered into memory before you can start looking at it.

You should be letting the database do the aggregation and only asking it for the two numbers you need.

SELECT SUM(home) AS home, SUM(away) AS away, COUNT(*) AS c FROM round
Dan Grossman
  • Can you please elaborate on the buffering aspect and the ways around it? I have updated the question to reflect that the example is somewhat contrived and cannot be simply replaced with in-DB aggregation as you suggest. – Shabbyrobe Aug 01 '11 at 07:12
  • 3
    Sorry, but many years of experience tell me not to take you at your word. I truly doubt you need to retrieve all 30,000 rows. You may be surprised by the expressive power of SQL. If you can provide your actual code or an explanation of *why* you believe you need all 30,000 rows perhaps we can get somewhere. – Dan Grossman Aug 01 '11 at 07:15
  • @DanGrossman I am exporting all 30,000 rows to a tab-delimited text file. – Damian Yerrick Feb 22 '16 at 19:32
  • You can do that directly from SQL -- a query can write rows into a tab-delimited text file -- http://dev.mysql.com/doc/refman/5.7/en/select-into.html, or you can select a few thousand rows at a time in a loop using a LIMIT clause on your query. – Dan Grossman Mar 08 '16 at 23:10
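  For reference, a minimal sketch of that INTO OUTFILE export (assumptions: the MySQL account has the FILE privilege, and the file is written on the database server itself, not the client machine):

SELECT home, away
FROM round
INTO OUTFILE '/tmp/round.tsv'
FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';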
1

The reality of the situation is that if you fetch all rows and expect to be able to iterate over all of them in PHP at once, they will all exist in memory.

If you really don't think SQL-powered expressions and aggregation are the solution, you could consider limiting/chunking your data processing. Instead of fetching all rows at once, do something like:

1)  Fetch 5,000 rows
2)  Aggregate/calculate intermediary results
3)  Unset variables to free memory
4)  Go back to step 1 (fetch the next set of rows)

Just an idea...
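
A minimal sketch of those steps, reusing the hypothetical codeThatFetches() helper from the earlier answer (the home/away totals here are just an illustrative stand-in for whatever do_stuff() accumulates):

$offset = 0;
$totals = array('home' => 0, 'away' => 0);

do {
    // 1) Fetch the next chunk of 5,000 rows
    $rows = codeThatFetches(sprintf(
        'SELECT home, away FROM round LIMIT 5000 OFFSET %u', $offset));
    $n = count($rows);
    $offset += $n;

    // 2) Fold the chunk into intermediary results
    foreach ($rows as $row) {
        $totals['home'] += $row['home'];
        $totals['away'] += $row['away'];
    }

    // 3) Free the chunk before fetching the next one
    unset($rows);
} while ($n === 5000); // 4) Stop once a short (final) chunk arrives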

Jake
  • 2
    This is what I was doing in the first place, actually. I asked this question out of curiosity because it seems silly to me that I can't just barrel through a query with a result of arbitrary size using PDO when I'm not keeping any of the objects around. As a matter of fact, with MySQLi it is possible to do this and not have the memory use grow indefinitely, but I'd prefer to stick with PDO if possible. – Shabbyrobe Aug 04 '11 at 00:27
1

I haven't done this before in PHP, but you may consider fetching the rows using a scrollable cursor - see the fetch documentation for an example.

Instead of returning all the results of your query to your PHP script at once, the database holds the results on the server side, and you use a cursor to retrieve them one at a time.

Whilst I have not tested this, it is bound to have other drawbacks such as utilising more server resources and most likely reduced performance due to additional communication with the server.
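
An untested sketch of what that looks like with PDO (declare the cursor when preparing, then fetch with an orientation), though as the comment below shows, the MySQL driver may simply ignore the cursor attribute:

$stmt = $pdo->prepare('SELECT * FROM round', array(
    PDO::ATTR_CURSOR => PDO::CURSOR_SCROLL,
));
$stmt->execute();

// PDO::FETCH_ORI_NEXT advances the cursor one row at a time;
// PDO::FETCH_ASSOC avoids the duplicated numeric keys of the
// default PDO::FETCH_BOTH mentioned below.
while ($row = $stmt->fetch(PDO::FETCH_ASSOC, PDO::FETCH_ORI_NEXT)) {
    do_stuff($row);
}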

Altering the fetch style may also have an impact: the documentation indicates that by default each row is stored as both an associative array and a numerically indexed array, which is bound to increase memory usage.

As others have suggested, reducing the number of results in the first place is most likely a better option if possible.

Jarod Elliott
  • 2
    I tried changing the call to `prepare` to `$stmt = $pdo->prepare('SELECT * FROM round', array(PDO::ATTR_CURSOR => PDO::CURSOR_FWDONLY));`, but the memory use remained the same. – Shabbyrobe Aug 04 '11 at 01:33