
Just a quick question.

Is there any performance difference between using PDO::fetchAll() and PDO::fetch() in a loop (for large result sets)?

I'm fetching into objects of a user-defined class, if that makes any difference.

My initial uneducated assumption was that fetchAll might be faster because PDO can perform multiple operations in one statement, while mysql_query can only execute one. However, I have little knowledge of PDO's inner workings, and the documentation doesn't say anything about this, or whether fetchAll() is simply a PHP-side loop that dumps the rows into an array.

Any help?
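For context, the two patterns being compared look roughly like this. This is a minimal, self-contained sketch using an in-memory SQLite database and a hypothetical `User` class (the question is about MySQL, but the PDO API is the same):

```php
<?php
// Hypothetical User class standing in for the user-defined class
// mentioned in the question.
class User {
    public $id;
    public $name;
}

$dbh = new PDO('sqlite::memory:');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$dbh->exec('CREATE TABLE users (id INTEGER, name TEXT)');
$dbh->exec("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')");

// Style 1: fetchAll() - one call, the whole result set held in an array.
$stmt = $dbh->query('SELECT id, name FROM users');
$all = $stmt->fetchAll(PDO::FETCH_CLASS, 'User');

// Style 2: fetch() in a loop - only one object handled per iteration.
$stmt = $dbh->query('SELECT id, name FROM users');
$stmt->setFetchMode(PDO::FETCH_CLASS, 'User');
$count = 0;
while ($user = $stmt->fetch()) {
    $count++; // process $user here instead of storing it
}

echo count($all) . ' ' . $count . "\n"; // both styles see the same rows
```

Note that with `PDO::FETCH_CLASS`, PDO assigns the column values to the object's properties before calling the constructor, unless `PDO::FETCH_PROPS_LATE` is added to the fetch mode.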

Lotus Notes

7 Answers


A little benchmark with 200k random records. As expected, the fetchAll method is faster but requires more memory.

Result :
fetchAll : 0.35965991020203s, 100249408b
fetch : 0.39197015762329s, 440b

The benchmark code used :

<?php
// First benchmark : speed
// (note: the MySQL DSN key is "host", not "dbhost")
$dbh = new PDO('mysql:dbname=testage;host=localhost', 'root', '');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$sql = 'SELECT * FROM test_table WHERE 1';
$stmt = $dbh->query($sql);
$data = array();
$start_all = microtime(true);
$data = $stmt->fetchAll();
$end_all = microtime(true);

$stmt = $dbh->query($sql);
$data = array();
$start_one = microtime(true);
while($data = $stmt->fetch()){}
$end_one = microtime(true);

// Second benchmark : memory usage
$stmt = $dbh->query($sql);
$data = array();
$memory_start_all = memory_get_usage();
$data = $stmt->fetchAll();
$memory_end_all = memory_get_usage();

$stmt = $dbh->query($sql);
$data = array();
$memory_end_one = 0;
$memory_start_one = memory_get_usage();
while($data = $stmt->fetch()){
  $memory_end_one = max($memory_end_one, memory_get_usage());
}

echo 'Result : <br/>
fetchAll : ' . ($end_all - $start_all) . 's, ' . ($memory_end_all - $memory_start_all) . 'b<br/>
fetch : ' . ($end_one - $start_one) . 's, ' . ($memory_end_one - $memory_start_one) . 'b<br/>';
Arkh
  • Yeah, you don't. That's the goal of the benchmark: in the first case you do a fetchAll and THEN work on the data; in the second you fetch one row, work on it, then fetch the next row. A good example is displaying a data table: do you need to store ALL your data before writing to the buffer, or not? – Arkh Jan 05 '11 at 08:24
  • Sorry for necroing, but I don't get why people say this is a bad benchmark. There is no reason to store the entire data set unless you're returning that data to a user... which is just plain bad in the first place; use paging in that case. If you need to modify data en masse in a database, you should do it within the database, either with a script or a stored procedure, e.g. temp tables. – Populus Apr 17 '14 at 14:59
  • -1. This is definitely a *bad* benchmark. The memory is not the real memory. `memory_get_usage(/* true */)` shows you the memory allocated by PHP itself. It **does not** show you the ***LIBRARY***-allocated memory. Libmysqlclient and mysqlnd use their *own* memory, not PHP's. – bwoebi Jun 15 '15 at 22:20
  • Doesn't that depend on whether you use [buffered queries](http://php.net/manual/en/mysqlinfo.concepts.buffering.php) or not, @bwoebi? Since buffered is the default, I thought that meant query results are sent to PHP's "process", hence using its memory? As for memory, maybe the benchmark should use separate scripts and memory_get_peak_usage(true). – Félix Adriyel Gagnon-Grenier Jul 22 '15 at 17:42
  • @FélixGagnon-Grenier `memory_get*usage()` will only show memory controlled by PHP. Direct malloc() or mmap() calls won't be reflected in it. And sure, unbuffered queries basically don't read from the socket, which means the results are buffered on the MySQL server side. But buffered queries are stored client side … in the memory of libmysqlclient, which is allocated via malloc(). (mysqlnd uses emalloc(), which allocates memory via the Zend memory allocator.) … But this benchmark obviously was done with libmysqlclient, as the numbers are too unrealistic for mysqlnd. – bwoebi Jul 23 '15 at 14:54
  • Crazy how bad comments like the +27 one get attention on a very well designed benchmark. I'd have measured the "true" and internal memory usage separately, but apart from that it's exactly what I was looking for. PDO treats fetch() correctly and doesn't buffer the whole thing internally; that's what I needed to know. – John Oct 11 '17 at 16:39

One thing about PHP that I've found to be true almost always is that a function you implement yourself will almost always be slower than the PHP built-in equivalent. This is because code implemented in PHP doesn't get the compile-time optimizations that C (the language PHP is written in) has, and PHP function calls carry high overhead.
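A quick way to see that effect, unrelated to PDO itself, is to compare a C-implemented built-in against an equivalent hand-written PHP loop (a hypothetical micro-benchmark; exact timings will vary by machine and PHP version):

```php
<?php
// Summing 100k integers: PHP's C-implemented array_sum()
// vs a hand-written foreach loop doing the same work.
$numbers = range(1, 100000);

$start = microtime(true);
$builtin = array_sum($numbers);
$builtinTime = microtime(true) - $start;

$start = microtime(true);
$manual = 0;
foreach ($numbers as $n) {
    $manual += $n;
}
$manualTime = microtime(true) - $start;

// Both produce the same sum; the built-in is typically several times faster.
echo "array_sum: {$builtinTime}s, loop: {$manualTime}s\n";
```

The same reasoning suggests letting fetchAll() build the array in C rather than appending rows one by one in a PHP while loop, if you need the whole result set anyway.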

Kendall Hopkins
  • There are times where it's worth it not to use the PHP built-in, such as searching a sorted array (binary search ftw). – Reece45 May 05 '10 at 04:43
  • I'm not sure I quite understand your answer, but I do have to do a couple of operations on all the objects again after they are fetched, which would undoubtedly require another foreach loop. Should I just stick with fetching one object at a time and performing the operations on each object as it is fetched? – Lotus Notes May 05 '10 at 04:45
  • @AlReece45 You described two completely different functions. I was talking about reimplementing the sort function in PHP vs using PHP's `sort`. @Byron I'm betting you'll find that fetching all the results using fetchAll() will still be faster, but you could benchmark it with `microtime(TRUE)` if you have doubts. – Kendall Hopkins May 05 '10 at 05:36
  • @Reece45 Binary search is logarithmic; it should be fast whether it's written in C or PHP. Sorting OTOH is `O(n log n)` -- there's a bit more savings to be had. – mpen Jun 05 '13 at 16:37
  • On a theory level it might make sense to implement binary search in PHP, but unless your N is very large (which most of the time isn't true, unless you're doing something wrong), it's better to just use the O(N) search which PHP provides. – Kendall Hopkins Jun 05 '13 at 17:25

All the benchmarks above that measure "memory footprint" are actually incorrect, for a very simple reason.

By default, PDO loads everything into memory, and it does not care whether you use fetch or fetchAll. To really get the benefit of unbuffered queries, you should instruct PDO to use them:

$db->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

In that case you will see a huge difference in the memory footprint of the script.
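Put together, an unbuffered read might look like the sketch below (untested; it assumes a local MySQL server and a large table named `test_table`, matching the benchmark above):

```php
<?php
// Hypothetical sketch: streaming rows with an unbuffered MySQL query.
// Assumes a local MySQL server with a large table named test_table.
$dbh = new PDO('mysql:dbname=testage;host=localhost', 'root', '');
$dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

// Disable client-side result buffering (MySQL driver only).
$dbh->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

$before = memory_get_usage();
$stmt = $dbh->query('SELECT * FROM test_table');
while ($row = $stmt->fetch()) {
    // Rows stream from the server; only the current row
    // needs to be held in PHP's memory.
}
echo (memory_get_usage() - $before) . " bytes\n";
```

One caveat: with an unbuffered query you cannot issue another query on the same connection until the statement has been fully fetched or released with `$stmt->closeCursor()`.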

iVariable
  • What is the difference between using `$stmt->fetch()` with buffered queries (the default) and using `$stmt->fetch()` with unbuffered queries (the `PDO::MYSQL_ATTR_USE_BUFFERED_QUERY` attribute set to `false`)? I saw that even with the default buffered mode, `$stmt->fetch()` works for very big data sets while `$stmt->fetchAll()` may hit a memory limit error. So is `$stmt->fetch()` kind of `unbuffered`? – tonix Jan 24 '18 at 11:56

@Arkh

// $data in this case is an array of rows
$data = $stmt->fetchAll();

// $data in this case is just one row after each loop
while($data = $stmt->fetch()){}

// Try using
$i = 0;
while($data[$i++] = $stmt->fetch()){}

The memory difference should become negligible.

  • @stancu the top and bottom variants are effectively identical, and the additional MEM seen using fetch() is likely an artifact of the overhead of the while(). The point of fetch() is to process a row at a time; using while() to accomplish the same thing as fetchAll(PDO::FETCH_NUM) is silly, as you lose out on C-level compiler optimizations taking place in the PDO module. – DavidScherer May 14 '14 at 19:07

But surely if you're storing the fetched data in an array, the memory usage will be equal?

<?php
define('DB_HOST', 'localhost');
define('DB_USER', 'root');
define('DB_PASS', '');
// database to use
define('DB', 'test');
try
{
   $dbh = new \PDO('mysql:dbname='. DB .';host='. DB_HOST, DB_USER, DB_PASS);
   $dbh->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
   $sql = 'SELECT * FROM users WHERE 1';
   $stmt = $dbh->query($sql);
   $data = array();
   $start_all = microtime(true);
   $data = $stmt->fetchAll();
   $end_all = microtime(true);

   $stmt = $dbh->query($sql);
   $data = array();
   $start_one = microtime(true);
   while($data = $stmt->fetch()){}
   $end_one = microtime(true);

   // Second benchmark : memory usage
   $stmt = $dbh->query($sql);
   $data = array();
   $memory_start_all = memory_get_usage();
   $data = $stmt->fetchAll();
   $memory_end_all = memory_get_usage();

   $stmt = $dbh->query($sql);
   $data = array();
   $memory_end_one = 0;
   $memory_start_one = memory_get_usage();
   while($data[] = $stmt->fetch()){
     $memory_end_one = max($memory_end_one, memory_get_usage());
   }

   echo 'Result : <br/>
   fetchAll : ' . ($end_all - $start_all) . 's, ' . ($memory_end_all - $memory_start_all) . 'b<br/>
   fetch : ' . ($end_one - $start_one) . 's, ' . ($memory_end_one - $memory_start_one) . 'b<br/>';
}
catch ( PDOException $e )
{
   echo $e->getMessage();
}
?>

Result : 
fetchAll : 2.6941299438477E-5s, 9824b
fetch : 1.5974044799805E-5s, 9824b
Andy

As Mihai Stancu was saying, there is almost no memory difference, though fetch + while actually beats fetchAll on speed here.

Result : 
fetchAll : 0.160676956177s, 118539304b
fetch : 0.121752023697s, 118544392b

I got the results above by running the while loop correctly:

$i = 0;
while($data[$i++] = $stmt->fetch()){
    //
}

So the fetchAll consumes less memory, but fetch + while is faster! :)

Rihards
  • Faster? 0.16 (`fetchAll`) vs. 0.12 (`fetch`) – Joost Dec 24 '10 at 23:18
  • With significantly larger result sets, you would see a significant difference between PDOStatement::fetch() and PDOStatement::fetchAll(). Determining what qualifies as "significantly larger" would depend on the size of each row. Additionally, by default, PDOStatement::fetch()/fetchAll() uses the fetch mode PDO::FETCH_BOTH, which effectively doubles the size of each row; changing this can help mitigate MEM usage on large result sets. – DavidScherer May 14 '14 at 18:59
  • Also, the downvote was for not providing any reference statistics for your benchmark, making it inherently flawed. The various pieces of overhead with PHP functions and whatnot would account for the MEM difference. – DavidScherer May 14 '14 at 19:05

I know this is an old topic, but I ran across it with the same question. Having run my own simple "benchmark" and read what others wrote here, I came to the conclusion that this is not an exact science, and while one should strive to write quality, lightweight code, there is no point wasting too much time at the start of a project.

My suggestion: gather data by running the code (in beta?) for a while, and then start optimizing.

In my simple benchmark (which only tested execution time) I got results varying between 5% and 50% BOTH ways. I ran both options in the same script, but when I ran fetch + while first it was faster than fetchAll, and vice versa. (I know I should have run them separately a couple hundred times and compared the median and mean, but, as I said at the beginning, I concluded that in my case it is too early to do so.)

Strayobject