
Let's say I have a MySQL table with over 1 million rows. I would like to select all the rows in the table and process each entry individually. What's the best way to prevent running out of memory?

Maybe using limit to select a few rows at a time?

$start = 0;
$limit = 1000;

while (true) {
   $q = query("SELECT * FROM table LIMIT $start, $limit");

   // process rows; stop once the query returns nothing
   if (empty($q)) {
       break;
   }

   $start += $limit;

   sleep(1);
}
  • Best bet is to find some way to do whatever you're trying to do in native SQL, without fetching results at all. But yeah, if you have to fetch them, do it in chunks. PHP will probably die on you before getting through 1 million rows. – elixenide Jul 09 '14 at 17:29
  • You could also use mysql_unbuffered_query (or its derivatives in PDO and mysqli), then use the `set_time_limit` function to reset the timeout on each entry (sketched below). – EdgeCaseBerg Jul 09 '14 at 17:31
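A minimal sketch (not part of the original thread) of the unbuffered approach from the comment above, using mysqli with `MYSQLI_USE_RESULT` so rows are streamed from the server instead of buffered in PHP; the connection details are placeholders:

<?php
// Hypothetical connection details; replace with your own.
$mysqli = new mysqli('localhost', 'user', 'pass', 'mydb');

// MYSQLI_USE_RESULT streams rows from the server instead of
// buffering the whole result set in PHP memory.
$result = $mysqli->query('SELECT * FROM `table`', MYSQLI_USE_RESULT);

while ($row = $result->fetch_assoc()) {
    set_time_limit(30);   // restart the execution timer on each row

    // process $row
}

$result->free();
$mysqli->close();

Keep in mind that with an unbuffered result set you cannot run further queries on the same connection until the result is freed, so any per-row writes would need a second connection.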

3 Answers


You need to use cursors.

There is very good documentation on the MySQL web site; its chapter on cursors is a good starting point.
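The answer doesn't include code, so here is a minimal sketch of what it might look like, assuming PDO. MySQL cursors exist only inside stored programs, so the per-row work has to be pushed into a stored procedure that PHP then calls; the procedure name `process_foo`, table `foo`, and column `bar` are placeholders, not from the answer.

<?php
// Hypothetical connection; adjust credentials for your setup.
$pdo = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');

// Cursors are only available inside stored programs, so wrap the
// row-by-row processing in a procedure.
$pdo->exec("DROP PROCEDURE IF EXISTS process_foo");
$pdo->exec("
    CREATE PROCEDURE process_foo()
    BEGIN
        DECLARE done INT DEFAULT 0;
        DECLARE v_bar VARCHAR(255);
        DECLARE cur CURSOR FOR SELECT bar FROM foo;
        DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = 1;

        OPEN cur;
        read_loop: LOOP
            FETCH cur INTO v_bar;
            IF done THEN
                LEAVE read_loop;
            END IF;
            -- per-row processing goes here (e.g. write to another table)
        END LOOP;
        CLOSE cur;
    END
");

$pdo->exec("CALL process_foo()");

Note that this moves the processing into SQL itself; if the per-row work really has to happen in PHP, the chunked or unbuffered approaches are more practical.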

Ed Heal

Use `LIMIT` and `OFFSET`, which return just part of your result set: at most `limit` rows, skipping the first `offset` rows. When using these, remember to sort your rows so that you don't get the same row twice. My advice is to sort ascending by the primary key or by the row's creation date - with your `while(true)` loop you can then be fairly sure that even records added after the program starts will be processed.

The example below uses PDO to query the database.

$package = 1000; // chunk size
$i = 0;

try {
    // Prepare once; LIMIT/OFFSET placeholders must be bound as integers
    $q = $db->prepare(
        "SELECT *
         FROM foo
         ORDER BY bar ASC
         LIMIT :limit OFFSET :offset"
    );

    while (true) {

        $currOffset = $i++ * $package;

        $q->bindValue(':offset', $currOffset, PDO::PARAM_INT);
        $q->bindValue(':limit', $package, PDO::PARAM_INT);

        $q->execute();

        $res = $q->fetchAll(PDO::FETCH_ASSOC);

        // no more rows: we are done
        if (!$res || !count($res)) {
            break;
        }

        // process your chunk

    }
} catch (PDOException $ex) {
    // handle exception, abort transaction etc.
}
ex3v
  • That fetchAll is not going to work with a million rows since it will try to buffer it all at once. You need to use the unbuffered version – EdgeCaseBerg Jul 09 '14 at 17:48
  • Even if there is a `limit` of 1000 per result set? – ex3v Jul 09 '14 at 17:50
  • Ah, I missed that part. I suppose with a limit it would be fine, though it would be better (IMO) to use an unbuffered query, because limit + offset has MySQL scanning out to the offset each time instead of just using a cursor approach. Edit: just in case it wasn't clear, MySQL has to page through all the rows up to the offset before it actually starts reading; with a million rows, that might take a lot of time (see the sketch after this comment thread). – EdgeCaseBerg Jul 09 '14 at 17:51
  • I've never tried an unbuffered query, but I have tried (and usually succeeded) in writing a few dozen bash/cron scripts operating on large amounts of data. Trust me, it works ;) – ex3v Jul 09 '14 at 17:53
  • I was just thinking about situations like this: http://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down The OP said they were trying not to run out of memory, so they'll need to `unset` variables if they process all the rows at once, I think... though PHP's GC might take care of that for them nicely. (not sure) – EdgeCaseBerg Jul 09 '14 at 17:55
  • GC will do fine, AFAIK. Note that in my example `$res` points to a new array of results after each iteration. The older array is left with zero references, so it is deleted. – ex3v Jul 09 '14 at 17:59
  • One last thing to nitpick at your answer (so it can be the best it can be! :D): the script will still time out with your method if it's reading a lot of data. Wouldn't it be good to use `set_time_limit` on each iteration so that the script does not time out during execution? – EdgeCaseBerg Jul 09 '14 at 18:03
  • Good point. I've already set the time limit in my CLI version of `php.ini` to something like a bazillion years ;) so I forgot about it. I'm not sure, though, whether this will work inside the loop, i.e. whether the execution time resets after each iteration. IMO the best way is to set it in the header to, for example, one day and forget about it. – ex3v Jul 09 '14 at 18:22
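As an aside on the offset-scanning issue raised in the comments above: a common alternative (not mentioned in the answer) is keyset pagination, i.e. remembering the last primary key seen and selecting only rows greater than it, so MySQL can seek via the index instead of skipping `offset` rows each time. A rough sketch using the same placeholder table `foo`, assuming an integer primary key `id`:

$package = 1000;
$lastId  = 0;

// Keyset pagination: seek past the last id seen instead of using OFFSET,
// so each query starts reading at the right place via the primary key index.
$q = $db->prepare(
    "SELECT *
     FROM foo
     WHERE id > :lastId
     ORDER BY id ASC
     LIMIT :limit"
);

while (true) {
    $q->bindValue(':lastId', $lastId, PDO::PARAM_INT);
    $q->bindValue(':limit', $package, PDO::PARAM_INT);
    $q->execute();

    $res = $q->fetchAll(PDO::FETCH_ASSOC);
    if (!$res) {
        break;
    }

    // process your chunk

    $last   = end($res);
    $lastId = (int) $last['id'];   // remember where to resume
}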
$total = query("SELECT count(*) as total FROM table");

for($i=0;$i<=$total['total'];$i+=100) {
   $q= query("SELECT * FROM table ORDER BY id ASC LIMIT $i, 100");
   // process rows
   // no purpose in sleeping
}
skrilled
  • What will happen in this scenario: processing 100 rows takes 10 seconds, and there are a million rows in the database. What about rows added during program execution? Well, they will not be processed. Or even worse - they will be, instead of old ones, because there is no sorting in your query ;) – ex3v Jul 09 '14 at 17:38
  • Updated. Generally speaking, as far as I'm concerned it's bad practice to even bother attempting rows added during program execution. What will happen in this scenario: you have millions and millions of new rows added every hour? Just saying :p I'm trying to provide an easy answer for someone who isn't expert-mode yet. – skrilled Jul 09 '14 at 17:44
  • Mostly agreed. Sometimes the 'think about everything' strategy isn't good... sometimes I forget that ;) – ex3v Jul 09 '14 at 17:48
  • I like your answer though :p – skrilled Jul 09 '14 at 17:54