2

I have a table containing roughly 300k rows describing devices for Apple's Push Notification service. I use Doctrine 2 as an ORM.

Inserting devices is no problem, however, retrieving them is a whole different story. Using a simple MySQL SELECT I can get them in a few seconds, where the WiFi is the main bottleneck. However, if I try to fetch them via Doctrine, it runs out of memory even if I allow PHP up to 1 gigabyte. I have created getters and setters and protected properties for the Doctrine entities, as per the documentation.

I have no clue what I'm doing wrong. This is fine:

$devices = mysql_query("SELECT * FROM `Devices` WHERE `deviceProperty`='someValue'");

$message = new Message();
while($device = mysql_fetch_array($devices))
{
    $message->addRecipient($device['pushToken']);
}

but this runs out of memory on the first line (it never reaches a breakpoint on the next line):

$devices = self::$entityManager->getRepository('Device')->findBy(array("deviceProperty" => "someValue"));
$message = new Message();
foreach($devices as $device)
{
    $message->addRecipient($device->getPushToken);
}
Zsub
  • 1,799
  • 2
  • 15
  • 28
  • Well thats beacause youre creating 300k entity instances in memory. – prodigitalson May 25 '12 at 15:10
  • And each entity instance consumes over 4 (four!) MB's of memory? Then what is the point of lazy loading? It seemed to me that each instance only got populated once you request a property via a getter, much like a mysql result set. – Zsub May 25 '12 at 15:13
  • Im pretty sure it lazy loads the Entity, not the data that will be mapped to the entity. So somewhere you have 300k element array with each element having the at least one property in it (the PK). and each time you do `$device->getPushToken` that gets hydrated to an object. Unless you specifically destroy that object its still in memory so at that point lazy loading doesnt apply even if im incorrect about the particulars :-) I havent used 2.x much yet but i know when doing something like this with 1.2 you woudl never use Object hydration especially if all you needed were 2 properties. – prodigitalson May 25 '12 at 15:23
  • But even so, that might consume something in the order of 300 megs of ram. I don't expect that to run out of a gigabyte of memory. I really would like to know how & what, so sorry if I sound grumpy! :) – Zsub May 25 '12 at 15:27

2 Answers2

3

You're pulling in 300k objects, it'll consume way too much memory, try processing in chunks...

$message = new Message();

$limit = 50;
$offset = 0;
while($devices = self::$entityManager->getRepository('Device')->findBy(array("deviceProperty" => "someValue"), array(), $limit, $offset))
{
   foreach($devices as $device)
   {
       $message->addRecipient($device->getPushToken);
   }
   $offset += $limit;
}
Lee Davis
  • 4,685
  • 3
  • 28
  • 39
  • Alternatively you could write a custom DQL function that used the Array hydrator and have a similar implementation to your mysql_query example. – Lee Davis May 25 '12 at 16:20
0

If you're using DQL you can use the iterate() function to iterate through the results and flush each result after processing:

$message = new Message();
$cleaner = 0;

$q = self::$entityManager->createQuery(
    'SELECT d from Device d
     WHERE
     d.deviceProperty = :devicePropertyValue
    ');
$q->setParameter('devicePropertyValue', 'someValue');
//$q->setFirstResult(10000);
//$q->setMaxResults(5000);

$devices = $q->iterate();

foreach ($devices as $row) {
    $device = $row[0];

    $message->addRecipient($device->getPushToken);

    self::$entityManager->detach($device);

    if ($cleaner++ > 100) {
        self::$entityManager->clear();
        $cleaner = 0;
    }

}

For the batch reindexing job I'm running, this reduced my memory requirement down from 3GB to under 256M.

According to the Doctrine documentation, "Iterating results is not possible with queries that fetch-join a collection-valued association" - I presume this means built-in methods such as findBy().

Note the commented-out setFirstResult() and setMaxResults() which you can use for offset and limit.

Sam Critchley
  • 3,388
  • 1
  • 25
  • 28