2

I'm using CakePHP with $modelName->find(...) calls to select quite a number of rows (could be hundreds).

Normally, in plain PHP/MySQL this wouldn't be a problem, as you fetch the rows one at a time in a while loop. But CakePHP loads all rows into an array, which exhausts the memory limit.

Is there a way to keep using the $modelName->find(...) constructs but get back an iterator that fetches each row on demand?

Thanks, David

daviddoran

11 Answers

4

Here is code that you can use to process a table a few rows at a time:

    $limit = 10;
    $loop_no = 0;
    do {
        // Fetch the next batch of rows, ordered by id so the paging is deterministic
        $handles = $this->SocialChannelHandle->find('all', array(
            'fields' => array('brand_id', 'handle'),
            'conditions' => array('social_channel_id' => $facebook['SocialChannel']['id']),
            'limit'  => $limit,
            'offset' => $limit * $loop_no,
            'order'  => 'id asc',
            'recursive' => -1)
        );
        // ...process the rows in $handles here...
        $loop_no++;
    } while (count($handles) == $limit);  // a short (or empty) batch means we've reached the end
Xavier John
4

If your problem is caused by the relationships of your model, you can reduce the recursion this way:

    $modelname->recursive = -1;

Then you will only get the data of the current model, without any related records.

While iterating through the records, you can then obtain each one's relationships one by one by querying again with recursive > 0.
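For illustration, a rough sketch of that two-step approach (the Post model, its fields, and the recursive level are assumptions for the example, not from the answer):

    // First pass: fetch only the current model's rows, with no associations
    $this->Post->recursive = -1;
    $posts = $this->Post->find('all', array('fields' => array('Post.id')));

    foreach ($posts as $post) {
        // Second pass: re-query a single record with recursion enabled
        // to pull in its related data on demand
        $this->Post->recursive = 1;
        $full = $this->Post->findById($post['Post']['id']);
        // ...process $full, which now includes the related records...
    }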

rossoft
2

No, out of the box, CakePHP (and ActiveRecord in general) doesn't support iterating over a result set like that. If you have a use case where you actually need all the records of a table, you're probably better off using raw SQL (or rethinking your use case).

You could also (and have likely already thought to) use an offset of some kind and call ->find multiple times. If you take this approach, don't forget to order your result set by some field to ensure a deterministic result; without an ORDER BY, databases only appear to return rows in a consistent order, it isn't guaranteed. I haven't personally tried this, and it seems inefficient from a multiple-queries standpoint, but it's something worth trying.
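As a rough illustration of the raw SQL route (this bypasses the ORM entirely; the connection details and the posts table are hypothetical, not from the answer):

    // An unbuffered query lets MySQL stream rows to PHP one at a time
    // instead of loading the whole result set into memory.
    $pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'pass');
    $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false);

    $stmt = $pdo->query('SELECT id, title FROM posts ORDER BY id');
    foreach ($stmt as $row) {
        // ...process one row at a time...
    }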

Alana Storm
  • Actually, i'm implementing this stuff right now using PDO and buffered queries. I return an object from my find function that implements iterable, and has a handle to the statement. http://stackoverflow.com/questions/531536/php-orm-query-results-arrays-vs-result-handle-wrapped-in-iterator-interface – SchizoDuckie Feb 12 '09 at 02:15
  • and people say cake is slow... huph! – Xeoncross Aug 05 '10 at 22:47
  • Use the built in pagination! `$this->Model->paginate()` – David Yell Aug 02 '12 at 14:28
0

You can either limit the find method call with the recursive param (API for Model#find), or you can unbind model associations on the fly and reduce the amount of data retrieved (Creating and Destroying Associations on the Fly).
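A minimal sketch of unbinding an association on the fly (the Post/Comment model names are assumptions for illustration):

    // Temporarily drop the hasMany Comment association for the next find()
    $this->Post->unbindModel(array('hasMany' => array('Comment')));
    $posts = $this->Post->find('all');  // Post rows come back without Comment data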

RFelix
0

It's been a while, but since this question comes up in searches, I thought I'd mention that there is actually a built-in way to do this. Something like:

    $page = 1;
    $limit = 100;
    while ($posts = $this->Post->find('all', array(
        'conditions' => ...,
        'page' => $page,
        'limit' => $limit,
        ...
    ))) {
        foreach ($posts as $post) {
            ...deal with one row...
        }
        $page++;
    }

will paginate through the dataset, although with some performance penalty because the query is re-executed on each iteration of the while loop.

(not tested)

savedario
0

If you're using a stored procedure to produce the query, it might be a good idea to mimic paging behavior and fetch segments of the query results one at a time. In this case, you'd send two extra parameters to the stored procedure, the starting row index and the "page" size, and adjust the select statement to retrieve only those records between row_index + 1 and row_index + page_size.

This can all be put into an extra layer of looping, so you get a new segment and then, inside of that, go through each row in the segment.
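As a loose sketch of that idea (the get_post_page procedure name, its parameters, and the Post model are all hypothetical; it assumes the procedure returns one page of rows given a row offset and a page size):

    $pageSize = 100;
    $offset = 0;
    do {
        // Model::query() runs raw SQL; here it calls the hypothetical paging procedure
        $rows = $this->Post->query(
            'CALL get_post_page(' . (int)$offset . ', ' . (int)$pageSize . ')'
        );
        foreach ($rows as $row) {
            // ...process one row of the current segment...
        }
        $offset += $pageSize;
    } while (count($rows) == $pageSize);  // a short page means we've reached the end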

notnot
  • Thanks, it's a solution I suppose. The only problem is that there's a trade-off between memory and queries. So if I have 1,000 rows and memory for 100, then I have to do 10 queries, but I still have to store 100 rows at a time. It becomes worse at 10,000 or 100,000... Unfortunate really. Dave – daviddoran Feb 11 '09 at 22:22
  • It depends on what you're doing with the data on the other side. If you're displaying it, then you can use AJAX to keep populating based on each new chunk that comes in, which would save time in loading the page initially. The programmer's bane, you have to trade time for space and vice-versa. – notnot Feb 11 '09 at 23:01
  • I'm running through all the rows (they're metadata for rows in other tables...which are children of other rows..) and doing some stats to generate a summary array. I just need to "see" each row once which should be fine, even if it takes a bit but loading is a problem. Offset it is it seems. – daviddoran Feb 11 '09 at 23:09
0

You might also be getting so much data because of your model relationships. For your someModel->find() call, how many other models are related to someModel?

I've used at least 1000 rows in an array on a typical shared hosting server without memory issues.

UPDATE: You can simply use the Containable behavior if you do not want the related models to be returned. So, if you are trying to find all Posts but do not want the Comments, you can use the $this->Post->contain() syntax to return only Post records. Containable is a behavior that must be added to your model using the $actsAs property: $actsAs = array('Containable');
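A short sketch of that setup (Post and Comment are the example models from the answer; using the 'contain' => false find option is one way to skip all associated data):

    // In the model (app/models/post.php)
    class Post extends AppModel {
        var $actsAs = array('Containable');
    }

    // In a controller: fetch Post records only, with no associated Comment data
    $posts = $this->Post->find('all', array('contain' => false));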

0

I understand that you're looking for an iterator returned from the find call, but how about using the LIMIT clause with a variable offset (e.g. stepping by the number of rows you want returned at once)? This could bring up some concurrency issues, but if you also include an ORDER BY id clause, you should see consistent behavior in the rows returned. Then, in your loop, just repeatedly issue the find(...) query.

This clearly isn't as elegant a solution as an iterator would be, but I'd imagine that the overhead of repeatedly issuing requests for more rows would roughly balance out against the savings of retrieving several rows at once (in Cake).

Finally, if you really are looking for performance, then I think CakePHP might not be your cup of tea. It is improving in speed with new releases, but I believe it still lags behind other frameworks pretty significantly when it comes to performance.

Travis Leleu
  • I suppose using offset and limit is the best way. I was hoping not to have to resort to it, presuming that there would be a way to access CakePHP's fetching method manually. It's just unfortunate because what I'm doing should be pretty efficient but there's so much overhead. Arg... Dave – daviddoran Feb 11 '09 at 22:41
  • Overhead is the name of the game with Cake! – Travis Leleu Mar 02 '09 at 20:17
0

I guess this is not possible because CakePHP dynamically builds a multidimensional array representing your database entity relationships. This can only be done after fetching all query rows, so that CakePHP knows all possible related entities.

Example:

3 rows need to be fetched in order to build the corresponding multidimensional array:

    Article 1
      |
      -- Comment 1
      |
      -- Comment 2
      |
      -- Comment 3

Query result (1..n):

    Article | Comment
    --------+--------
    1       | 1
    1       | 2
    1       | 3
knoopx
  • Not sure I understand. Usually even with a complex hierarchy you can use a constant number of queries, especially with the way CakePHP loads data into memory. So, if you have two articles, each with fifty comments, you can first get the articles, then SELECT * FROM comments WHERE article_id IN (article_ids). – daviddoran Feb 11 '09 at 22:45
  • That is inefficient and bad practice. CakePHP uses *eager loading*, which prevents the dreaded 1+N problem in which fetching 100 posts that each need to display their author triggers 101 database queries. Through the use of eager loading, the 101 queries can be reduced to 2. – knoopx Feb 12 '09 at 16:51
0

You can add a limit to your find request. I do not have time right now to write a complete answer; I'll update it later.

And no, from what I know, when you run a query in MySQL with the normal driver, it will return all the rows matched by your select anyway. So if you have a problem with the memory limit, it might be somewhere else. You could add a limit to restrict the number of rows returned. If your table has multiple dependencies but you don't need to load every foreign key, you could use the 'contain' option to only load what you need.

Could you give us a description of your table and what you want to select?

Loïc Faure-Lacroix
-1

Ruby on Rails handles this a lot better. The default behavior is not to include any other tables unless you use :include => :table_name, and then it will generate the joins on the fly.

There is no reason CakePHP can't do this; it just doesn't.

Stephen