50

I would like to know the difference between Laravel's chunk and cursor methods. Which method is more suitable to use? What are the use cases for both of them? I know that you should use cursor to save memory, but how does it actually work in the backend?

A detailed explanation with an example would be useful, because I have searched on Stack Overflow and other sites but didn't find much information.

Here are the code snippets from the Laravel documentation.

Chunking Results

Flight::chunk(200, function ($flights) {
    foreach ($flights as $flight) {
        //
    }
});

Using Cursors

foreach (Flight::where('foo', 'bar')->cursor() as $flight) {
    //
}
Suraj
  • from the [api docs](https://laravel.com/api/5.2/Illuminate/Database/Eloquent/Builder.html#method_cursor): **chunk**: Chunk the results of the query. **cursor**: Get a generator for the given query. – online Thomas Aug 02 '17 at 15:16
  • Take a look [here](https://translate.google.com/translate?hl=en&sl=auto&tl=en&u=http%3A%2F%2Fqiita.com%2Fryo511%2Fitems%2Febcd1c1b2ad5addc5c9d), it's well explained :) – Maraboc Aug 02 '17 at 15:19

8 Answers

65

We have a comparison: chunk() vs cursor()

  • cursor(): High Speed
  • chunk(): Constant Memory Usage

10,000 records:

+-------------+-----------+------------+
|             | Time(sec) | Memory(MB) |
+-------------+-----------+------------+
| get()       |      0.17 |         22 |
| chunk(100)  |      0.38 |         10 |
| chunk(1000) |      0.17 |         12 |
| cursor()    |      0.16 |         14 |
+-------------+-----------+------------+

100,000 records:

+--------------+------------+------------+
|              | Time(sec)  | Memory(MB) |
+--------------+------------+------------+
| get()        |        0.8 |     132    |
| chunk(100)   |       19.9 |      10    |
| chunk(1000)  |        2.3 |      12    |
| chunk(10000) |        1.1 |      34    |
| cursor()     |        0.5 |      45    |
+--------------+------------+------------+
  • Test data: users table from Laravel's default migration
  • Homestead 0.5.0
  • PHP 7.0.12
  • MySQL 5.7.16
  • Laravel 5.3.22
mohammad asghari
  • Do you have any idea why chunk has lower memory usage than cursor? That seems a bit odd to me. – Antti Pihlaja Feb 21 '19 at 11:52
  • @AnttiPihlaja I think this is because `cursor()` still keeps the result set (100k records) in memory and fetches the rows as objects on demand (using `PDOStatement::fetch`). `chunk()` uses `LIMIT` and `OFFSET` to limit the result set size and loads the whole result set into memory for each chunk/query (10k rows) using `PDOStatement::fetchAll`. – Ion Bazan Jul 31 '19 at 07:49
  • @IonBazan Yes. But that's very unexpected behaviour for a db cursor. The reason is that Laravel configures the underlying PDO connection to behave like that. – Antti Pihlaja Aug 02 '19 at 03:38
  • It seems that using cursor is always better than get(), but this is not true. Cursor performance is slower than get() with bigger data sets, because cursor fetches records from the buffer one at a time using fetch, while get returns everything using fetchAll. fetchAll has been proven to be faster than looping through fetch. – Bernard Wiesner Nov 18 '21 at 05:45
  • @BernardWiesner you can test your scenarios and update the answer. – mohammad asghari Nov 19 '21 at 07:48
50

Indeed, this question might attract some opinionated answers; however, the simple answer is in the Laravel Docs.

Just for reference:

This is chunk:

If you need to process thousands of Eloquent records, use the chunk command. The chunk method will retrieve a "chunk" of Eloquent models, feeding them to a given Closure for processing. Using the chunk method will conserve memory when working with large result sets:

This is Cursor:

The cursor method allows you to iterate through your database records using a cursor, which will only execute a single query. When processing large amounts of data, the cursor method may be used to greatly reduce your memory usage:

Chunk retrieves the records from the database and loads them into memory, while setting a cursor on the last record retrieved so there is no clash.

So the advantage is when you want to reformat a large set of records before they are sent out, or you want to perform an operation on every nth batch of records. An example is building a view or an Excel sheet: you take the records in batches until they are done, so that they are not all loaded into memory at once, thereby hitting the memory limit.
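For instance, a minimal sketch of the export use case described above, assuming a User model and an illustrative CSV path:

$handle = fopen(storage_path('exports/users.csv'), 'w');

User::chunk(500, function ($users) use ($handle) {
    foreach ($users as $user) {
        // Write one row per user; only ~500 models are hydrated at a time.
        fputcsv($handle, [$user->id, $user->email]);
    }
});

fclose($handle);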

Cursor uses PHP generators; you can check the PHP generators page, however here is an interesting caption:

A generator allows you to write code that uses foreach to iterate over a set of data without needing to build an array in memory, which may cause you to exceed a memory limit, or require a considerable amount of processing time to generate. Instead, you can write a generator function, which is the same as a normal function, except that instead of returning once, a generator can yield as many times as it needs to in order to provide the values to be iterated over.
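To make the quoted idea concrete, here is a minimal generator sketch in the spirit of what cursor() does internally (the function name is illustrative, not part of Laravel):

// Yields one row per fetch() call instead of building the full array.
function rowsFrom(PDOStatement $statement): Generator
{
    while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
        yield $row;
    }
}

foreach (rowsFrom($statement) as $row) {
    // only $row is held in memory on each iteration
}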

While I cannot guarantee that I fully understand the concept of cursor, chunk runs the query for every chunk of the given size, retrieving the results and passing them into the closure for further work on the records.

Hope this is useful.

Vusys
  • Thanks for the honest answer. Although still, I don't understand the concept of cursor fully. But your answer explains a lot of things. – Suraj Aug 02 '17 at 16:41
  • If it can help you understand better, Laravel's `select` uses PHP's [`fetchAll`](http://php.net/manual/en/pdostatement.fetchall.php) whereas Laravel's `cursor` uses PHP's [`fetch`](http://php.net/manual/en/pdostatement.fetch.php). Both execute the same amount of SQL, but the former immediately builds an array with the whole data, whereas the latter fetches the data one row at a time, allowing it to hold in memory only this row, not the previous nor the following ones. – Gras Double Dec 05 '18 at 05:28
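To put that comment in code, here is a plain-PDO sketch (the connection details are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=app', 'user', 'secret');

// select()/get() style: fetchAll() builds one array holding every row.
$all = $pdo->query('select * from flights')->fetchAll(PDO::FETCH_ASSOC);

// cursor() style: fetch() keeps a single row in PHP memory at a time.
$statement = $pdo->query('select * from flights');
while ($row = $statement->fetch(PDO::FETCH_ASSOC)) {
    // process $row
}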
35

Cursor()

  • only a single query
  • fetches results by calling PDOStatement::fetch()
  • by default a buffered query is used, fetching the whole result at once
  • turns only the current row into an Eloquent model

Pros

  • minimizes Eloquent model memory overhead
  • easy to manipulate

Cons

  • a huge result may lead to running out of memory
  • buffered or unbuffered is a trade-off

Chunk()

  • splits the query into queries with limit and offset
  • fetches results by calling PDOStatement::fetchAll
  • turns results into Eloquent models in batches

Pros

  • controllable memory usage

Cons

  • turning results into Eloquent models in batches may cause some memory overhead
  • query count vs. memory usage is a trade-off

TL;DR

I used to think cursor() would query each time and keep only one row of the result in memory. So when I saw @mohammad-asghari's comparison table I got really confused. There must be some buffer behind the scenes.

By tracing the Laravel code below:

/**
 * Run a select statement against the database and returns a generator.
 *
 * @param  string  $query
 * @param  array  $bindings
 * @param  bool  $useReadPdo
 * @return \Generator
 */
public function cursor($query, $bindings = [], $useReadPdo = true)
{
    $statement = $this->run($query, $bindings, function ($query, $bindings) use ($useReadPdo) {
        if ($this->pretending()) {
            return [];
        }

        // First we will create a statement for the query. Then, we will set the fetch
        // mode and prepare the bindings for the query. Once that's done we will be
        // ready to execute the query against the database and return the cursor.
        $statement = $this->prepared($this->getPdoForSelect($useReadPdo)
                          ->prepare($query));

        $this->bindValues(
            $statement, $this->prepareBindings($bindings)
        );

        // Next, we'll execute the query against the database and return the statement
        // so we can return the cursor. The cursor will use a PHP generator to give
        // back one row at a time without using a bunch of memory to render them.
        $statement->execute();

        return $statement;
    });

    while ($record = $statement->fetch()) {
        yield $record;
    }
}

I understood that Laravel builds this feature by wrapping PDOStatement::fetch(). And by searching for buffered PDO fetch and MySQL, I found this document.

https://www.php.net/manual/en/mysqlinfo.concepts.buffering.php

Queries are using the buffered mode by default. This means that query results are immediately transferred from the MySQL Server to PHP and then are kept in the memory of the PHP process.

So by doing PDOStatement::execute() we actually fetch the whole result set at once and store it in memory, not only one row. So if the result is too huge, this will lead to an out-of-memory exception.

Though the document shows we could use $pdo->setAttribute(PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false); to get rid of the buffered query, the drawback warrants caution.

Unbuffered MySQL queries execute the query and then return a resource while the data is still waiting on the MySQL server for being fetched. This uses less memory on the PHP-side, but can increase the load on the server. Unless the full result set was fetched from the server no further queries can be sent over the same connection. Unbuffered queries can also be referred to as "use result".
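For reference, a hedged sketch of what disabling buffering could look like on Laravel's default connection (Flight is the example model from the question; support for the attribute depends on the pdo_mysql driver):

use Illuminate\Support\Facades\DB;

// Switch the underlying PDO connection to unbuffered ("use result") mode
// so MySQL streams rows instead of sending the whole result set up front.
DB::connection()->getPdo()->setAttribute(
    PDO::MYSQL_ATTR_USE_BUFFERED_QUERY, false
);

foreach (Flight::cursor() as $flight) {
    // Only one row lives in PHP memory here, but no other query can run
    // on this connection until the result set is fully consumed.
}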

劉恒溫
  • Very nice explanation. I was confused about how cursor would lead to an out-of-memory issue on a large dataset. Your answer really helped me out. – Anuj Shrestha Apr 18 '20 at 07:17
16

chunk is based on pagination: it maintains a page number and does the looping for you.

For example, DB::table('users')->select('*')->chunk(100, function($e) {}) will run multiple queries until the result set is smaller than the chunk size (100):

select * from `users` limit 100 offset 0;
select * from `users` limit 100 offset 100;
select * from `users` limit 100 offset 200;
select * from `users` limit 100 offset 300;
select * from `users` limit 100 offset 400;
...

cursor is based on PDOStatement::fetch and Generator.

$cursor = DB::table('users')->select('*')->cursor();
foreach ($cursor as $e) { }

will issue a single query:

select * from `users`

But the driver doesn't fetch the result set at once.

oraoto
2

The cursor method uses Lazy Collections, but just runs the query once.

https://laravel.com/docs/6.x/collections#lazy-collections

However, the query builder's cursor method returns a LazyCollection instance. This allows you to still only run a single query against the database but also only keep one Eloquent model loaded in memory at a time.

Chunk runs the query multiple times and loads each chunk's results into Eloquent models all at once.
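A short sketch of what the LazyCollection returned by cursor() enables (the where/filter conditions are illustrative):

// One query is executed; rows are hydrated into models one at a time
// as the lazy pipeline pulls them through filter() and each().
Flight::where('active', true)
    ->cursor()
    ->filter(function ($flight) {
        return $flight->delay > 60;
    })
    ->each(function ($flight) {
        // process one model at a time
    });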

PhillipMcCubbin
1

Assuming you have a million records in the db, this would probably give the best result. You can use something like the following; with it you will use chunked LazyCollections.

User::cursor()->chunk(10000);
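To make the combination concrete, a hedged sketch of consuming those chunks:

// cursor() issues a single query and streams rows; chunk(10000) groups
// the streamed models into lazy chunks of 10,000 without loading them all.
foreach (User::cursor()->chunk(10000) as $chunk) {
    foreach ($chunk as $user) {
        // process $user
    }
}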
oguz463
1

It's best to take a look at the source code.

select() or get()

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Connection.php#L366

return $statement->fetchAll();

It uses fetchAll which loads all records into memory. This is fast but consumes a lot of memory.

cursor()

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Connection.php#L403

while ($record = $statement->fetch()) {
   yield $record;
}

It uses fetch, which loads only one record into memory from the buffer at a time. Note that it still executes only one query. Lower memory but slower, since it iterates one by one. (Note that depending on your PHP configuration, the buffer can be stored either on the PHP side or on MySQL's side. Read more here)

chunk()

https://github.com/laravel/framework/blob/8.x/src/Illuminate/Database/Concerns/BuildsQueries.php#L30

public function chunk($count, callable $callback)
{
    $this->enforceOrderBy();
    $page = 1;
    do {
        $results = $this->forPage($page, $count)->get();
        $countResults = $results->count();

        if ($countResults == 0) {
            break;
        }

        if ($callback($results, $page) === false) {
            return false;
        }

        unset($results);

        $page++;
    } while ($countResults == $count);

    return true;
}

It uses many smaller fetchAll calls (via get()) and tries to keep memory low by breaking a big query result into smaller queries using limit, based on the chunk size you specify. In a way it's trying to combine the benefits of both get() and cursor().

As a rule of thumb I would say go with chunk, or even better chunkById if you can. (chunk has bad performance on big tables since it uses offset; chunkById uses limit with an id filter instead.)
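A quick sketch of chunkById for comparison (table and chunk size are illustrative):

// chunkById pages with "where id > {last seen id} limit 1000" instead of
// an ever-growing OFFSET, so late pages stay fast on large tables.
DB::table('users')->chunkById(1000, function ($users) {
    foreach ($users as $user) {
        // process $user
    }
});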

lazy()

In Laravel 8 there is also lazy(); it's similar to chunk but the syntax is cleaner (it uses generators).

https://laravel.com/docs/8.x/eloquent#streaming-results-lazily

foreach (Flight::lazy() as $flight) {
    //
}

It does the same as chunk(), just without needing a callback, since it uses PHP generators. You can also use lazyById(), similar to chunkById.
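And a minimal lazyById() sketch under the same assumptions as the docs example above:

foreach (Flight::lazyById(1000) as $flight) {
    // Keyset pagination like chunkById, but consumed as a flat,
    // generator-backed LazyCollection instead of via a callback.
}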

Bernard Wiesner
0

I made some benchmarks using cursor and where:

// e.g. inside a route closure or controller action
foreach (\App\Models\Category::where('type', 'child')->get() as $category) {
    //
}

foreach (\App\Models\Category::where('type', 'child')->cursor() as $category) {
    //
}

return view('welcome');

Here is the result: chunk is faster than using where.
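For a more telling comparison, a hedged sketch of measuring both wall time and peak memory around each variant (numbers will vary by environment):

$start = microtime(true);

foreach (\App\Models\Category::where('type', 'child')->cursor() as $category) {
    //
}

printf(
    "time: %.2fs, peak memory: %.1f MB\n",
    microtime(true) - $start,
    memory_get_peak_usage(true) / 1024 / 1024
);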

hendra1