3

I am trying to fetch data from a table that is over 9 GB in size and holds millions of records, and I am populating a DataTable with that data. I fetch the records in chunks of 10 per page via Ajax and an SQL LIMIT query.

[Screenshot: DataTables pagination controls showing 223,740 pages]

In the screenshot above you can see that there are 223,740 pages. The data loads quickly when I access the first few pages, but when I jump directly to a high-offset page (for example the last page) the query takes forever to return.

public static function getAllEvaluationsWithNameForDataTable($start) {
    $queryBuilder = new Builder();

    return $queryBuilder
        ->from(array('e' => static::class))
        ->leftJoin('Cx\Framework\Models\Common\User\CxUser', 'e.cx_hc_user_id = u.id', 'u')
        ->columns('e.id, e.first_name, u.initials as assigned_coach, e.gender, e.email, e.phone, e.age, e.version, e.evaluation_status, e.ip_address, e.date_created, e.date_updated')
        ->orderBy('e.id asc')
        ->limit(10, $start)
        ->getQuery()
        ->execute()
        ->toArray();
}

PHP Function/Controller:

public function getEvaluationsAction() {
        // Enable Json response
        $this->setJsonResponse();
        // This action can be called only via ajax
        $this->requireAjax();

        // Forward to access denied if current user is not allowed to view evaluation details
        if (!$this->CxAuth->currentUserIsAllowedTo('VIEW', CxEbEvaluation::getClassResourceName()))
            return $this->forwardToAccessDeniedError();

        if(isset($_GET['start'])){
            $start = $this->request->get('start');
        }else{
            $start = 10;
        }

        $recordsTotal = count(CxEbEvaluation::getAllForDataTable(array('id')));

        //Get Evaluations from DB
        $evaluation_quizzes = CxEbEvaluation::getAllEvaluationsWithNameForDataTable(intval($start));

        //for getting base URL
        $url = new Url();

        $data = array();

        foreach ($evaluation_quizzes as $key => $quiz) {
            $data[ $key ][ 'id' ] = $quiz[ 'id' ];
            $data[ $key ][ 'first_name' ] = $quiz[ 'first_name' ];
            if($quiz[ 'assigned_coach' ]){
                $data[ $key ][ 'assigned_coach' ] = $quiz['assigned_coach'];
            }else{
                $data[ $key ][ 'assigned_coach' ] = "Not assigned";
            }

            $data[ $key ][ 'gender' ] = $quiz[ 'gender' ];
            $data[ $key ][ 'email' ] = $quiz[ 'email' ];
            $data[ $key ][ 'phone' ] = $quiz[ 'phone' ];
            $data[ $key ][ 'age' ] = $quiz[ 'age' ];
            $data[ $key ][ 'version' ] = $quiz[ 'version' ];
            $data[ $key ][ 'quiz' ] =  $url->get('/admin/get-evaluation-quiz-by-id');
            $data[ $key ][ 'manage-notes-messages-and-calls' ] =  $url->get('/admin/manage-notes-messages-and-calls');
            $data[ $key ][ 'date_created' ] = date("m/d/Y H:i:s", $quiz[ 'date_created' ]);
            $data[ $key ][ 'evaluation_status' ] = $quiz[ 'evaluation_status' ];
        }
        // Return data array
        return array(
            "recordsTotal"    => $recordsTotal,
            "recordsFiltered" => $recordsTotal ,
            "data"            => $data //How To Retrieve This Data
        );
        // Return data
    }

JavaScript:

cx.common.data.cxAdminDataTables.EbEvaluation = $CxRecordsTable.cxAdminDataTable({
        ajaxUrl: '<?php echo $this->CxHelper->Route('eb-admin-get-evaluations')?>' + eqQuizIdQueryString,
        serverSide: true,
        processing: true,
        recordsFiltered :true,
        columns: [
            cx.common.admin.tableEditColumn('id',{ delete: true }),
            { data: 'first_name' },
            { data: 'assigned_coach' },
            { data: 'gender' },
            { data: 'email' },
            { data: 'phone' },
            { data: 'age' },
            cx.common.admin.tableLinkColumn('quiz', quizLinkOptions),
            cx.common.admin.tableEditColumn('id', healthCoachLinkOptions),
            cx.common.admin.tableLinkColumn('manage-notes-messages-and-calls', manageNotesMessagesAndCalls),
            { data: 'date_created' },
            cx.common.admin.tableSwitchableColumn('evaluation_status', {
                editable: true,
                createdCell: function (td, cellData, rowData, row, col){
                    $(td).data('evaluation-status-id', rowData.id);
                },
                onText: 'Complete',
                offText: 'In progress'
            })
        ],
        toolbarOptions:{
            enabled: false
        }
    });

I hope the question is clear. If anything else is needed, let me know and I will provide it.

(From Comment)

SELECT  e.id AS id, e.first_name AS first_name,
        u.initials AS assigned_coach,
        e.gender AS gender, e.email AS email, e.phone AS phone,
        e.age AS age, e.version AS version,
        e.evaluation_status AS evaluation_status,
        e.ip_address AS ip_address, e.date_created AS date_created,
        e.date_updated AS date_updated
    FROM  evaluation_client AS e
    LEFT JOIN  cx_user AS u  ON e.cx_hc_user_id = u.id
    ORDER BY  e.id ASC
    LIMIT  :APL0 OFFSET :APL1
DojoDev
  • https://dba.stackexchange.com/questions/66294/optimize-a-query-on-two-big-tables – Masivuye Cokile Nov 12 '18 at 15:07
  • https://dba.stackexchange.com/questions/75963/optimizing-large-database-query-25-million-rows-using-max-and-group-by – Masivuye Cokile Nov 12 '18 at 15:07
  • https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down – Masivuye Cokile Nov 12 '18 at 15:08
  • you need an index. – Hogan Nov 12 '18 at 15:12
  • @MasivuyeCokile tried the above stackoverflow link but it isn't related to my scenario. – DojoDev Nov 12 '18 at 15:14
  • @Hogan can you point out in my query? I already have id PK? – DojoDev Nov 12 '18 at 15:15
  • Run an explain on your query and you can have the system point it out for you. – Hogan Nov 12 '18 at 15:17
  • @MasivuyeCokile What do you think of applying the sub-query logic with `@rownum` from https://stackoverflow.com/a/8510069/2191572 and using `WHERE` on the outer query's `row_number` to simulate a `LIMIT`? – MonkeyZeus Nov 12 '18 at 15:32
  • @Hogan `Array ( [sql] => SELECT `e`.`id` AS `id`, `e`.`first_name` AS `first_name`, `u`.`initials` AS `assigned_coach`, `e`.`gender` AS `gender`, `e`.`email` AS `email`, `e`.`phone` AS `phone`, `e`.`age` AS `age`, `e`.`version` AS `version`, `e`.`evaluation_status` AS `evaluation_status`, `e`.`ip_address` AS `ip_address`, `e`.`date_created` AS `date_created`, `e`.`date_updated` AS `date_updated` FROM `evaluation_client` AS `e` LEFT JOIN `cx_user` AS `u` ON `e`.`cx_hc_user_id` = `u`.`id` ORDER BY `e`.`id` ASC LIMIT :APL0 OFFSET :APL1 ` – DojoDev Nov 12 '18 at 15:37
  • Here is a good article explaining all necessary concepts. It is based on Postgres, but the concepts apply to all databases. [Pagination Done the Right Way](https://use-the-index-luke.com/blog/2013-07/pagination-done-the-postgresql-way). As the domain name suggests, the primary theme is "use the index". – Vladimir Baranov Nov 18 '18 at 23:41

3 Answers

4

The Why does MySQL higher LIMIT offset slow the query down? question and its answers, linked by Masivuye Cokile, as well as the https://explainextended.com/2009/10/23/mysql-order-by-limit-performance-late-row-lookups/ article linked from there, give an excellent rundown of why large-offset queries are slow. In short, for LIMIT 150000, 10 MySQL still has to generate and step through the first 150,010 rows, only to discard all but the last 10. To speed things up you can either:

  • use sequential pagination, i.e. "show 10 entries after ID #N", which is very fast and a good alternative, but gives up exact page numbers; your users are left with "next/prev" links and/or an approximate page number you can calculate with a count query (a sketch of this approach follows the list).
  • or create an index on id, then force MySQL to perform an index-only search.
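
A minimal sketch of the first approach (keyset / "seek" pagination), using the tables from the question; :last_seen_id is a placeholder bind parameter that the client would have to remember from the previous page, it is not part of the original code:

SELECT  e.id, e.first_name, u.initials AS assigned_coach,
        e.gender, e.email, e.phone, e.age, e.version,
        e.evaluation_status, e.date_created
    FROM  evaluation_client AS e
    LEFT JOIN  cx_user AS u  ON e.cx_hc_user_id = u.id
    WHERE  e.id > :last_seen_id   -- seek predicate, resolved via the PRIMARY KEY on e.id
    ORDER BY  e.id ASC
    LIMIT  10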

For the second approach, you'll have to rewrite the query from

SELECT ... 
  FROM table t 
WHERE ...
ORDER by t.id ASC
LIMIT 150000, 10

to

SELECT  ...
  FROM (
        SELECT  id
        FROM    table
        ORDER BY
                id ASC
        LIMIT 150000, 10
        ) o
JOIN table t
  ON t.id = o.id
WHERE ...
ORDER BY t.id ASC

Alternatively, since you're not confined to a single query, you can retrieve the ID of the first item on your page using

SELECT id 
  FROM table 
 ORDER BY id ASC 
 LIMIT 150000, 1

then use that id to retrieve the actual data:

SELECT ...
  FROM table
 WHERE id >= $id
   AND ...
 ORDER BY id ASC
 LIMIT 0, 10
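
In the query builder from the question, this two-query variant might look roughly like the sketch below. It is only a sketch: it reuses just the Builder calls already shown in the question and comments (from, leftJoin, columns, where, orderBy, limit, execute), and the method names getFirstIdForPage / getEvaluationsPageFromId are made up for illustration:

public static function getFirstIdForPage($start) {
    // Index-only query: find the id of the first row of the requested page.
    // Scanning just the primary key column keeps the large OFFSET comparatively cheap.
    $queryBuilder = new Builder();

    $row = $queryBuilder
        ->from(array('e' => static::class))
        ->columns('e.id')
        ->orderBy('e.id asc')
        ->limit(1, $start)   // LIMIT 1 OFFSET $start
        ->getQuery()
        ->execute()
        ->toArray();

    return $row ? (int) $row[0]['id'] : 0;
}

public static function getEvaluationsPageFromId($firstId) {
    // Fetch the actual page starting at that id; no large OFFSET any more.
    $queryBuilder = new Builder();

    return $queryBuilder
        ->from(array('e' => static::class))
        ->leftJoin('Cx\Framework\Models\Common\User\CxUser', 'e.cx_hc_user_id = u.id', 'u')
        ->columns('e.id, e.first_name, u.initials as assigned_coach, e.gender, e.email, e.phone, e.age, e.version, e.evaluation_status, e.ip_address, e.date_created, e.date_updated')
        ->where('e.id >= :ID:')
        ->orderBy('e.id asc')
        ->limit(10)          // page size only, no offset
        ->getQuery()
        ->execute(array('ID' => $firstId))
        ->toArray();
}
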
Timekiller
  • [_More_](http://mysql.rjweb.org/doc.php/pagination) on 'remembering where you left off'. – Rick James Nov 12 '18 at 16:45
  • @Timekiller where to find the id of first item? Do you mean $start variable of datatable? – DojoDev Nov 18 '18 at 07:16
  • First item overall or first item on page? If you mean overall, either `select min(id)` or use something like `id >= 0`; if you mean first item on page, look at my second to last query and adjust the `limit 150000, 1` part so that offset matches `pages*items_per_page` instead of 150000. – Timekiller Nov 19 '18 at 18:26
2

The pattern SELECT whatever FROM vast_table ORDER BY something LIMIT large_number, 10 is a notorious performance antipattern. Why? Because it has to examine a great many rows just to return a few.

If your id value is a primary key (or any indexed column) you can paginate by

SELECT whatever FROM vast_table WHERE id BETWEEN large_value AND large_value+9 ORDER BY id;

Or you might try

SELECT whatever FROM vast_table WHERE id >= large_value ORDER BY id LIMIT 10;

This doesn't paginate perfectly if your id values have gaps in them. But it performs tolerably well.

O. Jones
  • Hello @O. Jones, I already tried something similar but, as you mentioned, it skips records. `return $queryBuilder ->from(array('e' => static::class)) ->leftJoin('Cx\Framework\Models\Common\User\CxUser', 'e.cx_hc_user_id = u.id', 'u') ->columns('e.*') ->where('e.id > :ID:') ->orderBy('e.id asc') ->limit(10, $start) ->getQuery() ->execute(array('ID' => $start)) ->toArray();` – DojoDev Nov 12 '18 at 15:43
  • @DojoDev - Well, sounds like QueryBuilder is not a good tool for this use. – Rick James Nov 12 '18 at 16:48
  • You shouldn't include your starting point in `limit(10, $start) `. And you probably want `where('e.id >= :ID:')` note `>=`. – O. Jones Nov 12 '18 at 19:33
1

The issue was related to the datatype of the date columns in my table. I was using the int datatype for the date fields, and when I changed those columns to datetime, the results came back in seconds.

Source where I found the solution: http://dbscience.blogspot.com/2008/08/can-timestamp-be-slower-than-datetime.html
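
For illustration, the kind of change described above might look roughly like this (a sketch only: date_created is one of the date columns from the question, the old column is assumed to hold Unix timestamps, and this rewrites a very large table, so test it on a copy first):

-- Add a DATETIME column and backfill it from the old INT (Unix timestamp) values
ALTER TABLE evaluation_client ADD COLUMN date_created_dt DATETIME;
UPDATE evaluation_client SET date_created_dt = FROM_UNIXTIME(date_created);
-- Swap the columns once the backfill is verified
ALTER TABLE evaluation_client DROP COLUMN date_created;
ALTER TABLE evaluation_client CHANGE date_created_dt date_created DATETIME;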

DojoDev