
I have two tables, users and points. Currently users has 84,263 rows, while points has 1,636,119 rows. Each user can have zero or more points, and for each user I need to find the point that was created last.

show create table users
CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `email` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `password` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `remember_token` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `role` varchar(15) COLLATE utf8_unicode_ci DEFAULT 'consument',
  `created_at` timestamp NOT NULL DEFAULT current_timestamp(),
  `updated_at` timestamp NOT NULL DEFAULT current_timestamp(),
  `deleted_at` timestamp NULL DEFAULT NULL,
  `email_verified_at` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `email_verify_token` text COLLATE utf8_unicode_ci DEFAULT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `users_email_unique` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=84345 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

show create table points
CREATE TABLE `points` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `user_id` int(10) unsigned NOT NULL,
  `tablet_id` int(10) unsigned DEFAULT NULL,
  `parent_company` int(10) unsigned NOT NULL,
  `company_id` int(10) unsigned NOT NULL,
  `points` int(10) unsigned NOT NULL,
  `mutation_type` tinyint(3) unsigned NOT NULL,
  `created_at` timestamp NOT NULL DEFAULT current_timestamp(),
  `updated_at` timestamp NOT NULL DEFAULT current_timestamp(),
  PRIMARY KEY (`id`),
  KEY `points_user_id_foreign` (`user_id`),
  KEY `points_company_id_foreign` (`company_id`),
  KEY `points_parent_company_index` (`parent_company`),
  KEY `points_tablet_id_index` (`tablet_id`),
  KEY `points_mutation_type_company_id_created_at_index` (`mutation_type`,`company_id`,`created_at`),
  KEY `created_at_user_id` (`created_at`,`user_id`),
  CONSTRAINT `points_company_id_foreign` FOREIGN KEY (`company_id`) REFERENCES `companies` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `points_parent_company_foreign` FOREIGN KEY (`parent_company`) REFERENCES `parent_company` (`id`) ON DELETE CASCADE ON UPDATE CASCADE,
  CONSTRAINT `points_tablet_id_foreign` FOREIGN KEY (`tablet_id`) REFERENCES `tablets` (`id`) ON DELETE SET NULL ON UPDATE CASCADE,
  CONSTRAINT `points_user_id_foreign` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE CASCADE ON UPDATE CASCADE
) ENGINE=InnoDB AUTO_INCREMENT=1798627 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci

Queries I tried, but they are taking too long (we're talking minutes, not seconds):

select
       `users`.`id`,
       `users`.`email`,
       `users`.`role`,
       `users`.`created_at`,
       `users`.`updated_at`,
       max(pt.created_at) as `last_transaction`
from `users`
left join points as pt on pt.user_id = users.id
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
group by users.id

select
       `users`.`id`,
       `users`.`email`,
       `users`.`role`,
       `users`.`created_at`,
       `users`.`updated_at`,
       pt.created_at as `last_transaction`
from `users`
left join (select points.user_id, points.created_at from points order by points.created_at desc) as pt on pt.user_id = users.id
where `users`.`role` = 'consument' and `users`.`deleted_at` is null
group by users.id

Why am I not limiting the results and returning only 100 at a time? Because I am using Yajra DataTables for Laravel, and when I limit the results it only returns the limited set and does not recognize that there are more. So instead of 84,263 rows I only get 100 rows, and that's it.

O. Jones
Kristjan O.

3 Answers


Your users table has a role column, but it is not indexed, so your queries do a full table scan on users, which has 84,263 rows. One way to optimize would be to add an index on the role column. However, I can see that 'consument' is the default value and you are querying by that value. Suppose 95% of users have the 'consument' role: then even an index on role won't help much. You would have to add more conditions to narrow the query, and have an index supporting those conditions.
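As a sketch, such an index could look like the following (the index name and the choice to also include deleted_at, which your WHERE clause filters on, are my own assumptions, not something from the question):

-- Hypothetical composite index matching the WHERE clause:
-- equality-filtered role first, then the NULL-checked deleted_at.
CREATE INDEX users_role_deleted_at_index ON users (role, deleted_at);

With this in place MySQL can resolve the role = 'consument' AND deleted_at IS NULL filter from the index instead of scanning the table, provided the role value is selective enough.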

Your first query is better, as it avoids the unnecessary inner query of the second one.

If you need to return all 84,263 rows, that is a separate issue. Somehow you would have to introduce pagination, breaking your query into multiple queries. Suppose each call returns 500 users' data, sorted by id. In each subsequent call, you ask for the next 500 where id is greater than the last id returned by the previous query (for the very first call, the last id value is 0). The queries can then use the index on id.
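Written out as SQL, one page of this keyset pagination could look like the following sketch (the :last_id placeholder stands for whatever id the previous call ended with, 0 on the first call):

SELECT id, email, role, created_at, updated_at
FROM users
WHERE role = 'consument'
  AND deleted_at IS NULL
  AND id > :last_id      -- last id from the previous page, 0 initially
ORDER BY id
LIMIT 500;

Because id is the primary key, each call is a cheap range scan that starts just past the previous page, rather than skipping over all earlier rows the way OFFSET does.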

You can check the query plan using the EXPLAIN keyword to get a better understanding.
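For example, prefixing your first query with EXPLAIN shows the plan MySQL chose; a type of ALL on the users row of the output indicates the full table scan described above:

EXPLAIN
SELECT users.id, MAX(pt.created_at) AS last_transaction
FROM users
LEFT JOIN points pt ON pt.user_id = users.id
WHERE users.role = 'consument' AND users.deleted_at IS NULL
GROUP BY users.id;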

aatwork
The problem with pagination is that the query is built with Eloquent ORM inside Laravel and is then consumed by Yajra/DataTables. I did mention that when I try to paginate the results with `limit`, I only get the limited results back and a `totalRecords` equal to that limit, not 84,263. That code executes the entire query and then copies out the requested number of rows. So I have already dropped this solution and am going another way, but speed is still the issue. – Kristjan O. Mar 24 '21 at 12:29

Edit

I tried adding an index on role on the users table, with 1,000 users and 50,000 points; your first query took ~4 seconds, which is way too long.

So I tried this query, which took ~0.5 seconds, still too long:

select
       `users`.`id`,
       `users`.`email`,
       `users`.`role`,
       `users`.`created_at`,
       `users`.`updated_at`,
       pt.created_at as `last_transaction`
from `users`
left join points pt on pt.id = (select pt2.id from points pt2 WHERE pt2.user_id = users.id ORDER BY pt2.created_at DESC limit 1)
where `users`.`role` = 'consument' and `users`.`deleted_at` is null

So I added an index on points.created_at, and now the query takes 0.05 seconds, which is more acceptable.
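For reference, the indexes described above would be something like the following (the index names are my own; the combined (user_id, created_at) variant is an additional assumption that matches the correlated subquery's WHERE ... ORDER BY ... LIMIT 1 shape):

CREATE INDEX users_role_index ON users (role);
CREATE INDEX points_created_at_index ON points (created_at);
-- Possibly better for the correlated subquery: one index covering both
-- the user_id filter and the created_at sort, so MySQL can read a
-- single index entry per user.
CREATE INDEX points_user_id_created_at_index ON points (user_id, created_at);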

Yoleth
I did add indexes on `users.role` and `points.created_at`, and your proposed query took 184.037 seconds to execute. Strangely, if I remove those indexes the query takes 180.148 seconds, so I'm not sure what's going on, because indexes should speed up query execution. I am also aware that MySQL can handle a few million rows, so I'm not sure why it's struggling here with a simple join over 1.6M rows. – Kristjan O. Mar 24 '21 at 12:36

It looks like you want a result set with some columns from your users table, and the most recent created_at value from the points table for each user.

So-called compound covering indexes usually help speed these sorts of queries. So, let's start with what you need from points. This subquery gets it.

               SELECT user_id, MAX(created_at) last_transaction
                 FROM points
                GROUP BY user_id

This gives you a virtual table with each user_id and the created_at value you want. The following index

CREATE INDEX points_maxcreated ON points (user_id, created_at DESC);

will let MySQL satisfy the subquery with an almost miraculously fast loose index scan. (Note that MySQL versions before 8.0 parse DESC in an index definition but ignore it; the ascending index still supports this loose index scan.)

Then, let's consider the rest of your query.

select
       `users`.`id`,
       `users`.`email`,
       `users`.`role`,
       `users`.`created_at`,
       `users`.`updated_at`
from `users`
where `users`.`role` = 'consument' and `users`.`deleted_at` is null

For this you want the following index

CREATE INDEX users_del_role_etc 
    ON users 
      (deleted_at, role, id, email, created_at, updated_at);

MySQL can satisfy your query directly from this index. Think of these indexes as being stored in order. MySQL random accesses the index to the first eligible row (null deleted_at, role = 'consument') and then reads the index, not the table, row by row to get the data you want.

Putting it all together, you get

select
       `users`.`id`,
       `users`.`email`,
       `users`.`role`,
       `users`.`created_at`,
       `users`.`updated_at`,
       `subquery`.`last_transaction`
from `users`
left join (
                   SELECT user_id, MAX(created_at) last_transaction
                     FROM points
                    GROUP BY user_id
          ) subquery ON users.id = subquery.user_id
where `users`.`role` = 'consument' and `users`.`deleted_at` is null

This should be reasonably speedy for the query you gave us. Nevertheless, a query that you expect to return tens of thousands of rows you also should expect to take some time. There's no magic that makes SQL handle very large result sets fast. It's designed to retrieve small result sets fast from vast tables.

With respect, your understanding of how to paginate rows from your result set isn't quite right. It's hard to believe your user will actually examine tens of thousands of rows. Without an ORDER BY operation in your query, LIMIT is a very inexpensive operation. If you need ORDER BY ... LIMIT to paginate your results, ask another question, because that performance can also be managed.

O. Jones