2

We have just moved from mariadb 5.5 to MySQL 8 and some of the update queries have suddenly become slow. On more investigation, we found that MySQL 8 does not use index when the subquery has group column.

For example, below is a sample database. Table users maintain the current balance of the users per type and table 'accounts' maintain the total balance history per day.

CREATE DATABASE 'test';

CREATE TABLE `users` (
  `uid` int(10) unsigned NOT NULL DEFAULT '0',
  `balance` int(10) unsigned NOT NULL DEFAULT '0',
  `type` int(10) unsigned NOT NULL DEFAULT '0',
  KEY (`uid`) 
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE `accounts` (
  `uid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `balance` int(10) unsigned NOT NULL DEFAULT '0',
  `day` int(10) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`uid`),
  KEY `day` (`day`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Below is a explanation for the query to update accounts

mysql> explain update accounts a inner join (
      select uid, sum(balance) balance, day(current_date()) day from users) r 
           on r.uid=a.uid and r.day=a.day set a.balance=r.balance;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | UPDATE      | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
|  2 | DERIVED     | users | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL                           |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)

As you can see, mysql is not using index.

On more investigation, I found that if I remove sum() from the subquery, it starts using index. However, that's not the case with mariadb 5.5 which was correctly using the index in all the case.

Below are two select queries with and without sum(). I've used select query to cross check with mariadb 5.5 since 5.5 does not have explanation for update queries.

mysql> explain select * from accounts a inner join (
        select uid, balance, day(current_date()) day from users
         ) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
| id | select_type | table | partitions | type   | possible_keys | key     | key_len | ref        | rows | filtered | Extra |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
|  1 | SIMPLE      | a     | NULL       | ref    | PRIMARY,day   | day     | 4       | const      |    1 |   100.00 | NULL  |
|  1 | SIMPLE      | users | NULL       | eq_ref | PRIMARY       | PRIMARY | 4       | test.a.uid |    1 |   100.00 | NULL  |
+----+-------------+-------+------------+--------+---------------+---------+---------+------------+------+----------+-------+
2 rows in set, 1 warning (0.00 sec)

and with sum()

mysql> explain select * from accounts a inner join (
         select uid, sum(balance) balance, day(current_date()) day from users
            ) r on r.uid=a.uid and r.day=a.day ;
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
| id | select_type | table | partitions | type | possible_keys | key  | key_len | ref  | rows | filtered | Extra                          |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
|  1 | PRIMARY     | NULL  | NULL       | NULL | NULL          | NULL | NULL    | NULL | NULL |     NULL | no matching row in const table |
|  2 | DERIVED     | users | NULL       | ALL  | NULL          | NULL | NULL    | NULL |    1 |   100.00 | NULL                           |
+----+-------------+-------+------------+------+---------------+------+---------+------+------+----------+--------------------------------+
2 rows in set, 1 warning (0.00 sec)

Below is output from mariadb 5.5

MariaDB [test]> explain select * from accounts a inner join (
       select uid, sum(balance) balance, day(current_date()) day from users
             ) r on r.uid=a.uid and r.day=a.day ;
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
| id   | select_type | table      | type | possible_keys | key  | key_len | ref                   | rows | Extra       |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
|    1 | PRIMARY     | a          | ALL  | PRIMARY,day   | NULL | NULL    | NULL                  |    1 |             |
|    1 | PRIMARY     | <derived2> | ref  | key0          | key0 | 10      | test.a.uid,test.a.day |    2 | Using where |
|    2 | DERIVED     | users      | ALL  | NULL          | NULL | NULL    | NULL                  |    1 |             |
+------+-------------+------------+------+---------------+------+---------+-----------------------+------+-------------+
3 rows in set (0.00 sec)

Any idea what are we doing wrong?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Rahul
  • 390
  • 4
  • 14
  • Where is your query? – Ankit Bajpai Nov 26 '19 at 07:04
  • 1
    @AnkitBajpai It's the first query in the question, after the `CREATE TABLE`s. – Cully Nov 26 '19 at 07:17
  • Your subquery does not make much sense. Try it separately, it will return a single row, with a random uid from your table, it does NOT group by users (and will [throw an error](https://stackoverflow.com/q/34115174) for MySQL 8 with default settings). Did you get the expected results from that query in MariaDB? The update should actually only update a single row, and with the wrong value (unless there is just one distinct uid in `users`). Btw, MariaDB actually also did not use *that* index, but created an automatic key (for the one line) after doing the subquery on users with ALL (line 3). – Solarflare Nov 26 '19 at 10:11
  • @Solarflare we know what we are doing. If you can help with using index, will be appreciated. – Rahul Nov 26 '19 at 10:29
  • There is no index that's actually being used! MariaDB creates a new index (`key0`) for your subselect (after reading all rows in `users`) for *one* row. Such an index is superfluous (if at all, it takes time to create it), and MariaDB probably just does it because the developers didn't check for this edge case. Please make sure this is actually the correct query (no `group by`) and/or that your users table isn't "unusual" (e.g. just has one userid) and/or explain why you want that index (it doesn't seem to help), so you and us don't try to solve different problems. – Solarflare Nov 26 '19 at 11:49
  • If uid is the PK, this means there can be only one `day` in `accounts` for each user. (`PRIMARY KEY` is defined to be unique.) So, the question and the schema don't make sense. Please fix. – Rick James Dec 01 '19 at 19:31

1 Answers1

1

As others have commented, break your update query apart...

update accounts join

then your query

on condition of the join.

Your inner select query of

select uid, sum(balance) balance, day(current_date()) day from users

is the only thing that is running, getting some ID and the sum of all balances and whatever the current day. You never know which user is getting updated, let alone the correct amount. Start by getting your query to see your expected results per user ID. Although the context does not make sense that your users table has a "uid", but no primary key thus IMPLYING there is multiple records for the same "uid". The accounts (to me) implies ex: I am a bank representative and sign up multiple user accounts. Thus my active portfolio of client balances on a given day is the sum from users table.

Having said that, lets look at getting that answer

select
      u.uid,
      sum( u.balance ) allUserBalance
   from
      users u
   group by
      u.uid

This will show you per user what their total balance is as of right now. The group by now gives you the "ID" key to tie back to the accounts table. In MySQL, the syntax of a correlated update for this scenario would be... (I am using above query and giving alias "PQ" for PreQuery for the join)

update accounts a
   JOIN
   ( select
          u.uid,
          sum( u.balance ) allUserBalance
       from
          users u
       group by
          u.uid ) PQ
      -- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
      on a.uid = PQ.uid
   -- NOW you can SET the values
   set Balance = PQ.allUserBalance,
       Day = day( current_date())

Now, the above will not give a proper answer if you have accounts that no longer have user entries associated... such as all users get out. So, whatever accounts have no users, their balance and day record will be as of some prior day. To fix this, you could to a LEFT-JOIN such as

update accounts a
   LEFT JOIN
   ( select
          u.uid,
          sum( u.balance ) allUserBalance
       from
          users u
       group by
          u.uid ) PQ
      -- NOW, the JOIN ON clause ties the Accounts ID to the SUM TOTALS per UID balance
      on a.uid = PQ.uid
   -- NOW you can SET the values
   set Balance = coalesce( PQ.allUserBalance, 0 ),
       Day = day( current_date())

With the left-join and COALESCE(), if there is no record summation in the user table, it will set the account balance to zero.

DRapp
  • 47,638
  • 12
  • 72
  • 142