
Thank you all in advance. I have encountered an issue in upgrading my database from MySQL 5.5 to 5.7 that has me completely confounded. The upgrade was not done using mysqldump or similar, but as a rebuild from several tab-separated input files using a number of very long SQL scripts. One seemingly innocuous query in particular (inside a stored procedure) has been giving me trouble, and I cannot work out why:

UPDATE liverpool.master_person mp 
SET Link_Count = ( SELECT count(*) FROM liverpool.person_record pr
WHERE mp.Master_Person_ID = pr.Master_Person_ID ) - 1;

This seems fairly simple, but the EXPLAIN from this query shows that some serious row scanning is going on:

+----+--------------------+-------+------------+-------+---------------+----------------------+---------+------+---------+----------+--------------------------+
| id | select_type        | table | partitions | type  | possible_keys | key                  | key_len | ref  | rows    | filtered | Extra                    |
+----+--------------------+-------+------------+-------+---------------+----------------------+---------+------+---------+----------+--------------------------+
|  1 | UPDATE             | mp    | NULL       | index | NULL          | PRIMARY              | 4       | NULL | 1198100 |   100.00 | NULL                     |
|  2 | DEPENDENT SUBQUERY | pr    | NULL       | index | NULL          | Master_Person_ID_IDX | 17      | NULL | 1200537 |   100.00 | Using where; Using index |
+----+--------------------+-------+------------+-------+---------------+----------------------+---------+------+---------+----------+--------------------------+

The important thing seems to be the rows column, which is 1198100 for the UPDATE and 1200537 for the SELECT subquery. Both numbers are pretty close to the total number of rows in the referenced tables (1207744 for both), so it seems to be doing a full row-by-row scan of each, and I can't see why. Precisely the same query worked fine in MySQL 5.5. I was hopeful that this solution would help, but setting 'derived_merge=off' in optimizer_switch and restarting the server made no difference.
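
For anyone wanting to try the same thing: the question doesn't quote the exact statement that was used, but the switch is normally toggled and checked along these lines (a sketch only):

SET SESSION optimizer_switch = 'derived_merge=off';   -- or SET GLOBAL / my.ini for a server-wide change
SELECT @@optimizer_switch;                             -- confirm the flag now reads derived_merge=off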

I certainly don't expect this query to be super fast. It doesn't have to be. It wasn't exactly speedy before (a few minutes on a 7200rpm spinning disk), but since the upgrade to MySQL 5.7 it seems like it wouldn't complete anytime before the heat death of the universe, and I'd rather not wait that long. Does anyone out there have any ideas? Query rewrites, my.ini settings, anything at all?

Also, please let me know if I have breached protocol in any way or if I can improve my question; this is my first post here.

Thank you for your time.

EDIT: I thought for a moment that this solution looked promising. Apparently tables with differing charsets/collations can't properly use each other's indexes. I was pretty sure everything was in latin1, but figured it was worth making sure, so I explicitly added DEFAULT CHARSET=latin1 to all of my CREATE TABLE statements and added CHARACTER SET latin1 to my LOAD DATA INFILE statements. Sadly, no change.
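
For reference, the changes described in this edit would look roughly like the following; the column list and file path are placeholders, not the real schema:

CREATE TABLE liverpool.person_record (
    Master_Person_ID INT NOT NULL
    -- remaining columns omitted
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

LOAD DATA INFILE '/path/to/person_record.tsv'
    INTO TABLE liverpool.person_record
    CHARACTER SET latin1
    FIELDS TERMINATED BY '\t';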

convensive
  • Thank you to the kind person who formatted my EXPLAIN properly! – convensive Sep 14 '16 at 03:56
  • Q: Is `Master_Person_ID` the leading column of the `Master_Person_ID_IDX` index? We'd expect that subquery to be fairly quick if it is. That subquery needs to be fast if we're going to be executing it a million times (once for each row in the outer query). Other considerations would be InnoDB buffer pool size (if this is an InnoDB table). Is logging for this statement "statement" based, or is it binary logging? There's also the issue of generating rollback, waiting to obtain row locks (contention with locks from other DML statements), etc. – spencer7593 Sep 14 '16 at 03:58
  • @spencer7593 The `Master_Person_ID_IDX` on the `liverpool.person_record` table is created with: `CREATE INDEX Master_Person_ID_IDX ON liverpool.person_record ( Master_Person_ID ASC );` So the `Master_Person_ID` is the leading and only column (if I am understanding you correctly). – convensive Sep 14 '16 at 04:02
  • InnoDB or MyISAM storage engine? – spencer7593 Sep 14 '16 at 04:04
  • @spencer7593 The InnoDB buffer pool size is 8G. And the database is built offline, so there's no lock contention because there's only me logged in as root (again, if I am understanding you correctly). As for the logging, I will go look into that now! – convensive Sep 14 '16 at 04:06
  • Can you please try using the subquery as an inner join? It can improve the performance. – Shushil Bohara Sep 14 '16 at 04:08
  • @spencer7593 I would also note that the server settings are pretty much vanilla, my.ini-wise. The only changes I made were raising `innodb_buffer_pool_size` to 8G and setting `innodb_flush_log_at_trx_commit` to 0. – convensive Sep 14 '16 at 04:15
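
Both settings mentioned in that last comment are dynamic in MySQL 5.7, so a rough equivalent of those my.ini edits can also be applied at runtime; a sketch, not the exact configuration used:

SET GLOBAL innodb_buffer_pool_size = 8589934592;   -- 8G; resized online in 5.7, rounded up to a multiple of the chunk size
SET GLOBAL innodb_flush_log_at_trx_commit = 0;     -- flush the redo log roughly once per second instead of at every commit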

1 Answer


Try rewriting the query as:

UPDATE liverpool.master_person mp
  JOIN (SELECT Master_Person_ID, count(*) AS cnt
          FROM liverpool.person_record
         GROUP BY Master_Person_ID
       ) pr
    ON mp.Master_Person_ID = pr.Master_Person_ID
   SET mp.Link_Count = pr.cnt - 1;
Mike
  • Hi Mike. This works like a charm! Thank you very much! If you're ever in Wellington, New Zealand, I will buy you a beer. Your query runs in about 101 seconds, which is more than fast enough for what I need it to do. I am curious about this syntax, however. What is it doing in the background? Is the optimizer temporarily materializing the SELECT into a temp table? – convensive Sep 15 '16 at 00:47
  • I'm going to mark this as the answer. But if anyone has any insight into why such a seemingly straightforward query as the one I posted above should be so weirdly slow in MySQL 5.7, I would LOVE to hear about it. – convensive Sep 15 '16 at 00:48
  • @convensive Your original query takes one row from "A" and, for each one, looks it up in "B". On top of that it uses a full table scan, apparently believing that the index would be slower. My query collects all the counts in a single pass over the table (and maybe even only over the index), then proceeds to the update. Whether it collects them into a temporary table or keeps them in memory depends on the number of result rows. – Mike Sep 15 '16 at 06:44
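
To see what the optimizer actually does with the rewritten statement, MySQL 5.7 can EXPLAIN an UPDATE directly; a materialized derived table normally shows up as `<derived2>` in the table column (output omitted here, since it depends on the data and server settings):

EXPLAIN
UPDATE liverpool.master_person mp
  JOIN (SELECT Master_Person_ID, count(*) AS cnt
          FROM liverpool.person_record
         GROUP BY Master_Person_ID
       ) pr
    ON mp.Master_Person_ID = pr.Master_Person_ID
   SET mp.Link_Count = pr.cnt - 1;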