0

I have a drupal 7 site running on MySQL. Some pages on the site are excruciatingly slow to load.

I investigated load times and have identified the culprit query, which is regularly taking 10s to execute on some pages. In one case it even took 70s!

The query is from a "view" that generates a short list of related content from elsewhere in the site based on the site taxonomy.

This is an example (with arguments) from one slow page:

SELECT node.nid AS nid, node.title AS node_title, node.created AS node_created, 'podcasts:panel_pane_3' AS view_name, RAND() AS random_field 
FROM node node 
LEFT JOIN (SELECT td.*, tn.nid AS nid 
    FROM taxonomy_term_data td 
    LEFT JOIN taxonomy_vocabulary tv ON td.vid = tv.vid 
    LEFT JOIN taxonomy_index tn ON tn.tid = td.tid 
    WHERE (tv.machine_name IN ('listen')) ) taxonomy_term_data_node 
ON node.nid = taxonomy_term_data_node.nid 
LEFT JOIN taxonomy_index taxonomy_index ON node.nid = taxonomy_index.nid 
WHERE (( (taxonomy_index.tid IN ('472', '350', '742', '681', '3907', '1541', '411', '636', '990', '7757', '680', '743', '11479', '8106', '566', '2230', '11480', '766')) 
  AND (node.nid != '191314' OR node.nid IS NULL) ) 
  AND(( (node.status = '1') 
  AND (node.type IN ('article', 'experiment', 'interview', 'podcast', 'question')) ))) 
ORDER BY random_field ASC, node_created DESC 
LIMIT 5 OFFSET 0

From initial research I thought it would be a case of adding indices, but the columns of the tables concerned seem to have existing index entries.

I'm therefore uncertain how to proceed and would really value some guidance if anyone can help me please?

PS - I did ask MySQL to Explain itself and this is what was generated: mysql explain output

Chris
  • 23
  • 7
  • 2
    google `explain statement` – Iłya Bursov Jan 25 '18 at 21:27
  • Just prefix the statement with "EXPLAIN ... ". MySQL will not execute the query then but print a table of information about the various aspects of the execution plan for the query. To understand these, you will need to read some [documentation](https://dev.mysql.com/doc/refman/5.7/en/using-explain.html). – marekful Jan 25 '18 at 21:38
  • Thank you both; yes, i had done that already and I apologise for not including the output, which is now added above. – Chris Jan 25 '18 at 21:42
  • Optimizing SQL queries is an art :) be patient and willing to read and work a lot. – LMC Jan 25 '18 at 22:23
  • Thank you; I'm prepared to do both, but I'm not sure where to start... – Chris Jan 25 '18 at 22:55
  • here's an example: https://stackoverflow.com/questions/48453438/how-to-improve-query-performance-using-explain-command-results-f-e/48453705#48453705 – LMC Jan 26 '18 at 23:04
  • Hint: the row count on td table is high compared to the others, that could mean missing indexes on the involved columns – LMC Jan 26 '18 at 23:06
  • Add a result sample showing what you expect from your ORDER BY clause. – LMC Jan 26 '18 at 23:08
  • Am I missing something or is the whole content of the subquery (taxonomy_term_data_node) not used at all? – Simon Berthiaume Feb 17 '18 at 16:10
  • I also notice many `LEFT JOIN` that probably shouldn't be; for example you have the WHERE condition `tv.machine_name IN ('listen')` so your `LEFT JOIN taxonomy_vocabulary tv` doesn't make sense since, if there is not matching record, your WHERE condition will be false and filter-out the record anyway. Same thing happens with `LEFT JOIN taxonomy_index` since you have `taxonomy_index.tid IN (...)` in your WHERE clause. – Simon Berthiaume Feb 17 '18 at 16:20
  • God knows; it's generated by a view in Drupal; whether it works well or not actually seems to be a side-point because I think there is something more fundamentally wrong with my MySQL config - https://stackoverflow.com/questions/48842794/why-does-alter-table-report-no-records-on-mysql – Chris Feb 17 '18 at 16:33

2 Answers2

0

Few recommendations to optimize this query:

  1. Avoid selecting unnecessary columns: do you really need all the columns in td.*? In most cases it means that too much information is passed over the network to the application.
  2. Mixed ORDER BY directions: you're sorting by two columns: random_field ASC, node_created DESC. Sorting by different orders will prevent index usage, which will slow down the search. Do you think it makes sense to make both ASC or both DESC?
  3. I assume that taxonomy_index.tid is numeric, and so are 'node.nid' and 'node.status'. In that case, when comparing them to constants, do not add quotes around the constant, as it will cause an unrequired cast which might prevent index use. For example, turn node.status = '1' to node.status = 1.
  4. You are left joining a subquery (taxonomy_term_data_node) - if you're using MySQL < 5.6, or maybe even MySQL 5.7, it's most likely that MySQL can't index that subquery properly. Therefore, I would recommend to extract that subquery to a temporary table, index it and join to it from the outer query. See the transformation below.

So to apply most changes (the ones that do not require your decision, such as part 1 and 2 above), perform these steps:

First, index the main query by adding these indexes:

ALTER TABLE
  `node`
ADD
  INDEX `node_idx_status_nid_title_created` (`status`, `nid`, `title`, `created`);

ALTER TABLE
  `taxonomy_index`
ADD
  INDEX `taxonomy_index_idx_nid` (`nid`);

ALTER TABLE
  `taxonomy_index`
ADD
  INDEX `taxonomy_index_idx_tid_nid` (`tid`, `nid`);

ALTER TABLE
  `taxonomy_term_data`
ADD
  INDEX `taxonomy_term_data_idx_vid_tid` (`vid`, `tid`);

ALTER TABLE
  `taxonomy_vocabulary`
ADD
  INDEX `taxonomy_vocabulary_idx_vid` (`vid`);

First, create the temporary table:

CREATE TEMPORARY TABLE IF NOT EXISTS temp1 AS SELECT
        taxonomy_term_data.*,
        tn.nid AS nid 
    FROM
        taxonomy_term_data td 
    LEFT JOIN
        taxonomy_vocabulary tv 
            ON td.vid = tv.vid 
    LEFT JOIN
        taxonomy_index tn 
            ON tn.tid = td.tid 
    WHERE
        (
            tv.machine_name IN (
                'listen'
            )
        );

Now index the subquery: ALTER TABLE temp1 ADD INDEX temp1_idx_nid (nid);

And the outer query will join to it:

SELECT
        node.nid AS nid,
        node.title AS node_title,
        node.created AS node_created,
        'podcasts:panel_pane_3' AS view_name,
        RAND() AS random_field 
    FROM
        node node 
    LEFT JOIN
        temp1 taxonomy_term_data_node 
            ON node.nid = taxonomy_term_data_node.nid 
    LEFT JOIN
        taxonomy_index taxonomy_index 
            ON node.nid = taxonomy_index.nid 
    WHERE
        (
            (
                (
                    taxonomy_index.tid IN (
                        '472', '350', '742', '681', '3907', '1541', '411', '636', '990', '7757', '680', '743', '11479', '8106', '566', '2230', '11480', '766'
                    )
                ) 
                AND (
                    node.nid != '191314' 
                    OR node.nid IS NULL
                )
            ) 
            AND (
                (
                    (
                        node.status = '1'
                    ) 
                    AND (
                        node.type IN (
                            'article', 'experiment', 'interview', 'podcast', 'question'
                        )
                    )
                )
            )
        ) 
    ORDER BY
        random_field ASC,
        node_created DESC LIMIT 5
Tomer Shay
  • 771
  • 6
  • 17
  • Many thanks - I'm on the case and wanted to acknowledge your kind contribution. I'll report back as I deal with this. Very grateful indeed. – Chris Feb 01 '18 at 16:20
  • @Chris, sure, hope this was helpful. Will appreciate if you could report back on this, and if it was helpful, please upvote and accept the answer. – Tomer Shay Feb 01 '18 at 16:45
  • Thanks @Tomer - it's not trivial to instantly implement your solution because mysql objects to the temp table (it threw an error) because I need to specify variable characteristics for the table columns; I need to look into what the data are so I know what to put there... – Chris Feb 02 '18 at 14:48
  • @Chris, I fully understand, take your time. Looking forward to hearing back once you implement the changes. Also, another thought - you could try out the solution with example values, just to see if it improved the performance, before implementing everything in the code of the application itself. – Tomer Shay Feb 02 '18 at 22:01
  • Thanks; yes, I had in mind using the query above to benchmark and demonstrate improvement before implementing it; but perhaps you can advise me - mysql objects because it wants me to specify the data types for the columns in the temp table. This is not included in your example above. What is the correct syntax for that? – Chris Feb 03 '18 at 11:13
  • @Chris, is it a request from MySQL server when creating the temp table? If so, can you copy the warning / error message so I could try to assist? – Tomer Shay Feb 03 '18 at 18:14
  • This is what it returns when I enter the create temp table sequence you have posted in your second set of instructions above. 5 errors were found during analysis. An opening bracket was expected. (near "AS" at position 43) At least one column definition was expected. (near " " at position 42) Unexpected beginning of statement. (near "tv" at position 342) Unexpected beginning of statement. (near "machine_name" at position 345) Unrecognized statement type. (near "IN" at position 358) SQL query: Documentation #1051 - Unknown table 'taxonomy_term_data' – Chris Feb 04 '18 at 23:23
  • Which database type and version are you using? Why does it say `Unknown table 'taxonomy_term_data' `? Are you sure you are running the query on a database that have this table? – Tomer Shay Feb 05 '18 at 08:29
  • yes, absolutely. It's bizarre. It's on MySQL 5.7; I can see that very table in front of me, and those queries are running off drupal against those tables all the time, so why your query seems to think the table is non-existent, I don't know. I must be making a silly oversight... – Chris Feb 05 '18 at 23:45
  • Can you try creating this simple test table creation statement please? `CREATE TEMPORARY TABLE IF NOT EXISTS table2 AS (SELECT * FROM taxonomy_term_data LIMIT 10)`. – Tomer Shay Feb 06 '18 at 06:49
  • `ORDER BY RAND(), ...` is not optimizable -- there _will_ be an extra sort pass. – Rick James Feb 10 '18 at 21:54
0

Thanks for the guidance above, everyone. However, I have solved this with the help of Andy Batey at Cambridge University.

The clue was comparing the EXPLAIN statements generated when the query above was run on MySQL v5.5 (very fast results) versus v5.7 (very slow results); they query was being handled quite differently on the two platforms.

The key was adding this to my.cnf:

optimizer_switch='derived_merge=off'

Now the native query executes in 50ms or less, compared with 12s or longer before.

I hope this helps anyone else who runs into this upgrade problem.

Chris
  • 23
  • 7