2

I have been tasked with rewriting a slow query. I solved my performance problem, however I'm restless because I don't understand why one of the approaches I tried is faster than the other.

Query 1 (takes ~13 seconds on the website, ~0.2 seconds in PHPMYADMIN) :

SELECT t.USER_ID, COUNT(DISTINCT r1.id1) as count_matches 
FROM b_squad_member_result as t
INNER JOIN (SELECT m.IBLOCK_ELEMENT_ID as id1, (m.IBLOCK_ELEMENT_ID + 2) as id2, (m.IBLOCK_ELEMENT_ID + 4) as id3 
            FROM b_iblock_element_prop_s3 as m 
            WHERE m.PROPERTY_8 IS NULL) as r1
      ON t.MATCH_ID  IN(id1, id2, id3)
INNER JOIN b_uts_user as u ON u.VALUE_ID = t.USER_ID
      AND u.UF_ID_TEAM = 2228
GROUP BY t.USER_ID

Execution plan for Query 1

enter image description here

Query 2 (takes ~0.2 seconds on the website, ~0.2 seconds in PHPMYADMIN) :

SELECT t.USER_ID, COUNT(DISTINCT r1.id1) as count_matches 
FROM b_squad_member_result as t
INNER JOIN (SELECT m.IBLOCK_ELEMENT_ID as id1, (m.IBLOCK_ELEMENT_ID + 2) as id2, (m.IBLOCK_ELEMENT_ID + 4) as id3 
            FROM b_iblock_element_prop_s3 as m 
            WHERE m.PROPERTY_8 IS NULL) as r1
      ON t.MATCH_ID = id1 OR t.MATCH_ID = id2 OR t.MATCH_ID = id3
INNER JOIN b_uts_user as u ON u.VALUE_ID = t.USER_ID
      AND u.UF_ID_TEAM = 2228
GROUP BY t.USER_ID

Execution plan for Query 2 :

I first went with Query 1 as in PHPMYADMIN it was meeting my performance expectations. However, on the website itself, the query took much more time. After trying lots of different solutions I just decided to change the IN clause for t.MATCH_ID = id1 OR t.MATCH_ID = id2 OR t.MATCH_ID = id3 and this works as fast as it is supposed to. However I would like to understand why is the second approach faster. I've read that the IN clause is transformed into multiple OR clauses before the actual execution. Can it really affect performance that much?

hakre
  • 193,403
  • 52
  • 435
  • 836
  • 1
    Perhaps the second query is using parameters while with the IN clause in the first more or less _"some data"_ is in the IN clause. If the types do not match, Mysql must do type-conversions and can not optimize the query any longer against the column compared against incl. potential indexes (which brings speed). With the parameters it is less easy to fall into that problem as often the correct type is given with the parameters and type-conversion isn't necessary then and then the database server can optimize better. – hakre Jul 18 '21 at 08:41
  • 1
    You can also read that IN is faster than OR, e.g. here: https://stackoverflow.com/q/782915/367456 - You example shows however, that what you can read and what you can have does differ. Can you add the [execution plans](https://stackoverflow.com/q/915023/367456) of those two queries? – hakre Jul 18 '21 at 08:50
  • Just added the execution plans. So, I don't think it's a type conversion problem, because the types of data I manipulate are (Supposedly) the same. – Ilias Timaev Jul 18 '21 at 09:02
  • You run the query in Phpmyadmin against the same database as from your website, right? And which database driver and settings are you using on your website, and which ones are you using in Phpmyadmin? And how do you create the SQL in the website? Perhaps adding the code could shed some more light, too, but also if it's using a framework or library for a bit more context information. If you take a look at the execution plans, what is your interpretation of them in light of the speed difference you describe? – hakre Jul 18 '21 at 09:06
  • And yes, my first comment was a complete stabbing in the dark. More knowledge about how the SQL queries are build in the website and the database driver settings would be necessary to know to even come close to it, so perhaps I was a little eager ;) – hakre Jul 18 '21 at 09:10
  • And I think it needs to get execution plans for the website queries as well - not only for the queries from Phpmyadmin. – hakre Jul 18 '21 at 09:12

1 Answers1

1

The parentheses are different. Your first query puts the parens around r1 with the ON inside; it would be outside, as you did with the second query.

I see that the EXPLAINs are different; I don't know if this is because of the parens or IN vs OR.

Testing a value against multiple columns is really bad for performance. It is usually fixable by a schema change. I call the anti-pattern "spraying an array across columns". It is usually better to have another table with [up to] 3 rows for those ids. If the ids are strings, the FULLTEXT may be a better approach.

While the previous paragraph is valid is general, it does not apply in you case, since idn is computed from a single column IBLOCK_ELEMENT_ID. WTF?

I really need to see SHOW CREATE TABLE to fully help you.

These indexes, if you don't already have them, may help:

u:  (UF_ID_TEAM, VALUE_ID)
m:  (PROPERTY_8, IBLOCK_ELEMENT_ID)

COUNT(DISTINCT r1.id1) -- the DISTINCT (and its overhead) could probably be better done by adding a GROUP BY to the derived table for r1. Well, that may not be useful -- it depends on whether there are multiple rows of u involved. But then we get into whether you will stumble over ONLY_FULL_GROUP_BY. So, please explain whether the tables are 1:many or 1:1.

(I agree that this is an XY question.)

Rick James
  • 135,179
  • 13
  • 127
  • 222