2

I have the following query which takes very long to execute (around 30 seconds). Is there any way to optimize this query or should I let it be?

SELECT l.Name, l.Model, l.Engine_Type, m.ID, g.Name  
FROM db.vehicle_part k
JOIN db.vehicle l ON k.ID_Vehicle = l.ID
JOIN db.part m ON k.ID_Part = m.ID

JOIN db.part_generic_part b ON b.ID_Part = m.ID
JOIN db.generic_part c ON c.ID = b.ID_Generic_part
JOIN db.generic_part_name d ON c.ID = d.ID_Generic_Part 
JOIN db.part_category_generic_part e ON c.ID = e.ID_generic_part
JOIN db.part_category f ON e.ID_category = f.ID
JOIN db.part_category_name g ON f.ID = g.ID_Part_Category 
JOIN db.lang h ON g.ID_Lang = h.ID

WHERE l.Name = 'BMW' AND l.Model = '3 (E30)' AND l.Engine_Type= '316 i' AND g.ID_Lang =2
imvas
  • 23
  • 2
  • Maybe you can start with indexes. – Nitish Mar 29 '16 at 12:43
  • This is NOT a description of a problem, just some numbers and a SQL query. What is the schema of the database (the structure of the tables - including the indexes - and the connections between them), What is the estimated execution plan, what are the record counts? Optimizing a query is not just applying some best practices, but it requires to know the data you are working with. We know nothing about your data. Do you know your data? Can you teach us? – Pred Mar 29 '16 at 12:43

2 Answers2

2

You should have indexes on all the columns used for the joins. More importantly, you should have an index on db.vehicle:

db_vehicle(name, model, engine_type, id)

This should be used for the where clause and speed up the query.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

As @Gordon Linoff already pointed out, you need to define indexes. However, there are some things to do further with your query. It is certainly better to define well the order of your tables, so we change this

SELECT l.Name, l.Model, l.Engine_Type, m.ID, g.Name  
FROM db.vehicle_part k
JOIN db.vehicle l ON k.ID_Vehicle = l.ID
JOIN db.part m ON k.ID_Part = m.ID

JOIN db.part_generic_part b ON b.ID_Part = m.ID
JOIN db.generic_part c ON c.ID = b.ID_Generic_part
JOIN db.generic_part_name d ON c.ID = d.ID_Generic_Part 
JOIN db.part_category_generic_part e ON c.ID = e.ID_generic_part
JOIN db.part_category f ON e.ID_category = f.ID
JOIN db.part_category_name g ON f.ID = g.ID_Part_Category 
JOIN db.lang h ON g.ID_Lang = h.ID

WHERE l.Name = 'BMW' AND l.Model = '3 (E30)' AND l.Engine_Type= '316 i' AND g.ID_Lang =2

to this

SELECT l.Name, l.Model, l.Engine_Type, m.ID, g.Name  
FROM db.vehicle_part k
JOIN db.vehicle l ON k.ID_Vehicle = l.ID
JOIN db.part m ON k.ID_Part = m.ID

JOIN db.part_generic_part b ON b.ID_Part = m.ID
JOIN db.generic_part c ON c.ID = b.ID_Generic_part
JOIN db.part_category_generic_part e ON c.ID = e.ID_generic_part
JOIN db.part_category f ON e.ID_category = f.ID
JOIN db.part_category_name g ON f.ID = g.ID_Part_Category 
JOIN db.lang h ON g.ID_Lang = h.ID
JOIN db.generic_part_name d ON c.ID = d.ID_Generic_Part 

WHERE l.Name = 'BMW' AND l.Model = '3 (E30)' AND l.Engine_Type= '316 i' AND g.ID_Lang =2

This is a more elegant code, even though there is no guarantee that it will actually be quicker than the initial one. However, with this better order, we go to the next step. The where clause is executed AFTER the join is completed, so you have a large, multi-dimensional set of records loaded into memory and you filter the rows out. The on condition executes earlier, so the next step is to modify the where and migrate its condition into on conditions, like this:

SELECT l.Name, l.Model, l.Engine_Type, m.ID, g.Name  
FROM db.vehicle_part k
JOIN db.vehicle l ON k.ID_Vehicle = l.ID AND l.Name = 'BMW' AND l.Model = '3 (E30)' AND l.Engine_Type= '316 i'
JOIN db.part m ON k.ID_Part = m.ID

JOIN db.part_generic_part b ON b.ID_Part = m.ID
JOIN db.generic_part c ON c.ID = b.ID_Generic_part
JOIN db.part_category_generic_part e ON c.ID = e.ID_generic_part
JOIN db.part_category f ON e.ID_category = f.ID
JOIN db.part_category_name g ON f.ID = g.ID_Part_Category and g.ID_Lang =2
JOIN db.lang h ON g.ID_Lang = h.ID
JOIN db.generic_part_name d ON c.ID = d.ID_Generic_Part 

This way, unneeded records are filtered out while joining, reducing time, since they are not all loaded into memory and then filtered. The final step should be done if and only if the query is still slow, since it reduces the readability of the code: you can modify the joins having a severe filter (reducing drastically the number of rows) or a severe projection (having many large fields, but needed only a small subset of that), to named subselects. Example:

SELECT 'BMW', '3 (E30)', '316 i', m.ID, g.Name  
from (select temp.ID from db.vehicle temp where temp.Name = 'BMW' AND temp.Model = '3 (E30)' AND temp.Engine_Type= '316 i') l
JOIN db.vehicle_part k on l.ID = k.ID_VEHICLE
JOIN db.part m ON k.ID_Part = m.ID

JOIN db.part_generic_part b ON b.ID_Part = m.ID
JOIN db.generic_part c ON c.ID = b.ID_Generic_part
JOIN db.part_category_generic_part e ON c.ID = e.ID_generic_part
JOIN db.part_category f ON e.ID_category = f.ID
JOIN db.part_category_name g ON f.ID = g.ID_Part_Category and g.ID_Lang =2
JOIN db.lang h ON g.ID_Lang = h.ID
JOIN db.generic_part_name d ON c.ID = d.ID_Generic_Part 

This final modification prefilters l to make sure there are only a handful of records to start the join with.

Lajos Arpad
  • 64,414
  • 37
  • 100
  • 175
  • This sounds like a lot of waffle. Any data to support any of the statements made here? – Strawberry Mar 29 '16 at 13:40
  • Some of the changes might have some impact on the execution plan, but it highly depends on the distribution of the data, the indexes, the structure of the indexes, etc. Again, without knowing the facts about the database, this is just guessing. I am not even sure if all the tables are necessary (again, the truth of this statement depends on the keys and column properties what we don't know). – Pred Mar 29 '16 at 16:20
  • Surely, on is a where executed exactly when the records are being joined. Where assumes that from (along with the joins) already executed. There are several sources in several RDBMSs (http://stackoverflow.com/questions/10133356/where-clause-before-inner-join and https://msdn.microsoft.com/en-us/library/ms189499.aspx) about it and my experiments confirmed this as well. The subselect actually forces mysql to filter rows from a table before they are being joined and each filtered record will be considered only once – Lajos Arpad Mar 29 '16 at 17:11
  • 1
    ok, then why do you use subselect for the `vehicle` table and just an extra constant comparison for the `part_category_name` table (`g.ID_lang=2`)? Why did you choose different approaches for a similar problem? Why do you return all the columns from the vehicle table when the result does not contain any columns other than the ones used in the where clause? Is that table even necessary? Is it possible, that an exists or a query executed right before the current one may be faster? I don't know, because we know almost nothing about the database and the use case. – Pred Mar 29 '16 at 18:32
  • The two approaches were part of an example. As I stated, this approach makes the code more difficult to read, so it should not be applied blindly, just when it clearly helps. Your second question is valid, we can use projection to optimize memory size (upvoted your comment). That table might be unnecessary, in fact if you work with a subselect for k, using exists. In my answer I illustrated a technique with it. It is possible, but not guaranteed. A subquery will still run the exists n times. A query before it should increase speed, but that will be no longer a query then. – Lajos Arpad Mar 29 '16 at 19:22
  • @Pred, most of your questions are perfectly valid. There are a lot of optimization techniques. I have shown a few, but know a lot more than shown. Wanted to keep the answer digestible. So, while you have some very good points, I still consider my answer to be valid. If you do not agree, you can show other arguments, or you can downvote the answer. Basically any of those would show your opinion, which is always welcome in such an open discussion. Cheers :) – Lajos Arpad Mar 29 '16 at 19:24