1

there are so many questions here on this subject (see here or here for examples), but I cannot figure out how to correctly run my particular case.

Here is the SQLFiddle that contains table schema, records, and queries. At this moment I have a query that works, but it is very inefficient because it uses a dependent subquery:

SELECT 
      id_document, version, id_plant_mov, id_production_type, is_ext, 
      id_cost_center, id_import_kpi_code, id_plant_tag, value 
FROM  document_production_history2 
WHERE id_document = 751 
  AND id_production_type IN (1, 3) 
  AND is_group_production = 0 
  AND (id_document, id_plant_mov, id_production_type, id_cost_center, 
       id_import_kpi_code, id_plant_tag, is_ext, version) 
      IN (
        SELECT id_document, id_plant_mov, id_production_type, 
               id_cost_center, id_import_kpi_code, id_plant_tag, is_ext, 
               MAX(version) 
        FROM document_production_history2 
        GROUP BY id_document, id_plant_mov, id_production_type, 
            id_cost_center, id_import_kpi_code, id_plant_tag, is_ext);

I tried to rewrite the above query like this:

SELECT d.id_document id_doc, d.version, d.id_plant_mov, 
    d.id_production_type id_prod_type, d.is_ext, d.id_cost_center, 
    d.id_import_kpi_code kpi_code, d.id_plant_tag, d.value 
FROM document_production_history2 d 
JOIN (
  SELECT id_document, id_plant_mov, id_production_type, is_ext, 
      id_cost_center, id_import_kpi_code, id_plant_tag, 
      is_group_production, MAX(version) version 
  FROM document_production_history2 
  WHERE id_document = 751 
  AND id_production_type IN (1, 3) 
  AND is_group_production = 0 
  GROUP BY id_document, id_plant_mov, id_production_type, 
      id_cost_center, id_import_kpi_code, id_plant_tag, 
      is_group_production
) m 
ON d.version = m.version 
AND d.id_document = m.id_document 
AND d.id_production_type = m.id_production_type 
AND d.id_plant_mov = m.id_plant_mov 
AND d.id_plant_tag = m.id_plant_tag 
AND d.id_cost_center = m.id_cost_center 
AND d.id_import_kpi_code = m.id_import_kpi_code 
AND d.is_ext = m.is_ext 
AND d.is_group_production = m.is_group_production;

but it returns 27 rows instead of the 10 rows expected.

Thanks in advance.

Community
  • 1
  • 1
ramo102
  • 505
  • 3
  • 9
  • 23
  • Your subquery has too many fields. – Dan Bracuk Jul 29 '14 at 16:12
  • The first query doesn't have `is_group_production` in the subquery, the second version does. That's causing you to get more groups. – Barmar Jul 29 '14 at 16:14
  • Actually, that shouldn't make a difference because you have `is_group_production = 0` in the `WHERE` clause. – Barmar Jul 29 '14 at 16:16
  • It apparently is related to that column. When I add it to the dependent subquery in the first query, I get 27 rows as well. – Barmar Jul 29 '14 at 16:26
  • I've figured it out. The first query is getting `MAX(version)` without regard for `is_group_production`. But you're joining only with rows that have `is_group_production = 0`. In 17 of these cases, the max version is from a row where `is_group_product = 1`, so it doesn't match. – Barmar Jul 29 '14 at 16:43

4 Answers4

1

A derived table is the way to go here. Your problem in the second example was that you were getting max(version) for every row returned, including differences in is_group_production - this is where the extra rows crept in. So the where clause needs to stay in the outer query for that reason. Theoretically you could move the other two parts of the where clause to the inner query, but I find that quite unreadable and unintuitive.

This returns 10 rows:

SELECT 
      d.id_document, d.version, d.id_plant_mov, d.id_production_type, d.is_ext, 
      d.id_cost_center, d.id_import_kpi_code, d.id_plant_tag, d.value 
FROM  
      document_production_history2  d JOIN

(SELECT id_document, id_plant_mov, id_production_type, 
       id_cost_center, id_import_kpi_code, id_plant_tag, is_ext, 
       MAX(VERSION) AS maxversion
 FROM document_production_history2 
 GROUP BY id_document, id_plant_mov, id_production_type, 
    id_cost_center, id_import_kpi_code, id_plant_tag, is_ext) m ON

  d.version = m.maxversion and 
  d.id_document = m.id_document and
  d.id_production_type = m.id_production_type  and
  d.id_plant_mov = m.id_plant_mov  and
  d.id_plant_tag = m.id_plant_tag  and
  d.id_cost_center = m.id_cost_center and 
  d.id_import_kpi_code = m.id_import_kpi_code and
  d.is_ext = m.is_ext 

WHERE 
  d.id_document = 751  and
  d.id_production_type IN (1, 3)  and
  d.is_group_production = 0 

And as far as I can see it performs pretty well. Derived tables aren't ideal solutions, but they're an order of magnitude better that sub-queries, as mysql doesn't have to execute them for every row of the result-set.

Simon Woolf
  • 603
  • 4
  • 12
  • I think he's hoping to get more efficiency by doing the filtering in the subquery, so it won't have to do as much aggregation. – Barmar Jul 29 '14 at 16:27
  • I doubt thats going to make much difference, and its not possible to filter in the subquery, because id_production_type needs to be outside the group statement. – Simon Woolf Jul 29 '14 at 16:32
  • Btw another tactic that is ever so slightly better than the derived table would be to create a view with the contents from the derived table above. – Simon Woolf Jul 29 '14 at 16:34
  • Simon Woolf, your solution works well, thanks. I preferred the @Barmar's solution because "explain [query]" shows slightly better results (less rows searched both in primary query and in derived one). – ramo102 Jul 29 '14 at 20:59
1

Here's a JOIN that's equivalent to your first query. You need to take is_group_production = 0 out of the subquery, and only do it in the main query. This filters out rows where the max version is a group production.

SELECT d.id_document id_doc, d.version, d.id_plant_mov, 
    d.id_production_type id_prod_type, d.is_ext, d.id_cost_center, 
    d.id_import_kpi_code kpi_code, d.id_plant_tag, d.value 
FROM document_production_history2 d 
JOIN (
  SELECT id_document, id_plant_mov, id_production_type, is_ext, 
      id_cost_center, id_import_kpi_code, id_plant_tag, 
      MAX(version) version 
  FROM document_production_history2 
  WHERE id_document = 751 
  AND id_production_type IN (1, 3) 
  GROUP BY id_document, id_plant_mov, id_production_type, 
      id_cost_center, id_import_kpi_code, id_plant_tag

) m 
ON d.version = m.version 
AND d.id_document = m.id_document 
AND d.id_production_type = m.id_production_type 
AND d.id_plant_mov = m.id_plant_mov 
AND d.id_plant_tag = m.id_plant_tag 
AND d.id_cost_center = m.id_cost_center 
AND d.id_import_kpi_code = m.id_import_kpi_code 
AND d.is_ext = m.is_ext 
WHERE d.is_group_production = 0;

DEMO

Barmar
  • 741,623
  • 53
  • 500
  • 612
0

This:

JOIN (
    SELECT id_document, id_plant_mov, id_production_type, is_ext, 
      id_cost_center, id_import_kpi_code, id_plant_tag, 
      is_group_production, MAX(version) version 
    FROM document_production_history2 
    WHERE id_document = 751 
    AND id_production_type IN (1, 3) 
    AND is_group_production = 0 
    GROUP BY id_document, id_plant_mov, id_production_type, 
      id_cost_center, id_import_kpi_code, id_plant_tag, 
      is_group_production
) m 

should be this:

JOIN (
    SELECT id_document, MAX(version) version 
    FROM document_production_history2 
    WHERE id_document = 751 
    AND id_production_type IN (1, 3) 
    AND is_group_production = 0 
    GROUP BY id_document
) m 

When joining to m, use id_document and version only.

Barmar
  • 741,623
  • 53
  • 500
  • 612
Dan Bracuk
  • 20,699
  • 4
  • 26
  • 43
-2

Why not remove the subquery altogether?

SELECT 
id_document, version, id_plant_mov, id_production_type, is_ext, 
id_cost_center, id_import_kpi_code, id_plant_tag, value, MAX(version) 
FROM  document_production_history2 
WHERE id_document = 751 
AND id_production_type IN (1, 3) 
AND is_group_production = 0
GROUP BY id_document, id_plant_mov, id_production_type, id_cost_center, id_import_kpi_code, id_plant_tag, is_ext
mungea05
  • 110
  • 6