0

I realize this is a sizable query and there's obviously a lot going on. The 2nd to last line of this query compares MD5'd hashes of text fields. I tried this instead of comparing the values, but it didn't reduce the query execution time at all.

Fields in sob_datas can be linked to each other via sob_datas_to_links and this query compares the values for 2 forms (each have 517 rows in sob_datas) and pulls the linked ones that do not share the same values.

SELECT
                sob_datas.id,
                sob_datas.sob_field_name
            FROM sob_datas
            WHERE sob_form_id = '.$formId.' AND
            EXISTS(SELECT
                    sob_datas_to_links.id
                FROM sob_datas_to_links
                INNER JOIN sob_datas AS sub_sob_datas ON 
                    sub_sob_datas.id = sob_datas_to_links.sob_datas_id
                INNER JOIN sob_forms ON (
                    sob_forms.id = sub_sob_datas.sob_form_id AND
                    sob_forms.is_proof = 0 AND
                    sob_forms.archived IS NULL
                )
                WHERE sob_datas_to_links.link_id = (
                        SELECT
                            link_id
                        FROM sob_datas_to_links AS sub_sob_datas_to_links
                        WHERE sub_sob_datas_to_links.sob_datas_id = sob_datas.id
                    ) AND
                    sub_sob_datas.sob_field_name = sob_datas.sob_field_name AND
                    sub_sob_datas.hash != sob_datas.hash
            )

.

id  select_type table   type    possible_keys   key key_len ref rows    Extra
1   PRIMARY sob_datas   ref sob_form_id,sob_form_id_2   sob_form_id [BINARY - 1B]   const   563 Using where
2   DEPENDENT SUBQUERY  sob_forms   ALL PRIMARY NULL    NULL    NULL    937 Using where
2   DEPENDENT SUBQUERY  sub_sob_datas   ref PRIMARY,sob_form_id,sob_form_id_2   sob_form_id [BINARY - 3B]   summaries_dev.sob_forms.id,summaries_dev.sob_datas...   1   Using where
2   DEPENDENT SUBQUERY  sob_datas_to_links  ref sob_datas_id    sob_datas_id    [BINARY - 1B]   summaries_dev.sub_sob_datas.id  1   Using where
3   DEPENDENT SUBQUERY  sub_sob_datas_to_links  ref sob_datas_id    sob_datas_id    [BINARY - 1B]   summaries_dev.sob_datas.id  1   Using where
Ben
  • 60,438
  • 111
  • 314
  • 488
  • Could we have table definitions, too, please? – Wiseguy May 03 '11 at 20:02
  • Drop the subselects and use joins. http://www.codersrevolution.com/index.cfm/2008/7/31/MySQL-performance-INNER-JOIN-vs-subselect AND http://stackoverflow.com/questions/141278/subqueries-vs-joins – josh.trow May 03 '11 at 20:03
  • On an unrelated note, I hope `$formId` is protected against [SQL injection](http://en.wikipedia.org/wiki/SQL_injection) if applicable. – Wiseguy May 03 '11 at 20:04
  • I can't think of how to reduce the subqueries using joins – Ben May 03 '11 at 21:06
  • Then focus on reducing guesswork one has to put into answering your question, recreating tables based on your query is painful, while all you need to do is paste output of SHOW CREATE TABLE for tables used. – piotrm May 03 '11 at 22:37

1 Answers1

0

My head is about to explode (not totally related to this question, but database related) but I hope this will give you an idea of what I was thinking. Try a self join (but not this one, I'm SURE it won't work as I have it since I just can't put myself through trying to figure out your table schemas and all that jazz right now.

SELECT DISTINCT sob_datas.id, sob_datas.sob_field_name 
FROM sob_datas as A1 
  INNER JOIN sob_datas as A2 
    ON A1.sob_forms_ID = A2.sob_forms_ID
  JOIN sob_forms 
    ON A1.sob_forms_ID = sob_forms.id
    AND A2.sob_forms_ID = sob_forms.id
WHERE A1.hash != A2.hash 
  AND sob_forms.is_proof = 0
  AND sob_forms.archived IS NULL;
josh.trow
  • 4,861
  • 20
  • 31