0

I have 2 tables that I am trying to compare. The tables are the same structure and should be exactly the same with 60,000 entries each. However some entries are different and I would like to find the issues. I am currently trying to do a left join on each of them and show the results. I am currently getting results but it takes 12 min for each Query.

Table 1 (temp_entries)

ID | File

Table 2 (temp_dir_scan)

ID | File

Query 1 (12.25 min)

SELECT A.ID, A.File 
  FROM temp_entries A 
  LEFT JOIN temp_dir_scan B ON A.File = B.File 
 WHERE B.File is Null

Query 2 (12.26 min)

SELECT A.File 
  FROM temp_dir_scan A 
  LEFT JOIN temp_entries B ON A.File = B.File 
 WHERE B.File is Null

Query 3 (11.54 Min)

SELECT A.ID, A.File 
  FROM temp_entries A 
 Where A.File not in (select B.File from temp_dir_scan B)

What is the matter? Or what can I do to speed this up? What is a reasonable time to complete this in?

Here are examples of the results.

1|test.txt
2|test2.txt
45|temp.jpg
Sari Rahal
  • 1,897
  • 2
  • 32
  • 53
  • please show the result of EXPLAIN SELECT A.ID, A.File FROM temp_entries A LEFT JOIN temp_dir_scan B ON A.File = B.File WHERE B.File is Null , and also EXPLAIN SELECT A.File FROM temp_dir_scan A LEFT JOIN temp_entries B ON A.File = B.File WHERE B.File is Null , so we can see what happend – Bernd Buffen Jan 18 '16 at 21:11
  • 1
    You can create an index on feld `File` – HubertL Jan 18 '16 at 21:11
  • You may try the not in clause: SELECT A.ID, A.File FROM temp_entries A.Filenot in (select B.File from temp_dir_scan B) – bdn02 Jan 18 '16 at 21:12
  • `Null` is undefined and you are trying to do equate it so most likely is throwing it off and hence slows it down. If you want to compare the 2 tables and find differences in values for `File` column, another way would be to do a `FULL OUTER JOIN` and then in the outer query check if `A.File IS NULL OR B.File IS NULL` – vmachan Jan 18 '16 at 21:15
  • Indexes, Indexes, use EXPLAIN and avoid cartesian products. – digitai Jan 18 '16 at 21:15
  • @Sari Rahal - sorry, i mean put EXPLAIN before your query and execute it. and post the result – Bernd Buffen Jan 18 '16 at 21:16
  • @vmachan could you show an example? – Sari Rahal Jan 18 '16 at 21:21
  • @Sari Rahal - posted an answer as unable to format in comments – vmachan Jan 18 '16 at 21:30
  • @vmachan it results in an error "Duplicate column name 'ID' – Sari Rahal Jan 18 '16 at 21:31
  • execute your `SELECT ...` as `EXPLAIN SELECT...` and share the output you get. – trincot Jan 18 '16 at 21:31
  • And how do i tell them apart? – Sari Rahal Jan 18 '16 at 21:32
  • @Sari Rahal - updated to address the duplicate column error.. hope this helps.. – vmachan Jan 18 '16 at 21:38

1 Answers1

1

As mentioned in the comment, you could try and do a FULL OUTER JOIN, an example using your tables shown below (MySQL does not have FULL joins so using UNION to emulate this)

UPDATED: To remove duplicate column errors

SELECT C.A_ID
     , C.A_File 
     , C.B_ID
     , C.B_File
  FROM (
    SELECT A.ID AS A_ID
          ,A.File AS A_File
          ,B.ID AS B_ID
          ,B.File AS B_File
      FROM temp_entries A
      LEFT 
      JOIN temp_dir_scan B
        ON A.File = B.File
     UNION
    SELECT A.ID AS A_ID
          ,A.File AS A_File
          ,B.ID AS B_ID
          ,B.File AS B_File
      FROM temp_entries A
     RIGHT 
      JOIN temp_dir_scan B
        ON A.File = B.File
       ) C
WHERE C.A_File IS NULL 
   OR C.B_File IS NULL

This SO post gives more information about doing FULL OUTER JOINS in MySQL.

Hope this helps

NOTE: Unsure how much of a performance gain is seen compared to your original query(ies).

Community
  • 1
  • 1
vmachan
  • 1,672
  • 1
  • 10
  • 10