1

I am comparing two tables to make sure they are same row by row and column by column on SQl server.

      SELECT  *
        FROM t1, t2
        WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2
      AND t1.column3 = t2.column3 AND t1.column4 != t2.column4

The tables are vey large, more than 100 million.

I got error:

    ERROR [HY000] ERROR:  9434 : Not enough memory for merge-style join

Are there better ways to do this comparison.

thanks !

user3448011
  • 1,469
  • 1
  • 17
  • 39

5 Answers5

1

A much efficient way of checking the row by row difference will be using Exists operator.

Something like this....

SELECT  *
FROM t1
WHERE NOT EXISTS (SELECT 1
                  FROM  t2 
                  WHERE t1.column1 = t2.column1 
                    AND t1.column2 = t2.column2
                    AND t1.column3 = t2.column3 
                    AND t1.column4 = t2.column4
                  )
M.Ali
  • 67,945
  • 13
  • 101
  • 127
1

You could try EXCEPT http://technet.microsoft.com/en-us/library/ms188055(v=sql.100).aspx

SELECT column1, column2, column3, column4 FROM t1
EXCEPT
SELECT column1, column2, column3, column4 FROM t2
Malk
  • 11,855
  • 4
  • 33
  • 32
0

What if you try an INNER JOIN (and not select all the data from both tables)?

SELECT  t1.column4, t2.column4
FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1 AND t1.column2 = t2.column2
  AND t1.column3 = t2.column3
WHERE t1.column4 != t2.column4

Do you want to identify all the rows that are different or just identify IF there are any rows that are different?

Dave Jemison
  • 694
  • 3
  • 7
0

Here's how I would do this: first, I assume you have primary keys on both tables. When you join those tables, the best way to join is using primary key fields, not all of them:

select t1.*, t2.*
from t1 join t2 on t1.id = t2.id

then you can compare those tables field-by-field without overloading sql:

select t1.*, t2.*
from t1 outer join t2 on t1.id = t2.id
where t1.field1 <> t2.field1 ot t1.field2 <> t2.field2 .....

the resulting records would be mismatches.

the code I wrote here is conceptual, I personally didn't run it on sql, so you might need to adjust

galets
  • 17,802
  • 19
  • 72
  • 101
0

All of the above are good suggestions (My first try would be SELECT * FROM t1 EXCEPT SELECT * FROM t2), but you indicate they all give the same out of memory error. Therefore I must conclude your tables are simply too large to perform the operation you desire all in one go. You'll have to run the query in stages, using a technique like one of the ones from "Equivalent of LIMIT and OFFSET for SQL Server?" I'd start with something like this (SQL Fiddle):

DECLARE @offset INT = 0
SELECT TOP 50000000 * 
  FROM (
        SELECT *, 
               ROW_NUMBER() over (order by column1) AS r_n_n 
          FROM t1
        ) xx 
  WHERE r_n_n >= @offset
EXCEPT
 SELECT TOP 50000000 * 
   FROM (
         SELECT *, 
                ROW_NUMBER() over (order by column1) AS r_n_n 
           FROM t2
        ) xx 
  WHERE r_n_n >= @offset

Then you can increment @offset by the amount of TOP n and do it again. This will likely involve some trial and error to find the limit for the TOP n clause that will run to completion rather than throw an error. I'd start with half, then try quarters, eighths, etc. as necessary.

Community
  • 1
  • 1
ob1quixote
  • 399
  • 1
  • 3
  • 8
  • do I need to sort the tables before I do the comparison ? Otherwise, how ot make sure the top 5000000 rows should be compared ? thanks ! – user3448011 May 09 '14 at 12:00
  • The result set is ordered by whatever you specify in the [`ORDER BY`](http://technet.microsoft.com/en-us/library/ms188385(v=sql.100).aspx) argument in the [`OVER`](http://technet.microsoft.com/en-us/library/ms189461(v=sql.100).aspx) clause of `ROW_NUMBER()` as a side effect. As long as it's the same for both tables, it should be fine. – ob1quixote May 09 '14 at 17:08