comparing two tables to make sure they are same row by row and column by column on SQl server

Question

I am comparing two tables to make sure they are same row by row and column by column on SQl server.

      SELECT  *
        FROM t1, t2
        WHERE t1.column1 = t2.column1 AND t1.column2 = t2.column2
      AND t1.column3 = t2.column3 AND t1.column4 != t2.column4

The tables are vey large, more than 100 million.

I got error:

    ERROR [HY000] ERROR:  9434 : Not enough memory for merge-style join

Are there better ways to do this comparison.

thanks !

just out of curiosity how much memory do you have on your server? — M.Ali, May 08 '14 at 23:42

score 1 · Answer 1 · answered May 08 '14 at 23:34

1

A much efficient way of checking the row by row difference will be using Exists operator.

Something like this....

SELECT  *
FROM t1
WHERE NOT EXISTS (SELECT 1
                  FROM  t2 
                  WHERE t1.column1 = t2.column1 
                    AND t1.column2 = t2.column2
                    AND t1.column3 = t2.column3 
                    AND t1.column4 = t2.column4
                  )

answered May 08 '14 at 23:34

M.Ali

67,945
13
101
127

same error: ERROR [HY000] ERROR: 9434 : Not enough memory for merge-style join, thanks ! – user3448011 May 08 '14 at 23:48

score 1 · Answer 2 · answered May 08 '14 at 23:34

1

You could try EXCEPT http://technet.microsoft.com/en-us/library/ms188055(v=sql.100).aspx

SELECT column1, column2, column3, column4 FROM t1
EXCEPT
SELECT column1, column2, column3, column4 FROM t2

answered May 08 '14 at 23:34

Malk

11,855
4
33
32

same error: ERROR [HY000] ERROR: 9434 : Not enough memory for merge-style join, thanks ! – user3448011 May 09 '14 at 01:48

score 0 · Answer 3 · answered May 08 '14 at 23:34

0

What if you try an INNER JOIN (and not select all the data from both tables)?

SELECT  t1.column4, t2.column4
FROM t1 INNER JOIN t2 ON t1.column1 = t2.column1 AND t1.column2 = t2.column2
  AND t1.column3 = t2.column3
WHERE t1.column4 != t2.column4

Do you want to identify all the rows that are different or just identify IF there are any rows that are different?

answered May 08 '14 at 23:34

Dave Jemison

694
3
7

same error: ERROR [HY000] ERROR: 9434 : Not enough memory for merge-style join, thanks ! – user3448011 May 08 '14 at 23:54
check IF there are any rows that are different – user3448011 May 08 '14 at 23:55

score 0 · Answer 4 · answered May 08 '14 at 23:35

Here's how I would do this: first, I assume you have primary keys on both tables. When you join those tables, the best way to join is using primary key fields, not all of them:

select t1.*, t2.*
from t1 join t2 on t1.id = t2.id

then you can compare those tables field-by-field without overloading sql:

select t1.*, t2.*
from t1 outer join t2 on t1.id = t2.id
where t1.field1 <> t2.field1 ot t1.field2 <> t2.field2 .....

the resulting records would be mismatches.

the code I wrote here is conceptual, I personally didn't run it on sql, so you might need to adjust

"outer join" will make it run out of memory. thanks ! – user3448011 May 09 '14 at 01:50 — user3448011, May 09 '14 at 01:50

score 0 · Answer 5 · edited May 23 '17 at 11:57

All of the above are good suggestions (My first try would be SELECT * FROM t1 EXCEPT SELECT * FROM t2), but you indicate they all give the same out of memory error. Therefore I must conclude your tables are simply too large to perform the operation you desire all in one go. You'll have to run the query in stages, using a technique like one of the ones from "Equivalent of LIMIT and OFFSET for SQL Server?" I'd start with something like this (SQL Fiddle):

DECLARE @offset INT = 0
SELECT TOP 50000000 * 
  FROM (
        SELECT *, 
               ROW_NUMBER() over (order by column1) AS r_n_n 
          FROM t1
        ) xx 
  WHERE r_n_n >= @offset
EXCEPT
 SELECT TOP 50000000 * 
   FROM (
         SELECT *, 
                ROW_NUMBER() over (order by column1) AS r_n_n 
           FROM t2
        ) xx 
  WHERE r_n_n >= @offset

Then you can increment @offset by the amount of TOP n and do it again. This will likely involve some trial and error to find the limit for the TOP n clause that will run to completion rather than throw an error. I'd start with half, then try quarters, eighths, etc. as necessary.

do I need to sort the tables before I do the comparison ? Otherwise, how ot make sure the top 5000000 rows should be compared ? thanks ! — user3448011, May 09 '14 at 12:00
The result set is ordered by whatever you specify in the [`ORDER BY`](http://technet.microsoft.com/en-us/library/ms188385(v=sql.100).aspx) argument in the [`OVER`](http://technet.microsoft.com/en-us/library/ms189461(v=sql.100).aspx) clause of `ROW_NUMBER()` as a side effect. As long as it's the same for both tables, it should be fine. — ob1quixote, May 09 '14 at 17:08

comparing two tables to make sure they are same row by row and column by column on SQl server

5 Answers5