
SQL joins have never been my strength. I would like some help with this one. It is probably an easy one for you SQL maestros.

I have 2 tables with the same columns. Let's say their structure is:

id INTEGER PRIMARY KEY AUTOINCREMENT,
key INTEGER,
description TEXT NOT NULL,
size INTEGER,
timestamp LONG,
is_on INTEGER

The table names are Shirts1 and Shirts2. The tables represent 2 versions of a dataset and overlap heavily. The goal is to find which rows differ between them. The key column identifies a row and stays the same from version to version; the remaining columns can change from version 1 to 2.

Answers should ideally be for SQLite on Android. Multiple queries are OK; it need not be a single query.

My guess

SELECT * FROM Shirts1, Shirts2 WHERE Shirts1.key=Shirts2.key AND
        (Shirts1.description != Shirts2.description OR
         Shirts1.size != Shirts2.size OR
         Shirts1.timestamp != Shirts2.timestamp OR
         Shirts1.is_on != Shirts2.is_on)

Another concern: would a query like this cause issues on an Android phone with limited memory? There are 1000 rows in both tables. Should I break the query into multiple queries, limiting each comparison to 100 rows at a time, for instance, since key goes from 1 to 1000 in order?

mu is too short
amit

1 Answer


Shirt1:

1;1;"green_shirt";1;1;1
4;4;"yellow_shirt";1;1;1

Shirt2:

1;1;"green_shirt";1;1;1
2;2;"red_shirt";1;1;1
3;3;"orange_shirt";1;1;1



-- SQLite only gained RIGHT JOIN in version 3.39.0, so each RIGHT JOIN is
-- written here as the equivalent LEFT JOIN with the tables swapped.
SELECT shirt2.*
  FROM shirt2
  LEFT JOIN shirt1 ON shirt1.key = shirt2.key
 WHERE shirt1.description != shirt2.description OR
       IFNULL(shirt1.size, -1) != IFNULL(shirt2.size, -1) OR
       IFNULL(shirt1.timestamp, -1) != IFNULL(shirt2.timestamp, -1) OR
       IFNULL(shirt1.is_on, -1) != IFNULL(shirt2.is_on, -1)
 UNION ALL
SELECT shirt1.*
  FROM shirt1
  LEFT JOIN shirt2 ON shirt1.key = shirt2.key
 WHERE shirt1.description != shirt2.description OR
       IFNULL(shirt1.size, -1) != IFNULL(shirt2.size, -1) OR
       IFNULL(shirt1.timestamp, -1) != IFNULL(shirt2.timestamp, -1) OR
       IFNULL(shirt1.is_on, -1) != IFNULL(shirt2.is_on, -1)
 UNION ALL
-- Rows that are identical in both tables: every column must match, hence AND.
SELECT shirt1.*
  FROM shirt1
 INNER JOIN shirt2 ON shirt1.key = shirt2.key
 WHERE shirt1.description = shirt2.description AND
       IFNULL(shirt1.size, -1) = IFNULL(shirt2.size, -1) AND
       IFNULL(shirt1.timestamp, -1) = IFNULL(shirt2.timestamp, -1) AND
       IFNULL(shirt1.is_on, -1) = IFNULL(shirt2.is_on, -1)

Output:

2;2;"red_shirt";1;1;1
3;3;"orange_shirt";1;1;1
4;4;"yellow_shirt";1;1;1
1;1;"green_shirt";1;1;1

If you want to exclude the shirts that appear unchanged in both shirt1 and shirt2, you just have to remove the last union. That means:

-- SQLite only gained RIGHT JOIN in version 3.39.0, so each RIGHT JOIN is
-- written here as the equivalent LEFT JOIN with the tables swapped.
SELECT shirt2.*
  FROM shirt2
  LEFT JOIN shirt1 ON shirt1.key = shirt2.key
 WHERE shirt1.description != shirt2.description OR
       IFNULL(shirt1.size, -1) != IFNULL(shirt2.size, -1) OR
       IFNULL(shirt1.timestamp, -1) != IFNULL(shirt2.timestamp, -1) OR
       IFNULL(shirt1.is_on, -1) != IFNULL(shirt2.is_on, -1)
 UNION ALL
SELECT shirt1.*
  FROM shirt1
  LEFT JOIN shirt2 ON shirt1.key = shirt2.key
 WHERE shirt1.description != shirt2.description OR
       IFNULL(shirt1.size, -1) != IFNULL(shirt2.size, -1) OR
       IFNULL(shirt1.timestamp, -1) != IFNULL(shirt2.timestamp, -1) OR
       IFNULL(shirt1.is_on, -1) != IFNULL(shirt2.is_on, -1)

Output:

2;2;"red_shirt";1;1;1
3;3;"orange_shirt";1;1;1
4;4;"yellow_shirt";1;1;1

I think you will want everything in the same set of columns, so this should be the better solution. If you are OK with separate columns for shirt1 and shirt2, you can also use SELECT * on a single join and skip the union.
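The side-by-side variant can be sketched as follows. This is only an illustration, not part of the original solution: it runs the SQL through Python's sqlite3 module instead of Android, uses SQLite's null-safe IS NOT operator in place of the IFNULL(..., -1) sentinel, and changes the size of key 1 in shirt2 (a made-up edit) so that there is a modified row to report.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = """(id INTEGER PRIMARY KEY AUTOINCREMENT, key INTEGER,
             description TEXT NOT NULL, size INTEGER,
             timestamp LONG, is_on INTEGER)"""
conn.execute("CREATE TABLE shirt1 " + schema)
conn.execute("CREATE TABLE shirt2 " + schema)
conn.executemany(
    "INSERT INTO shirt1 VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 1, "green_shirt", 1, 1, 1), (4, 4, "yellow_shirt", 1, 1, 1)],
)
conn.executemany(
    "INSERT INTO shirt2 VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, 1, "green_shirt", 2, 1, 1),  # size 2 is a made-up change for illustration
        (2, 2, "red_shirt", 1, 1, 1),
        (3, 3, "orange_shirt", 1, 1, 1),
    ],
)

# Keys present in both versions whose other columns differ, old and new side by side.
# "a IS NOT b" behaves like "a != b" but also treats NULL vs. non-NULL as different.
changed = conn.execute("""
    SELECT s1.key, s1.size AS old_size, s2.size AS new_size
      FROM shirt1 AS s1
      JOIN shirt2 AS s2 ON s1.key = s2.key
     WHERE s1.description IS NOT s2.description
        OR s1.size IS NOT s2.size
        OR s1.timestamp IS NOT s2.timestamp
        OR s1.is_on IS NOT s2.is_on
""").fetchall()

# Keys that exist in only one of the two versions.
only_in_shirt1 = conn.execute("""
    SELECT key FROM shirt1 AS s1
     WHERE NOT EXISTS (SELECT 1 FROM shirt2 AS s2 WHERE s2.key = s1.key)
     ORDER BY key
""").fetchall()
only_in_shirt2 = conn.execute("""
    SELECT key FROM shirt2 AS s2
     WHERE NOT EXISTS (SELECT 1 FROM shirt1 AS s1 WHERE s1.key = s2.key)
     ORDER BY key
""").fetchall()

print(changed)         # [(1, 1, 2)]
print(only_in_shirt1)  # [(4,)]
print(only_in_shirt2)  # [(2,), (3,)]
```

Splitting the work into three small queries also fits the "multiple queries are OK" note in the question.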

About the device question: it depends on the device, but if the device is old, I don't think this query will be your main issue. You could also try adding a few indices on the key columns to improve performance.
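As a rough sketch of the index suggestion, assuming the join column is key: the index names idx_shirt1_key and idx_shirt2_key are made up, the 1000 rows are generated test data, and the code runs on desktop Python rather than on a phone, so it says nothing about actual device performance. EXPLAIN QUERY PLAN is printed only as a way to check whether SQLite picks an index; its wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
schema = """(id INTEGER PRIMARY KEY AUTOINCREMENT, key INTEGER,
             description TEXT NOT NULL, size INTEGER,
             timestamp LONG, is_on INTEGER)"""
conn.execute("CREATE TABLE shirt1 " + schema)
conn.execute("CREATE TABLE shirt2 " + schema)

# Hypothetical index names; one index per table on the join column.
conn.execute("CREATE INDEX idx_shirt1_key ON shirt1(key)")
conn.execute("CREATE INDEX idx_shirt2_key ON shirt2(key)")

# Generated data: 1000 identical rows per table, then one made-up change.
rows = [(k, f"shirt_{k}", 1, 1, 1) for k in range(1, 1001)]
insert = "INSERT INTO {} (key, description, size, timestamp, is_on) VALUES (?, ?, ?, ?, ?)"
conn.executemany(insert.format("shirt1"), rows)
conn.executemany(insert.format("shirt2"), rows)
conn.execute("UPDATE shirt2 SET size = 2 WHERE key = 500")

diff_sql = """
    SELECT s1.key
      FROM shirt1 AS s1
      JOIN shirt2 AS s2 ON s1.key = s2.key
     WHERE s1.description IS NOT s2.description
        OR s1.size IS NOT s2.size
        OR s1.timestamp IS NOT s2.timestamp
        OR s1.is_on IS NOT s2.is_on
"""

# Show the plan SQLite chose; the exact text depends on the SQLite version.
for plan_row in conn.execute("EXPLAIN QUERY PLAN " + diff_sql):
    print(plan_row)

changed_keys = [r[0] for r in conn.execute(diff_sql)]
index_names = sorted(
    r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND name LIKE 'idx_shirt%'"
    )
)
print(changed_keys)  # [500]
```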

Edit: changed the solution to join on the key column

Eggi
  • i guess integers can be null, that's why the ifnull check is needed.. btw, i believe _key_ is what we need to join on. also can you explain what you mean by "everything in the same column", please? – amit Mar 01 '12 at 20:11
  • I mean that if you do a select * you would get something like shirt1.key, shirt2.key and so on (so you would get a column for every column in every shirt). I think you would prefer a solution where you only have one column (so for the keys you would get a single column named key with the joined values of shirt1 and shirt2) and have both values in that single column. – Eggi Mar 02 '12 at 20:39
  • i see. btw, can you do string comparison with **!=**? or is there a function? the comparison needs to be utf8 because the description can be foreign languages. – amit Mar 02 '12 at 00:07
  • If you don't have any special requirements like case independence it should work in this way. – Eggi Mar 02 '12 at 05:47
  • how would i make it case independent? – amit Mar 02 '12 at 21:01
  • http://stackoverflow.com/questions/973541/how-to-set-sqlite3-to-be-case-insensitive-when-string-comparing – Eggi Mar 02 '12 at 21:24
  • For other people, the UNION shows both versions of a row that is different. – amit Mar 03 '12 at 10:27
  • I modified my solution and added output. As I wasn't really sure what you meant I included two solutions with input/output. – Eggi Mar 03 '12 at 13:13
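
On the case-independence comments above, here is a small sketch of how the comparison behaves. It uses Python's sqlite3 module purely for illustration. COLLATE NOCASE is built into SQLite but only folds the 26 ASCII letters, which matters for descriptions in other languages. The UNICODE_NOCASE collation below is a made-up name registered through Python's create_collation and is not something SQLite or Android provides under that name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def scalar(sql):
    """Run a single-value query and return that value."""
    return conn.execute(sql).fetchone()[0]

# Default (BINARY) comparison is case-sensitive: the strings differ.
differs_binary = scalar("SELECT 'Green_Shirt' != 'green_shirt'")

# The built-in NOCASE collation ignores case for ASCII letters.
differs_nocase = scalar("SELECT 'Green_Shirt' != 'green_shirt' COLLATE NOCASE")

# NOCASE does not fold non-ASCII letters, so these still count as different.
non_ascii_nocase = scalar("SELECT 'Ä' != 'ä' COLLATE NOCASE")

def casefold_compare(a, b):
    """Three-way comparison on Unicode case-folded strings."""
    a, b = a.casefold(), b.casefold()
    return (a > b) - (a < b)

# Hypothetical collation name, registered only for this connection.
conn.create_collation("UNICODE_NOCASE", casefold_compare)
non_ascii_custom = scalar("SELECT 'Ä' != 'ä' COLLATE UNICODE_NOCASE")

print(differs_binary, differs_nocase, non_ascii_nocase, non_ascii_custom)  # 1 0 1 0
```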