Related question: How to select rows with no matching entry in another table?
I was trying to select rows using this method and couldn't get it to work in SQLite. After a bit of wrangling it occurred to me that the reason might be that there are NULL values in some fields. Sure enough, I was right, and when I changed the =
to IS
in the query below things started behaving as expected:
CREATE TEMP TABLE newEvent(id INTEGER,t INTEGER,name,extra,extra2,extra3);
INSERT INTO newEvent(id,t,name,extra,extra2,extra3) VALUES
(0, 1376351146, 'TEST', NULL, NULL, NULL),
(0, 1376348867, 'OLD', NULL, NULL,NULL);
SELECT n.id,n.t,n.name,n.extra,n.extra2,n.extra3 FROM newEvent n
LEFT JOIN event E ON n.t = E.t AND n.name IS E.name
AND n.extra IS E.extra;
AND n.extra2 IS E.extra2;
AND n.extra3 IS E.extra3
WHERE E.id IS NULL;
DROP TABLE newEvent;
In the above example, there is an existing record in table event
with name='OLD'
. The newEvent
table is defined the same as the original event
table.
However, I noticed a BIG problem: my query was now taking almost 30 seconds to run! If I change only the n.name IS E.name
to n.name = E.name
but leave all the other IS
es as-is, then the query only takes around 400ms. (There are around 3 million records in table event
.)
Why the big difference in performance? It turns out I can actually use =
instead of IS
for the name
comparison because it is never null, but if it ever were to be NULL it seems like this would break. Conversely, I am concerned that at some point the query might start running slow, since I don't understand what it is about name
that makes the equality query run so much faster. My guess is that maybe SQLite somehow knows that there are nulls in the extra fields and is able to optimize but I would like something a bit more firm than a wild guess.
As far as I can tell, IS
is simply =
with the additional provisio that it will treat NULL comparisions the same as if they were empty strings (assuming there are no actual empty strings to compare). So why is using =
on the name field 75 times faster, but has no effect on performance on the extra fields???