15

I have 2 tables. One is a table with things that can be learned. There is a JID that desribes each kind of row, and is unique to each row. The second table is a log of things that have been learned (the JID) and also the userid for the person that learned it. I am currently using this to select all of the data for the JID, but only the ones the user has learned based on userid.

SELECT * 
FROM tablelist1
LEFT JOIN tablelog2 ON (tablelist1.JID = tablelog2.JID) 
                       AND tablelog2.UID = 'php var'
WHERE tablelog2.JID IS NOT NULL

I now need to select the rows of things to learn, but only the things the userid has NOT already learned. I am obviously very new to this, bear with me. :) I tried using IS NULL, but while it seems it works, it gives duplicate JID's one being NULL, one being correct.

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
user192738
  • 207
  • 1
  • 2
  • 6

2 Answers2

32

Using LEFT JOIN/IS NULL:

   SELECT t.*
     FROM TABLE_LIST t
LEFT JOIN TABLE_LOG tl ON tl.jid = t.jid
    WHERE tl.jid IS NULL

Using NOT IN:

SELECT t.*
  FROM TABLE_LIST t
 WHERE t.jid NOT IN (SELECT tl.jid
                       FROM TABLE_LOG tl
                   GROUP BY tl.jid)

Using NOT EXISTS:

SELECT t.*
  FROM TABLE_LIST t
 WHERE NOT EXISTS(SELECT NULL
                    FROM TABLE_LOG tl
                   WHERE tl.jid = t.jid)

FYI
LEFT JOIN/IS NULL and NOT IN are equivalent in MySQL - they will perform the same, while NOT EXISTS is slower/less efficient. For more details: http://explainextended.com/2009/09/18/not-in-vs-not-exists-vs-left-join-is-null-mysql/

OMG Ponies
  • 325,700
  • 82
  • 523
  • 502
  • Is it possible to do w/o using subqueries? subqueries are very slow, and table 2 will have MANY rows. The same way I used a join instead initially. – user192738 Oct 21 '09 at 01:48
  • 1
    Left Join worked perfect. Thanks for the FYI. I didnt know that NOT IN was equal, even though it seems like the extra select would slow it down! – user192738 Oct 21 '09 at 02:16
3

First off, you should be using an INNER JOIN on your existing query:

SELECT * FROM tablelist1
    INNER JOIN tablelog2 ON (tablelist1.JID = tablelog2.JID) 
    WHERE tablelog2.UID = 'php var'

The way you're doing it you're getting all the rows from tablelist1, then going to extra trouble to exclude the ones that don't have a match in tablelog2. The INNER JOIN will do that for you, and more efficiently.

Secondly, to find for user "X" all the learnable-things that the user hasn't learned, do:

SELECT * FROM tablelist1 
    WHERE NOT EXISTS (SELECT JID FROM tablelog2 WHERE UID = 'X')
Larry Lustig
  • 49,320
  • 14
  • 110
  • 160
  • I will test the first suggestion. the second (not exist) is easier, but is slower and is something I would like to avoid as t2log might have a lot of records. – user192738 Oct 21 '09 at 01:57
  • The second command's execution time will depend mainly on the size of tablelist1 and the number of records matching UID = 'X' in tablelog2. Note that the sub-query is not correlated with the main part of the query and will only have to run once and is easily handled by a covering index if one exists for UID, JID. – Larry Lustig Oct 21 '09 at 03:03
  • PS: Do you mean you KNOW the second command is too slow, or you THINK it will be because you read somewhere to avoid that kind of construction? Test it first, unless there are a very large number of learned-things for UID "X" it should be plenty fast. – Larry Lustig Oct 21 '09 at 03:06
  • The table2 is expected to be about 1m rows possibly, with over 200k users possibly. So while it is not an issue now, I want to make sure it is kept in mind now. Every ms counts to me. But I will plus your answer because it is helpful. – user192738 Oct 21 '09 at 10:51
  • Look, from SQL you posted in your question, it's clear that you are, as you say, very, very new at this. Why post a question and then argue with the people who try to help you? 200K users and 1M records is only 5 records per user, taken from a covering index, and executed only once across the rows you examine in tablelist. You're almost certainly inventing a performance bottleneck where none exists. – Larry Lustig Oct 21 '09 at 11:52
  • Helped me to find rows between two separate tables with the same column structure that contained the same data in a specified column: `SELECT * FROM table1 INNER JOIN table2 ON (table1.col1 = table2.col1)` – commnux Feb 15 '17 at 20:45