1

I have two tables and want to join them together in such a way that all values in the second table are excluded. When I join both tables, I only want the values in table 1, and don't want the values in both tables or the values in just table 2.

I thought this would be done with a LEFT JOIN or LEFT OUTER JOIN, but I'm finding some odd findings.

When I check all the data, I have the following count.

-- TOTAL LEADS: 10067
SELECT COUNT(*) FROM sold_leads AS sl 
WHERE sl.affiliate_id IN(1000,1001,1002,1033) 
AND sl.create_date >= '2013-1-1' 
AND sl.lead_type = 'AUTO'; 

However, when I attempt to find only the values just in table 1, I find a higher number, which is just not possible.

 -- How many had No suspension/dui, sr22, and <=2 accidents AND <=2 tickets: 13931
 SELECT COUNT(*) FROM sold_leads AS sl
 INNER JOIN drivers AS dr ON sl.lead_id = dr.lead_id
 LEFT OUTER JOIN duis AS duis ON sl.lead_id = duis.driver_id
 LEFT OUTER JOIN accidents AS ac ON sl.lead_id = ac.driver_id  
 WHERE sl.affiliate_id IN(1000,1001,1002,1033)
 AND sl.create_date >= '2013-1-1'
 AND dr.relationship_type = 'SELF';

There's just no way that a total count would be less than the count where I left join some tables and add some restrictions. What am I doing wrong.

Also, I know I haven't provided the schema, but how would I select just <=2 accidents. There is an accidents table that has a driver_id, but I've played with different variation of HAVING(COUNT(*)) and had no luck.

Help!?

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
ATMathew
  • 12,566
  • 26
  • 69
  • 76
  • Its hard to tell without out sample data but its pretty easy to get more records when you do a join. Perhaps you should do a `COUNT (DISTINCT sl.affiliate_id)` instead of `count(*)` – Conrad Frix Feb 13 '13 at 05:34

2 Answers2

2

Exclusion joins generally have a condition that limits the results to those where this is no match in the outer join table:

SELECT COUNT(*) FROM sold_leads AS sl
INNER JOIN drivers AS dr ON sl.lead_id = dr.lead_id
LEFT OUTER JOIN duis AS duis ON sl.lead_id = duis.driver_id
LEFT OUTER JOIN accidents AS ac ON sl.lead_id = ac.driver_id  
WHERE sl.affiliate_id IN(1000,1001,1002,1033)
AND sl.create_date >= '2013-1-1'
AND dr.relationship_type = 'SELF'
-- these are the exclusion join tests:
AND duis.driver_id IS NULL AND ac.driver_id IS NULL

Also the comment from @ConradFrix is good; if you have >1 row in drivers per row in sold_leads, you could get a multiplicative effects, so it's better to return COUNT(DISTINCT sl.something) where something is a unique column.

I'm also not so sure about your join conditions, you're comparing duis.driver_id and ac.driver_id to sl.lead_id. But dr.lead_id is also compared to sl.lead_id. Is one or the other of these conditions incorrect? Of course, you know your schema better than I do, but it looks like the column names are inconsistent.

Bill Karwin
  • 538,548
  • 86
  • 673
  • 828
1

If drivers, duis, or accidents has more than one row with the same lead_id, then your result will have more than one row with the same lead_id, and you will increase the total number of rows. You probably meant to add this to your where clause:

AND     duis.lead_id IS NULL
AND     ac.lead_id IS NULL

If you only want sold_leads that don't have a matching record in duis or accidents, then it may be clearer to use NOT IN and a subquery instead of joining directly to those tables. You can see some examples and a discussion of some alternatives in this question.

Community
  • 1
  • 1
Don Kirkby
  • 53,582
  • 27
  • 205
  • 286