Multiple left joins leading to logical error

Question

I am trying to develop a query that counts certain values for multiple tables.

The query works fine when I am taking the aggregate count of a field with just one left join in place. But when I add another left join, the aggregate fields in my results are incorrect. I.E., the counts are plain wrong.

I want to left join two tables to my main table (dbo.rep_profile), and then get a count of certain values within each table. But as soon as I left join the second table, my results are thrown off and some appear wrong, while others appear right.

Here is my code, and beneath it is a better synopsis of my issue:

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY, 
count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
count(cl.LABEL_ID) as count_of_labels --including public, private, and shared

from dbo.REP_PROFILE rp (nolock)
left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
left join dbo.CONTACT_LABEL cl (nolock) on rp.CONTACT_ID = cl.CONTACT_ID --if this join is removed or commented out, the query return logically correct results


where 
rp.CREATED_DT between '2013-06-01' and '2014-01-01'
and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of it's creation
and rp.CREATED_BY in  --record was created by a past or present member of our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')
/*and rp.UPDATED_BY not in --last update NOT made by our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')*/

group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY

having count(ac.ACTIVITY_CONTACT_ID)>0 --record has at least one activity
--or count(cl.LABEL_ID)>0 --record has at least one label

order by rp.CONTACT_ID

If a contact_ID (the primary key I am joining on) appears in both tables which I am joining (both the activity_contact and label_id) then the results of both the count_of_activities aggregate column and count_of_labels aggregate column are incorrect. BUT... if a certain contact_id appears in just ONE of the joined tables, then the aggregate results are correct.

Here is a Venn Diagram of what I am attempting to do with all my left joins leading to the Rep_Profile table:

enter image description here

I am stumped. I don't understand the logical flaw that is causing erroneous aggregate counts.

EDIT Here is my working code, with the new sub-queries in the select statement

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY, 
(select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID) as count_of_activities, 
(select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels, --including public, private, and shared
(select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID) as count_of_trades

from dbo.REP_PROFILE rp (nolock) --query gave logical errors when multiple joins were attempted, used sub-queries in Select statement to fix the issue

where 
rp.CREATED_DT between '2013-06-01' and '2014-01-01'
and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of it's creation
and rp.CREATED_BY in  --record was created by a past or present member of our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X') 
/*the following criteria ensure that the query results will display reps with at least 1 activity, label, or trade. */
and 
((select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID)>0 --trades > 0
or (select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID)>0
or (select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID)>0) --labels > 0

group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY

order by rp.CONTACT_ID

Here,JOIN on the first table ,namely rp `left join dbo.CONTACT_LABEL cl (nolock) on ac.CONTACT_ID = cl.CONTACT_ID` rp.CONTACT_ID = cl.CONTACT_ID — Mihai, May 02 '14 at 18:40
I apologize, the joins in my posted code were not up-to-date, I have updated my code. — , May 02 '14 at 18:44
Can you detect a pattern in the results which might be meaningful? — Mihai, May 02 '14 at 18:48
YES The two aggregate columns will return the same value, and it looks like the value returned is the product of (count_of_labels * count_of_activities). Also, if I split this into two queries, each with a single left join, it works just fine. — , May 02 '14 at 18:49
Move the conditions from where to on clause.http://stackoverflow.com/a/4752557/1745672 — Mihai, May 02 '14 at 18:50
And how many rows there could be in contact_label table for some contact_id, one or more? — dean, May 02 '14 at 18:54
The logical error I am seeing is present in my results even if all of the `Where` clause criteria is removed, so the where clause criteria can not be the issue. Thank you though. — , May 02 '14 at 18:54
Do you have an active db?NOLOCK might read results which will not be committed. — Mihai, May 02 '14 at 18:56
I am querying a snapshot, nolock is just a bit of a speed booster in this case (from what I understand about NOLock) the query has the same logical error with NoLock removed. — , May 02 '14 at 18:57
@dean , the contact_label table uses Label_ID as it's primary key. So one Contact_ID can point to multiple Label_ID's within this table, so therefore there are duplicate contact ID's within this table — , May 02 '14 at 19:00

paparazzo · Answer 1 · 2014-05-02T19:10:44.727

This having breaks the left part of the left join dbo.ACTIVITY_CONTACT

having count(ac.ACTIVITY_CONTACT_ID) > 0

that dbo.CONTACT_LABEL join brings in duplicate rows from dbo.ACTIVITY_CONTACT
try a distinct

Not sure this is a fix but it might get you there

select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
      ,count(distinct(ac.ACTIVITY_CONTACT_ID)) as count_of_activities
      ,count(cl.LABEL_ID) as count_of_labels --including public, private, and shared
  from dbo.REP_PROFILE rp (nolock)
  join dbo.SP_USER_ROLE ur 
    on ur.user_id = rp.CREATED_BY 
   and ur.ROLE_ID = '00003'
   and rp.CREATED_DT between '2013-06-01' and '2014-01-01'
   and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of it's creation 
  join dbo.ACTIVITY_CONTACT ac (nolock) 
    on rp.CONTACT_ID = ac.CONTACT_ID 
  left join dbo.CONTACT_LABEL cl (nolock) 
    on rp.CONTACT_ID = cl.CONTACT_ID --if this join is removed or commented out, the query return logically correct results
 group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
 order by rp.CONTACT_ID

Well my goal here is to show only records which have a count of either of these aggregate fields such that count > 0. But to test this idea, I removed the entire Having Clause, and it did not solve the issue. Thank you for the answer though. — , May 02 '14 at 19:04
Fixing a query starts with breaking it down and cleaning it up — paparazzo, May 02 '14 at 19:07

score 0 · Accepted Answer · answered May 02 '14 at 19:05

0

Left join on 1:m relationship will produce m times rows.

So, when you add a join to contact_label table, there will be rows in result as many as there is matching rows in contact_label. This will affect the results af aggregations.

select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY, 
count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
(select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels
from dbo.REP_PROFILE rp (nolock)
left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
where...

answered May 02 '14 at 19:05

dean

9,960
2
25
26

PERFECT! This worked great. What would that nested query in the `Select` clause be called technically? Just a sub query in the select statement? – May 02 '14 at 19:10
It's just a correlated scalar subquery used in select list, can't remember if there's a fancy name for it or not :) – dean May 02 '14 at 19:13
I used your idea of the sub-query to completely remove all joins. That makes this rather complex query a lot easier to handle conceptually. I updated my OP with my new and improved code. Thanks again. – May 02 '14 at 19:51

Multiple left joins leading to logical error

2 Answers2