1

I am trying to develop a query that counts certain values for multiple tables.

The query works fine when I am taking the aggregate count of a field with just one left join in place. But when I add another left join, the aggregate fields in my results are incorrect. I.E., the counts are plain wrong.

I want to left join two tables to my main table (dbo.rep_profile), and then get a count of certain values within each table. But as soon as I left join the second table, my results are thrown off and some appear wrong, while others appear right.

Here is my code, and beneath it is a better synopsis of my issue:

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY, 
count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
count(cl.LABEL_ID) as count_of_labels --including public, private, and shared

from dbo.REP_PROFILE rp (nolock)
left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
left join dbo.CONTACT_LABEL cl (nolock) on rp.CONTACT_ID = cl.CONTACT_ID --if this join is removed or commented out, the query return logically correct results


where 
rp.CREATED_DT between '2013-06-01' and '2014-01-01'
and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of it's creation
and rp.CREATED_BY in  --record was created by a past or present member of our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')
/*and rp.UPDATED_BY not in --last update NOT made by our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X')*/

group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY

having count(ac.ACTIVITY_CONTACT_ID)>0 --record has at least one activity
--or count(cl.LABEL_ID)>0 --record has at least one label

order by rp.CONTACT_ID

If a contact_ID (the primary key I am joining on) appears in both tables which I am joining (both the activity_contact and label_id) then the results of both the count_of_activities aggregate column and count_of_labels aggregate column are incorrect. BUT... if a certain contact_id appears in just ONE of the joined tables, then the aggregate results are correct.

Here is a Venn Diagram of what I am attempting to do with all my left joins leading to the Rep_Profile table:

enter image description here

I am stumped. I don't understand the logical flaw that is causing erroneous aggregate counts.

EDIT Here is my working code, with the new sub-queries in the select statement

    select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY, 
(select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID) as count_of_activities, 
(select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels, --including public, private, and shared
(select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID) as count_of_trades

from dbo.REP_PROFILE rp (nolock) --query gave logical errors when multiple joins were attempted, used sub-queries in Select statement to fix the issue

where 
rp.CREATED_DT between '2013-06-01' and '2014-01-01'
and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of it's creation
and rp.CREATED_BY in  --record was created by a past or present member of our team
(select ur.user_id
from dbo.SP_USER_ROLE ur
where ur.ROLE_ID = 'X') 
/*the following criteria ensure that the query results will display reps with at least 1 activity, label, or trade. */
and 
((select count(th.TRANSACTION_ID) from TRANSACTION_HISTORY th where rp.CONTACT_ID = th.CONTACT_ID)>0 --trades > 0
or (select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID)>0
or (select count(ac.ACTIVITY_CONTACT_ID) from ACTIVITY_CONTACT ac where rp.CONTACT_ID = ac.CONTACT_ID)>0) --labels > 0

group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY

order by rp.CONTACT_ID
  • Here,JOIN on the first table ,namely rp `left join dbo.CONTACT_LABEL cl (nolock) on ac.CONTACT_ID = cl.CONTACT_ID` rp.CONTACT_ID = cl.CONTACT_ID – Mihai May 02 '14 at 18:40
  • I apologize, the joins in my posted code were not up-to-date, I have updated my code. –  May 02 '14 at 18:44
  • Can you detect a pattern in the results which might be meaningful? – Mihai May 02 '14 at 18:48
  • YES The two aggregate columns will return the same value, and it looks like the value returned is the product of (count_of_labels * count_of_activities). Also, if I split this into two queries, each with a single left join, it works just fine. –  May 02 '14 at 18:49
  • Move the conditions from where to on clause.http://stackoverflow.com/a/4752557/1745672 – Mihai May 02 '14 at 18:50
  • And how many rows there could be in contact_label table for some contact_id, one or more? – dean May 02 '14 at 18:54
  • The logical error I am seeing is present in my results even if all of the `Where` clause criteria is removed, so the where clause criteria can not be the issue. Thank you though. –  May 02 '14 at 18:54
  • Do you have an active db?NOLOCK might read results which will not be committed. – Mihai May 02 '14 at 18:56
  • I am querying a snapshot, nolock is just a bit of a speed booster in this case (from what I understand about NOLock) the query has the same logical error with NoLock removed. –  May 02 '14 at 18:57
  • @dean , the contact_label table uses Label_ID as it's primary key. So one Contact_ID can point to multiple Label_ID's within this table, so therefore there are duplicate contact ID's within this table –  May 02 '14 at 19:00

2 Answers2

0

This having breaks the left part of the left join dbo.ACTIVITY_CONTACT

having count(ac.ACTIVITY_CONTACT_ID) > 0

that dbo.CONTACT_LABEL join brings in duplicate rows from dbo.ACTIVITY_CONTACT
try a distinct

Not sure this is a fix but it might get you there

select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
      ,count(distinct(ac.ACTIVITY_CONTACT_ID)) as count_of_activities
      ,count(cl.LABEL_ID) as count_of_labels --including public, private, and shared
  from dbo.REP_PROFILE rp (nolock)
  join dbo.SP_USER_ROLE ur 
    on ur.user_id = rp.CREATED_BY 
   and ur.ROLE_ID = '00003'
   and rp.CREATED_DT between '2013-06-01' and '2014-01-01'
   and rp.UPDATED_DT != rp.CREATED_DT --record has been updated at least one time after the date of it's creation 
  join dbo.ACTIVITY_CONTACT ac (nolock) 
    on rp.CONTACT_ID = ac.CONTACT_ID 
  left join dbo.CONTACT_LABEL cl (nolock) 
    on rp.CONTACT_ID = cl.CONTACT_ID --if this join is removed or commented out, the query return logically correct results
 group by rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY
 order by rp.CONTACT_ID
paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • Well my goal here is to show only records which have a count of either of these aggregate fields such that count > 0. But to test this idea, I removed the entire Having Clause, and it did not solve the issue. Thank you for the answer though. –  May 02 '14 at 19:04
  • 1
    Fixing a query starts with breaking it down and cleaning it up – paparazzo May 02 '14 at 19:07
0

Left join on 1:m relationship will produce m times rows.

So, when you add a join to contact_label table, there will be rows in result as many as there is matching rows in contact_label. This will affect the results af aggregations.

select rp.CRD_NUMBER, rp.CONTACT_ID, rp.CREATED_BY, rp.CREATED_DT, rp.UPDATED_DT, rp.UPDATED_BY, 
count(ac.ACTIVITY_CONTACT_ID) as count_of_activities,
(select count(cl.LABEL_ID) from contact_label cl where rp.CONTACT_ID = cl.CONTACT_ID) as count_of_labels
from dbo.REP_PROFILE rp (nolock)
left join dbo.ACTIVITY_CONTACT ac (nolock) on rp.CONTACT_ID = ac.CONTACT_ID
where...
dean
  • 9,960
  • 2
  • 25
  • 26
  • PERFECT! This worked great. What would that nested query in the `Select` clause be called technically? Just a sub query in the select statement? –  May 02 '14 at 19:10
  • It's just a correlated scalar subquery used in select list, can't remember if there's a fancy name for it or not :) – dean May 02 '14 at 19:13
  • I used your idea of the sub-query to completely remove all joins. That makes this rather complex query a lot easier to handle conceptually. I updated my OP with my new and improved code. Thanks again. –  May 02 '14 at 19:51