0

I am creating three CTEs that I ultimately want to join together to create a summary table. Two of the three tables contain the same columns except the items they are summing are different (quotes vs new business). The other table contains two summation columns, reports ordered and total cost, and a subcategory for another field, report type, which rolls up into LOB.

The issue comes about when I attempt to join them all together and "re-sum" the fields. For some reason I am getting duplicate values even though I am using DISTINCT in my select clause and I think I am joining everything properly.

When I remove the SUM from Total_Quotes and Total_NB the code works; however, I expected that summing a line that only had one value would be fine. Apparently that's not the case.

WITH
MONTHLY_INVOICES AS
(SELECT DISTINCT
EXTRACT(YEAR FROM INVOICE_DATE) AS YEAR
,EXTRACT(MONTH FROM INVOICE_DATE) AS MONTH
,STATE
,REPORT_TYPE
,CASE WHEN REPORT_TYPE IN ('ADD', 'CCA', 'CLA', 'MVR', 'VIN') THEN 'AUTO'
      WHEN REPORT_TYPE IN ('CCP', 'CLP') THEN 'HOME'
      WHEN REPORT_TYPE IN ('DPF', 'NCF') THEN 'OTHER'
      END AS LOB
,SUM(TOTAL_ORDERS) AS TOTAL_ORDERS
,SUM(TOTAL_COST) AS TOTAL_COST
FROM
KAHJZK_VENDOR_INVOICE_TOTALS
GROUP BY
EXTRACT(YEAR FROM INVOICE_DATE)
,EXTRACT(MONTH FROM INVOICE_DATE)
,STATE
,REPORT_TYPE
,CASE WHEN REPORT_TYPE IN ('ADD', 'CCA', 'CLA', 'MVR', 'VIN') THEN 'AUTO'
      WHEN REPORT_TYPE IN ('CCP', 'CLP') THEN 'HOME'
      WHEN REPORT_TYPE IN ('DPF', 'NCF') THEN 'OTHER'
      END)

,MONTHLY_QT AS 
(SELECT DISTINCT
EXTRACT(YEAR FROM REPORT_DATE) AS YEAR
,EXTRACT(MONTH FROM REPORT_DATE) AS MONTH
,STATE
,LOB
,SUM(TOTAL_QUOTES) AS TOTAL_QTS
FROM KAHJZK_DAILY_QT_VOL
WHERE
EXTRACT(YEAR FROM REPORT_DATE) >= '2019'
GROUP BY
EXTRACT(YEAR FROM REPORT_DATE)
,EXTRACT(MONTH FROM REPORT_DATE)
,STATE
,LOB)

,MONTHLY_NB AS 
(SELECT DISTINCT
EXTRACT(YEAR FROM REPORT_DATE) AS YEAR
,EXTRACT(MONTH FROM REPORT_DATE) AS MONTH
,STATE
,LOB
,SUM(TOTAL_POLICIES) AS TOTAL_NB
FROM KAHJZK_DAILY_NB_VOL
WHERE
EXTRACT(YEAR FROM REPORT_DATE) >= '2019'
GROUP BY
EXTRACT(YEAR FROM REPORT_DATE)
,EXTRACT(MONTH FROM REPORT_DATE)
,STATE
,LOB)


SELECT DISTINCT
A.YEAR
,A.MONTH
,A.STATE
,A.LOB
,SUM(A.TOTAL_ORDERS)
,SUM(A.TOTAL_COST)
,SUM(B.TOTAL_QTS)
,SUM(C.TOTAL_NB)
FROM
MONTHLY_INVOICES A
LEFT OUTER JOIN MONTHLY_QT B
ON A.YEAR = B.YEAR
AND A.MONTH = B.MONTH
AND A.STATE = B.STATE
AND A.LOB = B.LOB
LEFT OUTER JOIN MONTHLY_NB C
ON A.YEAR = C.YEAR
AND A.MONTH = C.MONTH
AND A.STATE = C.STATE
AND A.LOB = C.LOB
GROUP BY
A.YEAR
,A.MONTH
,A.STATE
,A.LOB
ORDER BY
3,4,1,2;

When I change the last part to the below, the code works.:

SELECT DISTINCT
A.YEAR
,A.MONTH
,A.STATE
,A.LOB
,SUM(A.TOTAL_ORDERS)
,SUM(A.TOTAL_COST)
,B.TOTAL_QTS
,C.TOTAL_NB
FROM
MONTHLY_INVOICES A
LEFT OUTER JOIN MONTHLY_QT B
ON A.YEAR = B.YEAR
AND A.MONTH = B.MONTH
AND A.STATE = B.STATE
AND A.LOB = B.LOB
LEFT OUTER JOIN MONTHLY_NB C
ON A.YEAR = C.YEAR
AND A.MONTH = C.MONTH
AND A.STATE = C.STATE
AND A.LOB = C.LOB
GROUP BY
A.YEAR
,A.MONTH
,A.STATE
,A.LOB
,B.TOTAL_QTS
,C.TOTAL_NB
ORDER BY
3,4,1,2;
J King
  • 3
  • 2
  • Here's a link that I believe explains my issue. [https://stackoverflow.com/questions/12464037/two-sql-left-joins-produce-incorrect-result/12464135#12464135](https://stackoverflow.com/questions/12464037/two-sql-left-joins-produce-incorrect-result/12464135#12464135) – J King Jul 26 '19 at 20:37

2 Answers2

0
  1. You must provide one or more columns that uniquely identify each row within each of the tables.
  2. You must provide a means by which the uniquely identified row in one table may be joined to a uniquely identified row in another table.

Failure to resolve both items listed above will result in a result set that is the Cartesian product (also known as a cross product) of the selection criteria.

Edit (response to comment 1)
The distinct will not impact the cross product that results from the join. If you are getting a cross product, then it is possible that you are selecting rows on one side of the join that do not match any row on the other side.

For an example of this, try select a.field, b.field from table 1 a, table 2 b (i.e. no join, just list two tables) and you will get a cross product.

This will likely be the cause if you don't get duplicates after adding the left join clause.

DwB
  • 37,124
  • 11
  • 56
  • 82
  • Thank you for your quick answer! I thought by joining on YEAR, MONTH, STATE, and LOB and using SELECT DISTINCT I would prevent the one to many join. I'm still a little confused. Is the existence of REPORT_TYPE in the first CTE causing the issue even though I am not bringing it in as a part of the final SELECT? – J King Jul 26 '19 at 15:31
0

You shouldn't be using DISTINCT with GROUP BY. GROUP BY will return the distinct combinations of the fields in the GROUP BY with the appropriate summations being made. So I think you want to do something like:

WITH MONTHLY_INVOICES AS (SELECT EXTRACT(YEAR FROM INVOICE_DATE) AS YEAR
                                ,EXTRACT(MONTH FROM INVOICE_DATE) AS MONTH
                                ,STATE
                                ,REPORT_TYPE
                                ,CASE WHEN REPORT_TYPE IN ('ADD', 'CCA', 'CLA', 'MVR', 'VIN') THEN 'AUTO'
                                      WHEN REPORT_TYPE IN ('CCP', 'CLP') THEN 'HOME'
                                      WHEN REPORT_TYPE IN ('DPF', 'NCF') THEN 'OTHER'
                                      END AS LOB
                                ,SUM(TOTAL_ORDERS) AS TOTAL_ORDERS
                                ,SUM(TOTAL_COST) AS TOTAL_COST
                            FROM KAHJZK_VENDOR_INVOICE_TOTALS
                            GROUP BY EXTRACT(YEAR FROM INVOICE_DATE)
                                    ,EXTRACT(MONTH FROM INVOICE_DATE)
                                    ,STATE
                                    ,REPORT_TYPE
                                    ,CASE WHEN REPORT_TYPE IN ('ADD', 'CCA', 'CLA', 'MVR', 'VIN') THEN 'AUTO'
                                          WHEN REPORT_TYPE IN ('CCP', 'CLP') THEN 'HOME'
                                          WHEN REPORT_TYPE IN ('DPF', 'NCF') THEN 'OTHER'
                                          END),
     MONTHLY_QT AS (SELECT EXTRACT(YEAR FROM REPORT_DATE) AS YEAR
                          ,EXTRACT(MONTH FROM REPORT_DATE) AS MONTH
                          ,STATE
                          ,LOB
                          ,SUM(TOTAL_QUOTES) AS TOTAL_QTS
                      FROM KAHJZK_DAILY_QT_VOL
                      WHERE EXTRACT(YEAR FROM REPORT_DATE) >= 2019
                      GROUP BY EXTRACT(YEAR FROM REPORT_DATE)
                              ,EXTRACT(MONTH FROM REPORT_DATE)
                              ,STATE
                              ,LOB),
     MONTHLY_NB AS (SELECT EXTRACT(YEAR FROM REPORT_DATE) AS YEAR
                          ,EXTRACT(MONTH FROM REPORT_DATE) AS MONTH
                          ,STATE
                          ,LOB
                          ,SUM(TOTAL_POLICIES) AS TOTAL_NB
                     FROM KAHJZK_DAILY_NB_VOL
                     WHERE EXTRACT(YEAR FROM REPORT_DATE) >= 2019
                     GROUP BY EXTRACT(YEAR FROM REPORT_DATE)
                             ,EXTRACT(MONTH FROM REPORT_DATE)
                             ,STATE
                             ,LOB)
SELECT A.YEAR
      ,A.MONTH
      ,A.STATE
      ,A.LOB
      ,SUM(A.TOTAL_ORDERS)
      ,SUM(A.TOTAL_COST)
      ,SUM(B.TOTAL_QTS)
      ,SUM(C.TOTAL_NB)
  FROM MONTHLY_INVOICES A
  LEFT OUTER JOIN MONTHLY_QT B
    ON A.YEAR = B.YEAR
   AND A.MONTH = B.MONTH
   AND A.STATE = B.STATE
   AND A.LOB = B.LOB
  LEFT OUTER JOIN MONTHLY_NB C
   ON A.YEAR = C.YEAR
  AND A.MONTH = C.MONTH
  AND A.STATE = C.STATE
  AND A.LOB = C.LOB
  GROUP BY A.YEAR
          ,A.MONTH
          ,A.STATE
          ,A.LOB
  ORDER BY 3,4,1,2;

You were also using strings (e.g. '2019') when comparing to the results of EXTRACT where you should have been using numbers (e.g. 2019) - I've updated that.

Give that a try and see if it does what you wanted.

  • Thanks for the answer and formatting! It still looks like it's giving me duplicates. – J King Jul 26 '19 at 16:25
  • I thought I had solved the issue by creating tables instead of CTEs. Turns out I accidentally left the old fix of removing the SUM from quotes and NB. – J King Jul 26 '19 at 16:50
  • I'm still getting duplicates. I removed REPORT_TYPE from the first CTE and that solves the issue; however I thought since I wasn't bringing it into my final SELECT statement the other two tables wouldn't join(?) multiple rows. I feel like I've written tons of queries like this in SQL Server using temp tables and not run into this issue. But I'll definitely be careful going forward. – J King Jul 26 '19 at 19:41