1

I am teaching myself MS-SQL and I am trying to find different ways to find the Count of Paid and Unpaid Claims for 2012 grouped by Region from these 3 tables. If there is a returned date, the claim is unpaid if the returned date is null then the claim is paid.

I will attach the code I have ran, but I am not sure if there are better ways to do it.

Thanks.

Here is the code:

SET dateformat ymd;

CREATE TABLE Claims
  (
     ClaimID      INT,
     SubID        INT,
     [Claim Date] DATETIME
  );

CREATE TABLE Phoneship
  (
     ClaimID           INT,
     [Shipping Number] INT,
     [Claim Date]      DATETIME,
     [Ship Date]       DATETIME,
     [Returned Date]   DATETIME
  );

CREATE TABLE Enrollment
  (
     SubID           INT,
     Enrollment_Date DATETIME,
     Channel         NVARCHAR(255),
     Region          NVARCHAR(255),
     Status          FLOAT,
     Drop_Date       DATETIME
  );

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (102,
            201,
            '2011-10-13 00:00:00',
            '2011-10-14 00:00:00',
            NULL);

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (103,
            202,
            '2011-11-02 00:00:00',
            '2011-11-03 00:00:00',
            '2011-11-20 00:00:00');

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (103,
            203,
            '2011-11-02 00:00:00',
            '2011-11-22 00:00:00',
            NULL);

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (105,
            204,
            '2012-01-16 00:00:00',
            '2012-01-17 00:00:00',
            NULL);

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (106,
            205,
            '2012-02-15 00:00:00',
            '2012-02-16 00:00:00',
            '2012-02-26 00:00:00');

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (106,
            206,
            '2012-02-15 00:00:00',
            '2012-02-27 00:00:00',
            '2012-03-06 00:00:00');

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (107,
            207,
            '2012-03-12 00:00:00',
            '2012-03-13 00:00:00',
            NULL);

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (108,
            208,
            '2012-05-11 00:00:00',
            '2012-05-12 00:00:00',
            NULL);

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (109,
            209,
            '2012-05-13 00:00:00',
            '2012-05-14 00:00:00',
            '2012-05-28 00:00:00');

INSERT INTO [Phoneship]
            ([ClaimID],
             [Shipping Number],
             [Claim Date],
             [Ship Date],
             [Returned Date])
VALUES     (109,
            210,
            '2012-05-13 00:00:00',
            '2012-05-30 00:00:00',
            NULL);

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (101,
            12345678,
            '2011-03-06 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (102,
            12347190,
            '2011-10-13 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (103,
            12348723,
            '2011-11-02 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (104,
            12349745,
            '2011-11-09 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (105,
            12347190,
            '2012-01-16 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (106,
            12349234,
            '2012-02-15 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (107,
            12350767,
            '2012-03-12 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (108,
            12350256,
            '2012-05-11 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (109,
            12347701,
            '2012-05-13 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (110,
            12350256,
            '2012-05-15 00:00:00');

INSERT INTO [Claims]
            ([ClaimID],
             [SubID],
             [Claim Date])
VALUES     (111,
            12350767,
            '2012-06-30 00:00:00');

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12345678,
            '2011-01-05 00:00:00',
            'Retail',
            'Southeast',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12346178,
            '2011-03-13 00:00:00',
            'Indirect Dealers',
            'West',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12346679,
            '2011-05-19 00:00:00',
            'Indirect Dealers',
            'Southeast',
            0,
            '2012-03-15 00:00:00');

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12347190,
            '2011-07-25 00:00:00',
            'Retail',
            'Northeast',
            0,
            '2012-05-21 00:00:00');

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12347701,
            '2011-08-14 00:00:00',
            'Indirect Dealers',
            'West',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12348212,
            '2011-09-30 00:00:00',
            'Retail',
            'West',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12348723,
            '2011-10-20 00:00:00',
            'Retail',
            'Southeast',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12349234,
            '2012-01-06 00:00:00',
            'Indirect Dealers',
            'West',
            0,
            '2012-02-14 00:00:00');

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12349745,
            '2012-01-26 00:00:00',
            'Retail',
            'Northeast',
            0,
            '2012-04-15 00:00:00');

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12350256,
            '2012-02-11 00:00:00',
            'Retail',
            'Southeast',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12350767,
            '2012-03-02 00:00:00',
            'Indirect Dealers',
            'West',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12351278,
            '2012-04-18 00:00:00',
            'Retail',
            'Midwest',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12351789,
            '2012-05-08 00:00:00',
            'Indirect Dealers',
            'West',
            0,
            '2012-07-04 00:00:00');

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12352300,
            '2012-06-24 00:00:00',
            'Retail',
            'Midwest',
            1,
            NULL);

INSERT INTO [Enrollment]
            ([SubID],
             [Enrollment_Date],
             [Channel],
             [Region],
             [Status],
             [Drop_Date])
VALUES     (12352811,
            '2012-06-25 00:00:00',
            'Retail',
            'Southeast',
            1,
            NULL); 

And Query1

SELECT Count(ClaimID)                       AS 'Paid Claim',
       (SELECT Count(ClaimID)
        FROM   dbo.phoneship
        WHERE  [returned date] IS NOT NULL) AS 'Unpaid Claim'
FROM   dbo.Phoneship
WHERE  [Returned Date] IS NULL
GROUP  BY claimid 

Query2

SELECT Count(*)                             AS 'Paid Claims',
       (SELECT Count(*)
        FROM   dbo.Phoneship
        WHERE  [Returned Date] IS NOT NULL) AS 'Unpaid Claims'
FROM   dbo.Phoneship
WHERE  [Returned Date] IS NULL; 

Query3

Select Distinct(C.[Shipping Number]), Count(C.ClaimID) AS 'COUNT ClaimID', 
        A.Region, A.SubID 
From dbo.HSEnrollment A 
Inner Join dbo.Claims B On A.SubId = B.SubId 
Inner Join dbo.Phoneship C On B.ClaimID = C.ClaimID 
Where C.[Returned Date] IS NULL 
Group By A.Region, A.Subid, C.ClaimID, C.[Shipping Number] Order By A.Region
  • You have included your script for creating the database, but not for getting the values that you want. What is your attempted query? – Gordon Linoff Sep 09 '12 at 20:40
  • 1
    Better to edit your question with this than paste queries into the comments (I have done so for you for the first two) – Martin Smith Sep 09 '12 at 20:46
  • 1
    I know this isn't answering your question, but right off the bat I would like to tell you to avoid using spaces in your column names. It can lead to trouble down the line. Some people use underscores, but I prefer Camel Case Notation http://en.wikipedia.org/wiki/CamelCase - so for example [Shipping Number] can be ShippingNumber, it is easier to type when writing queries and your developers (if you have any) won't want to kill you :D – dyslexicanaboko Sep 09 '12 at 20:56
  • Sorry I don't mean to harp again, but just another recommendation - don't use DateTime, use DateTime2 as recommended by MS: http://msdn.microsoft.com/en-us/library/ms187819.aspx and even on StackOverFlow http://stackoverflow.com/questions/1334143/sql-server-datetime2-vs-datetime – dyslexicanaboko Sep 09 '12 at 21:00
  • @dyslexicanaboko Why do CamelCasers capitalize the first letter? Shouldn't theyCapitalizeLikeThis? AndNotLikeThis? – Kermit Sep 10 '12 at 03:49
  • Depends on the context in my opinion. With respect to code, local and instance variables start lower case - properties start with upper case. When I am creating new tables I always capitalize the first letter of every column. I can't say one method is right or wrong, just a preference on my part. Consistency is most important though. Just like the pluralization of table names, either you do it for all or none - don't mix them because it gets confusing when you are trying to recall the names of tables (or columns in this case). – dyslexicanaboko Sep 11 '12 at 19:37

3 Answers3

1

You need to join all the tables together to get the region. This version assumes that there is at most on Phoneship record for each claim:

SELECT e.region, count(*) as numclaims,
       sum(case when ps.ReturnedDate is not null then 1 else 0 end) AS 'Paid Claim',
       sum(case when ps.ReturnedDate is null then 1 else 0 end) AS 'Unpaid Claim'
FROM   claims c join
       enrollment e
       on c.sub_id = e.sub_id left outer join
       Phoneship ps
       on ps.claimid = c.claimdid
WHERE  [Returned Date] IS NULL
GROUP  BY e.region

If there is more than one, then the counts will be off, because each phoneship will be counted instead of each claim. To fix this, change the two sums to:

count(distinct case when ps.ReturnedDate is not null then c.claimid end)
count(distinct case when ps.ReturnedDate is null then c.claimid end)
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
1

It is difficult to answer this question because I see what you are asking but there are a variety of other little problems leading up to your query difficulties.

My Answer
So to answer the core of your question here is what I would do, if and only if I am interpreting your table structure properly (more to follow on that).

I did not include the number of claims because that would throw off the numbers. I have included two queries, the final query and the break down query. For the sum of sums (since you are learning) I used WITH ROLLUP to get the sum of each grouped column.

SELECT 
    e.Region,
    paid = SUM(CASE WHEN p.[Returned Date] IS NULL THEN 1 ELSE 0 END),
    unpaid = SUM(CASE WHEN p.[Returned Date] IS NULL THEN 0 ELSE 1 END)
FROM claims c
    INNER JOIN enrollment e
        ON e.SubID = c.SubID
    INNER JOIN phoneship p
        ON p.ClaimID = c.ClaimID
GROUP BY e.Region
WITH ROLLUP

The Break Down
This is the break down query utilizing a sub select of your inner virtual table (result table - result set etc...). I did this on purpose to demonstrate a point.

SELECT 
    x.Region,
    -- if the Returned Date is null then add 1, otherwise add 0
    paid = SUM(CASE WHEN x.ReturnedDate IS NULL THEN 1 ELSE 0 END),
    -- if the Returned Date is null then add 0, otherwise add 1
    unpaid = SUM(CASE WHEN x.ReturnedDate IS NULL THEN 0 ELSE 1 END)
FROM
(
-- Run this inner query to see what data you are actually working with
-- for your grouping/sums
SELECT 
    c.ClaimID,
    c.SubID,
    E = '#', -- This is just a separator
    e_SubID = e.SubID, -- This is equivalent to saying e.SubID AS e_SubID
    e.Region,
    P = '#', -- This is just a separator
    p_ClaimID = p.ClaimID,
    ShippingNo = p.[Shipping Number], -- Getting rid of those nasty spaces
    ReturnedDate = p.[Returned Date]
FROM claims c
    INNER JOIN enrollment e
        ON e.SubID = c.SubID
    -- Initially this was a LEFT JOIN but you are missing Claims in your 
    -- Phoneship table which will produce bogus results, therefore the
    -- INNER JOIN will filter out any rows that don't match
    INNER JOIN phoneship p
        ON p.ClaimID = c.ClaimID
) as x
GROUP BY x.Region
WITH ROLLUP

As great as Sub Selects are avoid them if you can. They are not very good for performance, but of course there will be times where you can't avoid it.

Your table structure/relationships are the root cause of why you are having difficulty performing this query. After reviewing the structure I see that you are replicating data (which is a no no) and you are having trouble pulling details all into 1 nice query.

The problem areas that I saw (and some friendly advice)

  1. Your replicated the ClaimDate column from the Claims table to the PhoneShip table. I am not sure if those have different meanings, but if it is a duplicate - avoid this.

  2. The SubID that is in the Claims table should probably be removed. It would be better if you put the ClaimID as a Foreign Key (FK) into the Enrollment table.

  3. Give the Phoneship table its own Primary Key (PK) - it is for easy of use, making each row unique aside from the combination of ClaimID and ShippingNumber. Look into Table Relationships and Unique Constraints.

  4. I am a bit iffy about using NULL as a good indicator for whether or not something was paid or unpaid. Only the designer would know that a null field would mean paid. You might be better off using a bit field for this purpose with a default value of zero and marked as NOT NULL - that way it is never null. After all that will save you the trouble of having to write a case statement, you can use the bit directly for your sum Ex: SUM(x.Paid). Also columns can sometimes be mistakenly marked as NULL when not intended for a variety of reasons.

  5. Consider pulling the Channel and Region columns out of the Enrollment table completely. Put those in their own tables with a Integer PK. You can reference the PK everywhere where needed using ChannelID and RegionID. This way if the names need to change, you won't have to worry about data integrity issues (UPDATE Table SET NameCol = 'a' WHERE NameCol = 'b' -- This could cause an unintended renaming disaster.)

  6. Put the RegionID and ChannelID into the Claims table. Now you don't need it in the Enrollment table if you follow step 2 above (have a ClaimID FK in Enrollment table).

Congrats on taking the initiative to learn this stuff. It is invaluable knowledge (unless you go to college, in which case it is worth about 50K or worse... student loans... sigh...).

dyslexicanaboko
  • 4,215
  • 2
  • 37
  • 43
0

Try this

SELECT e.Region, COUNT(c.SubID) TotalClaims, COUNT(p.[returned date]) UnpaidClaims, COUNT(c.SubID)-COUNT(p.[returned date]) PaidClaims
FROM
    Claims c
    INNER JOIN enrollment e ON c.SubID = e.SubID
    INNER JOIN phoneship p ON p.ClaimID = c.ClaimID
GROUP BY e.Region
Nitesh Kumar
  • 1,774
  • 4
  • 19
  • 26