0

I have a question in Microsoft SQL server 2012. Say there are four columns in a database:

Start_Date, end_date, client, percentage

An example of two records would be

[1/31/14, 2/28/14, client a, 100]
[3/05/14, 5/01/14, client a, 100]

For client a, I want to check that there is a total percentage of 100 on every day between the earliest start_date and the latest end date. In this example, 3/1 - 3/4 do not equal 100. I want the client name to be the output.

Any suggestions?

marc_s
  • 732,580
  • 175
  • 1,330
  • 1,459
Sir_cecil
  • 1
  • 1
  • Hi Sir_cecil. Welcome to Stack Overflow! We'd love to hear some more details about what you've *already tried* before coming here. Furthermore, on a more specific note, it would help if you could better-describe what you're hoping to get out of it. You say "...there is a total percentage of 100 on every day," but that doesn't really make sense looking at the data you gave as examples. Could you try, maybe, giving some more examples of different cases? For instance, one where the client name *would* be returned? – Matthew Haugen Jul 15 '14 at 23:48
  • To be a little more specific, essentially the date 3/5/14 is wrong and should be 3/1/14. I am trying to write a script so that it checks that the total equals 100 for client a on every single day. If I take the sum(percentage) for the client, the number would be 200. I would like the output to only be the client name, so that I can later check the database to see which date needs to change. – Sir_cecil Jul 15 '14 at 23:59
  • do you have a "calendar table"? in concept: your own table of each day e.g. http://stackoverflow.com/questions/5635594/how-to-create-a-calender-table-for-100-years-in-sql or http://www.sqlservercentral.com/blogs/dwainsql/2014/03/30/calendar-tables-in-t-sql/ – Paul Maxwell Jul 16 '14 at 00:40

3 Answers3

0

I am going to ignore the percentage, because it doesn't seem relevant (is it ever not 100?). You are just looking for gaps. If you know there are no overlaps in the time periods, you can do this simply as:

select client
from table a
where percentage = 100
group by client
having sum(datediff(day, start_date, end_date)) <> datediff(day, min(start_date), max(end_date));

If there are overlaps, the logic is a little harder, because you have to test for that. However, your question doesn't specify what to do in this case.

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Thank you for the script. I also would need to check if there is overlap as well. Say in this example if the start date for record 2 is 2/28/14, the total on 2/28/14 would be 200. I would want to flag that as well. The total must equal 100 on ever single day. – Sir_cecil Jul 16 '14 at 00:27
0

There are many articles around describing how to create you own calendar table, I believe you will need one. Note you haven't told use what type of SQL (MySQL, MS SQL, Oracle, other...) so you will need to locate the appropriate article for you. Or if you don't want a calendar table you could use a recursive CTE to generate a date range instead.

Once you have the dates in a table or CTE, left join your existing table to that like this:

SELECT
     d.client
   , cal.pkdate
   , sum(coalesce(d.percent,0)) as sum_pct
FROM MyCalendatTbl cal
LEFT JOIN MyExistingData d on cal.pkdate between d.start_date and d.end_date
GROUP BY
     d.client
   , cal.pkdate
HAVING
  sum(d.percent) <> 100

this will locate both gaps and overlaps

Paul Maxwell
  • 33,002
  • 3
  • 32
  • 51
0

Using the function LAG it's possible to check for gaps and overlaps in your data without using a calendar table

WITH Gaps AS (
  SELECT Start_Date, end_date, client, percentage
       , Last_EndDate = LAG(end_date, 1, Start_Date) 
                      OVER (PARTITION BY client ORDER BY Start_Date)
  FROM   Table1
)
SELECT DISTINCT client
FROM   Gaps
WHERE  DateDiff(d, Last_EndDate, Start_Date) NOT IN (0, 1)

SQLFiddle Demo

The query assume that the following data is correct

2014-01-31, 2014-02-28, 'client a', 100
2014-03-01, 2014-05-01, 'client a', 100
2014-01-31, 2014-03-01, 'client b', 100
2014-03-01, 2014-05-01, 'client b', 100

client a as the two rows with adjacent end date and start date, client b as the same end date and start date. If only the former is correct the query need to be changed

WITH Gaps AS (
  SELECT Start_Date, end_date, client, percentage
       , Last_EndDate = LAG(end_date, 1, DateAdd(d, -1, Start_Date)) 
                      OVER (PARTITION BY client ORDER BY Start_Date)
  FROM   Table1
)
SELECT DISTINCT client
FROM   Gaps
WHERE  DateDiff(d, Last_EndDate, Start_Date) <> 1

SQLFiddle Demo

Serpiton
  • 3,676
  • 3
  • 24
  • 35