SQL Server - find absence date occurrences

Question

I have the following dataset:

enter image description here

Here is script for this data:

;with dataset AS ( select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-07' AS DATE) AS CUT_DATE UNION select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-15' AS DATE) AS CUT_DATE UNION select 'EMP02' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-09' AS DATE) AS CUT_DATE ) select * from dataset I need to divide these periods (PERIOD_START and PERIOD_END) by CUT_DATE (exclude cut dates from that periods) The number of cut dates could be any (3,5,8 etc).

Expecting result for the dataset above is:

Pretty well covered by google...https://stackoverflow.com/questions/20402089/detect-consecutive-dates-ranges-using-sql — Simon, Jun 07 '17 at 13:58
This is known as the `gaps-and-islands` problem. There are several solutions filed under that keyword. — Stavr00, Jun 07 '17 at 13:59
The best write up of Gaps and Islands I have seen is in [SQL Server Deep Dives - Chapter 5 - Gaps and islands (Itzik Ben-Gan)](https://manning-content.s3.amazonaws.com/download/3/e589652-7171-4310-a714-e84dd0f14090/SampleChapter5.pdf) — GarethD, Jun 07 '17 at 14:04
Guys, Many thanks for your help. Will try to use your comments — David Freeman, Jun 07 '17 at 14:18
It's interesting that you group 2017-05-26 and 2017-05-29, they are over a weekend, however you do not have entries for the 27th and 28th (sat and sun). That will make it harder to create the groups. This may also help https://stackoverflow.com/questions/1110998/get-day-of-week-in-sql-2005-2008 — Simon, Jun 07 '17 at 14:21
This is the target. If employee was absent on Friday and Monday, expected date range should include weekend dates. Thank you for your comment. — David Freeman, Jun 07 '17 at 14:28
Can't answer anymore because it was marked as duplicate. Since you don't have Lead and Lag you can use Cursors (god forbid) and store your previous values. Have a look at this query which gives the results you want. http://rextester.com/SVW51190 — Dan P, Jun 07 '17 at 19:12

Vamsi Prabhala · Answer 1 · 2017-06-07T14:17:54.557

2

If your version of SQL Server supports LAG, you can use this.

SELECT EMPLOYEE_ID,
       ITEM_TYPE,
       MIN(APPLY_DATE) AS STARTDATE,
       MAX(APPLY_DATE) AS ENDDATE
FROM
 (SELECT T.*,
         SUM(CASE WHEN PREV_TYPE=ITEM_TYPE THEN 0 ELSE 1 END) 
         OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS GRP
  FROM (SELECT D.*,
        LAG(ITEM_TYPE) OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS PREV_TYPE
        FROM DATA D
       ) T
  ) T
WHERE ITEM_TYPE IN ('Sickness','Vacation')
GROUP BY EMPLOYEE_ID,ITEM_TYPE,GRP

The logic is to get the previous row's item_type (based on ascending order of apply_date) and compare it with the current row's value. If they are equal, they belong to the same group. Else you start a new group. This is done in the sum window function. After groups are assigned, you just need to get the max and min date for an employee_id,item_type.

Sample Demo

edited Jun 07 '17 at 14:17

answered Jun 07 '17 at 14:09

Vamsi Prabhala

48,685
4
36
58

Hi vkp, many thanks for your time. Currently, I am working on SQL Server 2008 R2, but soon will get the new version. So, your code will be helpful. Will it produce the expected result? Unfortunately, I am not able to check whther your code works correctly. – David Freeman Jun 07 '17 at 14:16
sure..check the demo shown. – Vamsi Prabhala Jun 07 '17 at 14:18
Amazing job, vkp! Have a nice day :) – David Freeman Jun 07 '17 at 14:29
Works perfectly on SQL Server 2012. – Badar Jun 07 '17 at 14:41
Hi vkp, may I ask you one more question? – David Freeman Jun 08 '17 at 07:55
@DavidFreeman you should post a new question instead. – Vamsi Prabhala Jun 08 '17 at 10:28

realbart · Answer 2 · 2017-06-07T14:32:08.820

0

You would use the LAG function. If you order by something, the LAG function gives the previous value;

a full description can be found at: http://www.sqlservercentral.com/articles/T-SQL/106783/

Take a look at vkp's answer for a full query

edited Jun 07 '17 at 14:32

answered Jun 07 '17 at 14:04

realbart

3,497
1
25
37

LAG is SQL 2012 and up. We don't know what the OP's SQL version. – SS_DBA Jun 07 '17 at 14:05
@realbart .. no point modifying your answer to be exactly mine. – Vamsi Prabhala Jun 07 '17 at 14:29
@vkp You're right. Yours is sufficient. – realbart Jun 07 '17 at 14:32

score 0 · Answer 3 · answered Jun 07 '17 at 14:58

This is another way if way if lag is supported.

Rextester Sample

with tbl as
(select d.*
    ,case when (item_type = lag(item_type) over (partition by employee_id order by apply_date))
            then 0 
            else 1 
    end grp_tmp
from DATA2 d
where 
    item_type <> 'Worked'
)
,tbl2 as 
(select t.*
    ,sum(grp_tmp) over (order by employee_id,apply_date 
                        rows between unbounded preceding and current row
                        ) 
    as grp
from tbl t
)
select 
EMPLOYEE_ID
    ,ITEM_TYPE 
    ,(CONVERT(VARCHAR(24),min(apply_date),103)
    +' - '
    +CONVERT(VARCHAR(24),max(apply_date),103)
    ) as range
from tbl2
group by EMPLOYEE_ID, 
        ITEM_TYPE
        ,grp
order by 
    employee_id
    ,min(apply_date);

Output

+-------------+-----------+-------------------------+
| EMPLOYEE_ID | ITEM_TYPE |          range          |
+-------------+-----------+-------------------------+
|           1 | Sickness  | 23/05/2017 - 24/05/2017 |
|           1 | Vacation  | 26/05/2017 - 29/05/2017 |
|           1 | Sickness  | 01/06/2017 - 01/06/2017 |
|           2 | Sickness  | 25/05/2017 - 30/05/2017 |
+-------------+-----------+-------------------------+

SQL Server - find absence date occurrences

3 Answers3