1

I have the following dataset:

enter image description here

Here is script for this data:

;with dataset AS ( select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-07' AS DATE) AS CUT_DATE UNION select 'EMP01' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-15' AS DATE) AS CUT_DATE UNION select 'EMP02' AS EMP_ID,CAST('2018-01-01' AS DATE) AS PERIOD_START,CAST('2018-01-31' AS DATE) AS PERIOD_END,CAST('2018-01-09' AS DATE) AS CUT_DATE ) select * from dataset I need to divide these periods (PERIOD_START and PERIOD_END) by CUT_DATE (exclude cut dates from that periods) The number of cut dates could be any (3,5,8 etc).

Expecting result for the dataset above is:

Cœur
  • 37,241
  • 25
  • 195
  • 267
  • So, what have you tried? Where does it go wrong? – Damien_The_Unbeliever Jun 07 '17 at 13:56
  • Pretty well covered by google...https://stackoverflow.com/questions/20402089/detect-consecutive-dates-ranges-using-sql – Simon Jun 07 '17 at 13:58
  • 2
    This is known as the `gaps-and-islands` problem. There are several solutions filed under that keyword. – Stavr00 Jun 07 '17 at 13:59
  • The best write up of Gaps and Islands I have seen is in [SQL Server Deep Dives - Chapter 5 - Gaps and islands (Itzik Ben-Gan)](https://manning-content.s3.amazonaws.com/download/3/e589652-7171-4310-a714-e84dd0f14090/SampleChapter5.pdf) – GarethD Jun 07 '17 at 14:04
  • Guys, Many thanks for your help. Will try to use your comments – David Freeman Jun 07 '17 at 14:18
  • It's interesting that you group 2017-05-26 and 2017-05-29, they are over a weekend, however you do not have entries for the 27th and 28th (sat and sun). That will make it harder to create the groups. This may also help https://stackoverflow.com/questions/1110998/get-day-of-week-in-sql-2005-2008 – Simon Jun 07 '17 at 14:21
  • This is the target. If employee was absent on Friday and Monday, expected date range should include weekend dates. Thank you for your comment. – David Freeman Jun 07 '17 at 14:28
  • Can't answer anymore because it was marked as duplicate. Since you don't have Lead and Lag you can use Cursors (god forbid) and store your previous values. Have a look at this query which gives the results you want. http://rextester.com/SVW51190 – Dan P Jun 07 '17 at 19:12

3 Answers3

2

If your version of SQL Server supports LAG, you can use this.

SELECT EMPLOYEE_ID,
       ITEM_TYPE,
       MIN(APPLY_DATE) AS STARTDATE,
       MAX(APPLY_DATE) AS ENDDATE
FROM
 (SELECT T.*,
         SUM(CASE WHEN PREV_TYPE=ITEM_TYPE THEN 0 ELSE 1 END) 
         OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS GRP
  FROM (SELECT D.*,
        LAG(ITEM_TYPE) OVER(PARTITION BY EMPLOYEE_ID ORDER BY APPLY_DATE) AS PREV_TYPE
        FROM DATA D
       ) T
  ) T
WHERE ITEM_TYPE IN ('Sickness','Vacation')
GROUP BY EMPLOYEE_ID,ITEM_TYPE,GRP

The logic is to get the previous row's item_type (based on ascending order of apply_date) and compare it with the current row's value. If they are equal, they belong to the same group. Else you start a new group. This is done in the sum window function. After groups are assigned, you just need to get the max and min date for an employee_id,item_type.

Sample Demo

Vamsi Prabhala
  • 48,685
  • 4
  • 36
  • 58
0

You would use the LAG function. If you order by something, the LAG function gives the previous value;

a full description can be found at: http://www.sqlservercentral.com/articles/T-SQL/106783/

Take a look at vkp's answer for a full query

realbart
  • 3,497
  • 1
  • 25
  • 37
0

This is another way if way if lag is supported.

Rextester Sample

with tbl as
(select d.*
    ,case when (item_type = lag(item_type) over (partition by employee_id order by apply_date))
            then 0 
            else 1 
    end grp_tmp
from DATA2 d
where 
    item_type <> 'Worked'
)
,tbl2 as 
(select t.*
    ,sum(grp_tmp) over (order by employee_id,apply_date 
                        rows between unbounded preceding and current row
                        ) 
    as grp
from tbl t
)
select 
EMPLOYEE_ID
    ,ITEM_TYPE 
    ,(CONVERT(VARCHAR(24),min(apply_date),103)
    +' - '
    +CONVERT(VARCHAR(24),max(apply_date),103)
    ) as range
from tbl2
group by EMPLOYEE_ID, 
        ITEM_TYPE
        ,grp
order by 
    employee_id
    ,min(apply_date);

Output

+-------------+-----------+-------------------------+
| EMPLOYEE_ID | ITEM_TYPE |          range          |
+-------------+-----------+-------------------------+
|           1 | Sickness  | 23/05/2017 - 24/05/2017 |
|           1 | Vacation  | 26/05/2017 - 29/05/2017 |
|           1 | Sickness  | 01/06/2017 - 01/06/2017 |
|           2 | Sickness  | 25/05/2017 - 30/05/2017 |
+-------------+-----------+-------------------------+
Utsav
  • 7,914
  • 2
  • 17
  • 38