0

I'm trying to assign a unique ID to a case where an employee is out of the office. This could be a single day, a entire week or a collection of weeks.

I have a table called 'absence' that looks like this:

Absence_Date  employee_ID  Hours_Lost  DayofWeek   
2020-09-28    001          7.5         Mon        
2020-04-15    002          7.5         Wed        
2020-04-16    002          7.5         Thurs       
2020-02-10    003          7.5         Mon       
2020-05-12    004          3.5         Tue      
2020-06-08    004          1           Mon         
2020-06-10    004          7.5         Wed         
2020-06-11    004          7.5         Thurs      
2020-06-15    004          7.5         Mon         
2020-06-16    004          7.5         Tue         
2020-06-17    004          7.5         Wed        
2020-06-18    004          7.5         Thurs       
2020-06-22    004          7.5         Mon         

I also have a table called 'working_pattern', with their shift patterns:

employee_ID  DayofWeek  Working_Hours
001          Sun        0
001          Mon        7.5
001          Tue        7.5
001          Wed        7.5
001          Thurs      7.5
001          Fri        7.5
001          Sat        0
002          Sun        0
002          Mon        7.5
002          Tue        7.5
002          Wed        7.5
002          Thurs      7.5
002          Fri        7.5
002          Sat        0
002          Sun        0
003          Mon        7.5
003          Tue        7.5
003          Wed        7.5
003          Thurs      7.5
003          Fri        7.5
003          Sat        0
003          Sun        0
004          Mon        7.5
004          Tue        7.5
004          Wed        7.5
004          Thurs      7.5
004          Fri        0
004          Sat        0

Note that employee_ID 004 doesn't work a Friday

I can join these two tables using:

SELECT a.*, w.working_hours
FROM absence a
    JOIN working_pattern w
        ON a.employee_ID = w.employee_ID
        AND a.DayofWeek = w.DayofWeek

Which gives me:

Absence_Date  employee_ID  Hours_Lost  DayofWeek   Working_Hours
2020-09-28    001          7.5         Mon         7.5            
2020-04-15    002          7.5         Wed         7.5           
2020-04-16    002          7.5         Thurs       7.5         
2020-02-10    003          7.5         Mon         7.5          
2020-05-12    004          3.5         Tue         7.5            
2020-06-08    004          1           Mon         7.5            
2020-06-10    004          7.5         Wed         7.5             
2020-06-11    004          7.5         Thurs       7.5             
2020-06-15    004          7.5         Mon         7.5             
2020-06-16    004          7.5         Tue         7.5            
2020-06-17    004          7.5         Wed         7.5           
2020-06-18    004          7.5         Thurs       7.5            
2020-06-22    004          7.5         Mon         7.5             

I'm trying to create a result that will group continuous instances of absence. Should look like this:

Absence_Date  employee_ID  Hours_Lost  DayofWeek   Working_Hours   Absence_ID
2020-09-28    001          7.5         Mon         7.5             1
2020-04-15    002          7.5         Wed         7.5             2
2020-04-16    002          7.5         Thurs       7.5             2
2020-02-10    003          7.5         Mon         7.5             3
2020-05-12    004          3.5         Tue         7.5             4
2020-06-08    004          1           Mon         7.5             5
2020-06-10    004          7.5         Wed         7.5             6
2020-06-11    004          7.5         Thurs       7.5             6
2020-06-15    004          7.5         Mon         7.5             6
2020-06-16    004          7.5         Tue         7.5             6
2020-06-17    004          7.5         Wed         7.5             6
2020-06-18    004          7.5         Thurs       7.5             6 
2020-06-22    004          7.5         Mon         7.5             6

I added a column using

DATEDIFF(day,Absence_Date,LAG(Absence_Date,1) OVER (ORDER BY t.Employee_Number, Absence_Date))

which gave me the days difference between one row to the next. Thinking I could say all the values of '1' would be consecutive days. However, this doesn't account for weekends or non working days.

Any suggestions?

Thank you!

EDIT

This is different to Detect consecutive dates ranges using SQL as this very creative solution will group instances within a week (or in theory within a month/year), however my instances may group over several weeks and even months.

Phil Collins
  • 327
  • 1
  • 4
  • 13
  • Does this answer your question? [Detect consecutive dates ranges using SQL](https://stackoverflow.com/questions/20402089/detect-consecutive-dates-ranges-using-sql) – Pat. ANDRIA Mar 02 '21 at 10:47
  • This is known as a "gaps and island" problem, there are quite a few examples out there on things similar to this. You state *"I've tried using a LAG/LEAD column, but I can't get it to work! Any suggestions"*, however, we can't suggest why your attempt didn't work if you don't give us said attempt nor explain why it "didn't work" or what "didn't work" means. Did you get an error, unexpected results, something else? – Thom A Mar 02 '21 at 10:48
  • @Pat.ANDRIA that's a very creative solution, however it doesn't work in my case as it treats an continuous instance of absence across multiple weeks as separate instances, where I want it to count as one. Good suggestion though, thank you. – Phil Collins Mar 02 '21 at 10:56
  • @Larnu I added a column using DATEDIFF(day,Absence_Date,LAG(Absence_Date,1) OVER (ORDER BY t.Employee_Number, Absence_Date)), which gave me the days difference between one row to the next. Thinking I could say all the values of '1' would be consecutive days. However, this doesn't account for weekends or non working days. – Phil Collins Mar 02 '21 at 11:12
  • You have enough reptutaion to know there is an [edit] feature, @PhilCollins , and that attempts need to go in the question, not the comments, please. – Thom A Mar 02 '21 at 11:13

1 Answers1

0

A master calendar table with one row for every date over some period (I arbitrarily chose 2020-01-01 to 2021-01-01 based on your example data) allows us create a single query that gives the results you're looking for. The method used to create the master table was convenient for me and not in the scope of the question, so other methods may be preferred. I also left some extra columns in the result so that you can get a better sense of how it works. Without further ado, here's the script:

declare
    @beginDate date ='2020-01-01'
,   @endDate date ='2021-01-01'

create table
    Master_Calendar
(   CalendarDate date primary key
)

insert
    Master_Calendar
(   CalendarDate
)
select
    dateadd(day,  Number - 1, convert (date,'2001-01-01'))
from
    (   select top (10000)
            Number = convert(int, row_number() over (order by o.object_id))
        from
            sys.objects o
            cross join sys.objects o2
            cross join sys.objects o3
    ) numbers
where
    dateadd(day,  Number - 1, convert (date,'2001-01-01')) between @beginDate and @endDate

create table
    absence
(   Absence_Date date
,   employee_ID char(3)
,   Hours_Lost decimal(5, 2)
,   DayOfWeek varchar(5)
)

insert
    dbo.absence
(   Absence_Date
,   employee_ID
,   Hours_Lost
,   DayOfWeek
)
values
  ('2020-09-28','001', 7.5,'Mon')
, ('2020-04-15','002', 7.5,'Wed')
, ('2020-04-16','002', 7.5,'Thurs')
, ('2020-02-10','003', 7.5,'Mon')
, ('2020-05-12','004', 3.5,'Tue')
, ('2020-06-08','004', 1,'Mon')
, ('2020-06-10','004', 7.5,'Wed')
, ('2020-06-11','004', 7.5,'Thurs')
, ('2020-06-15','004', 7.5,'Mon')
, ('2020-06-16','004', 7.5,'Tue')
, ('2020-06-17','004', 7.5,'Wed')
, ('2020-06-18','004', 7.5,'Thurs')
, ('2020-06-22','004', 7.5,'Mon')

create table
    working_pattern
(   employee_ID char(3)
,   DayOfWeek varchar(5)
,   Working_Hours decimal(5, 2)
)

insert
    dbo.working_pattern
(   employee_ID
,   DayOfWeek
,   Working_Hours
)
values
  ('001','Sun', 0)
, ('001','Mon', 7.5)
, ('001','Tue', 7.5)
, ('001','Wed', 7.5)
, ('001','Thurs', 7.5)
, ('001','Fri', 7.5)
, ('001','Sat', 0)
, ('002','Sun', 0)
, ('002','Mon', 7.5)
, ('002','Tue', 7.5)
, ('002','Wed', 7.5)
, ('002','Thurs', 7.5)
, ('002','Fri', 7.5)
, ('002','Sat', 0)
, ('002','Sun', 0)
, ('003','Mon', 7.5)
, ('003','Tue', 7.5)
, ('003','Wed', 7.5)
, ('003','Thurs', 7.5)
, ('003','Fri', 7.5)
, ('003','Sat', 0)
, ('003','Sun', 0)
, ('004','Mon', 7.5)
, ('004','Tue', 7.5)
, ('004','Wed', 7.5)
, ('004','Thurs', 7.5)
, ('004','Fri', 0)
, ('004','Sat', 0)

select
    abs_cal.employee_ID
,   abs_cal.PeriodNumber
,   abs_cal.Absence_Date
,   abs_cal.employee_ID
,   abs_cal.Hours_Lost
,   abs_cal.DayOfWeek
,   abs_cal.PrevAbsencePeriod
,   abs_cal.IsNewAbsence
,   Absence_ID = sum(abs_cal.IsNewAbsence) over (order by abs_cal.employee_ID, abs_cal.Absence_Date)
from
    (   select
            Emp_cal.PeriodNumber
        ,   a.Absence_Date
        ,   a.employee_ID
        ,   a.Hours_Lost
        ,   a.DayOfWeek
        ,   PrevAbsencePeriod = lag(Emp_cal.PeriodNumber, 1) over (partition by a.employee_id order by a.Absence_Date) 
        ,   IsNewAbsence = case when lag(Emp_cal.PeriodNumber, 1) over (partition by a.employee_id order by a.Absence_Date) = Emp_cal.PeriodNumber - 1 then 0 else 1 end
        from
            (   select
                    wp.employee_ID
                ,   CalendarDate
                ,   PeriodNumber = row_number() over (partition by wp.employee_ID order by mc.CalendarDate)
                from
                    dbo.Master_Calendar mc
                    join dbo.working_pattern wp
                        on datename(weekday, CalendarDate) like wp.DayOfWeek +'%'
                        and wp.Working_Hours > 0
            ) Emp_cal
            join dbo.absence a
                on a.employee_ID = Emp_cal.employee_ID
                and a.Absence_Date = Emp_cal.CalendarDate
    ) abs_cal