3

I have tried solving this problem for a while now. I have searched a lot but no ones seems to have a similar problem.

Here is what I got:

Table 1 / Schedule
EmployeeID, PStart, PEnd
1, 2016-07-01 08:00:00, 2016-07-01 12:00:00
1, 2016-07-01 13:00:00, 2016-07-01 17:00:00

Table 2 / Bookings
EmployeeID, PStart, PEnd
1, 2016-07-01 08:00:00, 2016-07-01 08:40:00
1, 2016-07-01 09:00:00, 2016-07-01 10:10:00
1, 2016-07-01 10:30:00, 2016-07-01 11:00:00
1, 2016-07-01 13:00:00, 2016-07-01 15:00:00
1, 2016-07-01 15:00:00, 2016-07-01 15:30:00

I want to compare this two tables and get the gaps between schedule and booked time. The time that was not booked.

In this example that would be

Result table / Not booked
EmployeeID, PStart, PEnd
1, 2016-07-01 08:40:00, 2016-07-01 09:00:00
1, 2016-07-01 10:10:00, 2016-07-01 10:30:00
1, 2016-07-01 11:00:00, 2016-07-01 12:00:00
1, 2016-07-01 15:30:00, 2016-07-01 17:00:00

The speed of the query is very important. The employee count is above 150, so there is a lot of rows. The date range of which we are going to use might be limited (get gaps for two weeks for instance) instead of showing all gaps since the beginning of time. But only if that helps the query speed.

Thanks in advance.

Regards, Adam

AdamG
  • 55
  • 5

1 Answers1

2

You can do that with a combination of CTEs and ROW_NUMBER() window function.

The idea:

  • group your bookings by schedule,
  • give each booking in the schedule an increasing row number. This will let you join each booking to the row below.
  • From there you can select the dates and do a comparison of the last booking's end date to the next booking's start date to see if the dates are touching.
  • swap the dates around when selecting to see the gaps

Here's example code to do it using the data above:

;with allRows as
(
    -- give each booking an increasing row
    select ROW_NUMBER() OVER (PARTITION BY scheduleRow ORDER BY scheduleRow, b.PStart, b.PEnd) bookingRow,
        s.EmployeeId,
        s.scheduleRow,
        s.PStart as scheduleStart,
        s.PEnd as scheduleEnd,
        b.PStart as bookingStart,
        b.PEnd as bookingEnd 
    from 
        (
            -- we need to add an id for our schedules (does it exist in your case?)
            select ROW_NUMBER() OVER (ORDER BY PStart, PEnd) scheduleRow, *
            FROM schedule
        ) s
        left join bookings b on -- so we can get schedules without bookings
        s.employeeID = b.employeeID AND
        s.PStart <= b.PEnd AND
        s.PEnd >= b.PStart
)
select
    bookingLeft.EmployeeId,
    ISNULL(bookingLeft.bookingEnd, bookingLeft.scheduleStart) as PStart,
    ISNULL(bookingRight.bookingStart, bookingLeft.scheduleEnd) as PEnd
from allRows bookingLeft
left join allRows bookingRight on
    -- this joins the row to the one BELOW for the schedule
    bookingLeft.scheduleRow = bookingRight.scheduleRow and
    bookingLeft.bookingRow = bookingRight.bookingRow - 1
where
    -- this finds our gaps because the end of the last booking 
    -- doesn't touch the start of the next booking.
    ISNULL(bookingLeft.bookingEnd, bookingLeft.scheduleStart) < ISNULL(bookingRight.bookingStart, bookingLeft.scheduleEnd)

EDIT: This doesn't cater for when the booking is at the end of a schedule. because different fields will be selected, the go-to way of sorting that out is to use a UNION. So this can be tacked onto the main script to consider bookings that are at the end.

union all -- a bit quicker than a straight UNION
select
    bookingLeft.EmployeeId,
    ISNULL(bookingRight.bookingEnd, bookingLeft.scheduleStart),
    bookingLeft.bookingStart
from allRows bookingLeft    
left join allRows bookingRight on
    -- this joins the row to the one ABOVE for the schedule
    bookingLeft.scheduleRow = bookingRight.scheduleRow and
    bookingLeft.bookingRow = bookingRight.bookingRow + 1
where
    ISNULL(bookingRight.bookingEnd, bookingLeft.scheduleStart) < bookingLeft.bookingStart and
    bookingLeft.bookingEnd >= bookingLeft.scheduleEnd -- special case when booking at the end of a schedule

If you were using SQL 2012 and above, you could use LEAD() and that would eliminate the allRows CTE making the query much smaller.

As for performance, you can write the allRows to a temp table, slap on an index, and then query from that - it should be really quick.

Balah
  • 2,530
  • 2
  • 16
  • 24
  • This script is really perfect, it did almost 99% of the job. I have another issue that I forgot to mention. Sometimes a schedule starts at 8 to 10 and then another schedule starts at 10 to 12. I wonder if It's possible to join them before running the script – AdamG Jul 31 '16 at 22:08
  • I needed to change two things in the original script 1 it didnt show schedules with no bookings at all 2 It didn't quite work when the last booking was at the end of schedule (I would appreciate if you could offer a alternative script if you have time, as my fix made the script slower, but it's not important) – AdamG Jul 31 '16 at 22:11
  • I've amended the script to show schedules with no bookings (a left join and an extra `ISNULL` check) and have shown how to get gaps when the booking is at the end of a schedule. it uses a union all – Balah Aug 01 '16 at 13:19
  • Is there a way to solve the problem that some schedules are adjacent/next to each other? Im referring to my first comment. When one schedule is between 8 and 10 and a new schedule between 10 and 12 then it will say the gap ends at 10 and a new gap starts at 10. Is it possible to either join the adjacent schedules before run or join the resulting adjacent gaps after run – AdamG Aug 11 '16 at 00:22
  • 1
    There is a way, although it will require the use of recursive CTEs to join each schedule to the adjacent one until no more schedules are adjacent. That's beyond the scope of this question but http://stackoverflow.com/questions/14274942/sql-server-cte-and-recursion-example is a good start. If you run into issues you can open another question. – Balah Aug 11 '16 at 11:40