2

There is a number of hotels with different bed capacities. I need to learn that for any given day, how many beds are occupied in each hotel.

Sample data:

 HOTEL      CHECK-IN     CHECK-OUT
   A       29.05.2010   30.05.2010
   A       28.05.2010   30.05.2010
   A       27.05.2010   29.05.2010
   B       18.08.2010   19.08.2010
   B       16.08.2010   20.08.2010
   B       15.08.2010   17.08.2010

Intermediary Result:

HOTEL      DAY          OCCUPIED_BEDS
  A     27.05.2010           1      
  A     28.05.2010           2
  A     29.05.2010           3
  A     30.05.2010           2
  B     15.08.2010           1
  B     16.08.2010           2
  B     17.08.2010           2
  B     18.08.2010           2
  B     19.08.2010           2
  B     20.08.2010           1

Final result:

 HOTEL     MAX_OCCUPATION  
   A            3
   B            2

A similar question is asked before. I thought getting the list of dates (as Tom Kyte shows) between two dates and calculating each day's capacity with a group by. The problem is my table is relatively big and I wonder if there is a less costly way of accomplishing this task.

Community
  • 1
  • 1
bonsvr
  • 2,262
  • 5
  • 22
  • 33

3 Answers3

2

create a temp table containing the days you are interested in

create table #dates (dat datetime)
insert into #dates (dat) values ('20121116')
insert into #dates (dat) values ('20121115')
insert into #dates (dat) values ('20121114')
insert into #dates (dat) values ('20121113')

Get the intermediate result by joining the bookings with the dates so that one per booking-day is "generated"

SELECT Hotel, d.dat, COUNT(*) from bookings b
INNER JOIN #dates d on d.dat BETWEEN b.checkin AND b.checkout
GROUP BY Hotel, d.dat 

An finally get the Max

SELECT Hotel, Max(OCCUPIED_BEDS) FROM IntermediateResult GROUP BY Hotel
Lukas Winzenried
  • 1,919
  • 1
  • 14
  • 22
2

I don't think there's a better approach than the one you outlined in the question. Create your days table (or generate one on the fly). I personally like to have one lying around, updated once a year.

Someone who understand analytic functions will probably be able to do this without an inner/outer query, but as the inner grouping is a subset of the outer, it doesn't make much difference.

Select
  i.Hotel,
  Max(i.OccupiedBeds)
From (
  Select
    s.Hotel,
    d.DayID,
    Count(*) As OccupiedBeds
  From
    SampleData s
      Inner Join
    Days d
      -- might not need to +1 depending on business rules.
      -- I wouldn't count occupancy on the day I check out, if so get rid of it
      On d.DayID >= s.CheckIn And d.DayID < s.CheckOut + 1 
  Group By
    s.Hotel, 
    d.DayID
  ) i
Group By
  i.Hotel

After a bit of playing I couldn't get an analytic function version to work without an inner query:

If speed really is a problem with this, you could consider maintaining an intermediate table with triggers on main table.

http://sqlfiddle.com/#!4/e58e7/24

Laurence
  • 10,896
  • 1
  • 25
  • 34
  • It is an OLAP environment. Dou you know a practical way to create a days table spanning 6 years, i.e. 01.01.2008-01.12.2013 – bonsvr Nov 16 '12 at 21:51
1

The problem with performance is that the join conditions are not based on equality which makes a hash join impossible. Assuming we have a table hotel_day with hotel-day pairs, I would try something like that:

select ch_in.hotel, ch_in.day,
       (check_in_cnt - check_out_cnt) as occupancy_change
from   ( select d.hotel, d.day, count(s.hotel) as check_in_cnt
         from   hotel_days d,
                sample_data s
         where  s.hotel(+) = d.hotel
           and  s.check_in(+) = d.day
         group  by d.hotel, d.day
       ) ch_in,
       ( select d.hotel, d.day, count(s.hotel) as check_out_cnt
         from   hotel_days d,
                sample_data s
         where  s.hotel(+) = d.hotel
           and  s.check_out(+) = d.day
         group  by d.hotel, d.day
       ) ch_out
where  ch_out.hotel = ch_in.hotel
  and  ch_out.day = ch_in.day

The tradeoff is a double full scan, but I think it would still run faster, and it may be parallelized. (I assume that sample_data is big mostly due to the number of bookings, not the number of hotels itself.) The output is a change of occupancy in particular hotels on particular days, but this may be easily summed up into total values with either analytical functions or (probably more efficiently) a PL/SQL procedure with bulk collect.

Tomasz Żuk
  • 1,288
  • 1
  • 12
  • 14