2

I have a puzzle which I'm hoping someone can help me solve

I have a table of apartment listings. Each listing has one or more 'unavailable' dates stored individually in another table, let's call it 'off_days'. e.g. a listing which is unavailable from the 1st until the 4th of Sept would have 4 entries in the 'off_days' table, one for each day.

I'm looking for the most efficient way to search (preferably on the database level) for listings with at least N consecutive available days between two calendar days ('available' is any day not in the 'off_days' table for that particular listing). e.g. "Show me all listings with at least 5 consecutive available days in September"

I've been thinking about how i would solve this problem in the real world (by looking at calendar marked with X's and scanning for free blocks) and started thinking about using binary to represent available/unavailable days. i.e. for a given week, 0111001 ( = 57) would tell me that there are at most three consecutive available days in that week.

This question seems like a good start once I have the binary number for a given date range, but now I'm stuck on how to calculate that number dynamically for a given date range, again, on the DB level.... any ideas? or thoughts about this approach or another approach?

Community
  • 1
  • 1
joshblour
  • 1,024
  • 10
  • 19

3 Answers3

2

The apartment is available for gaps in the off days. That means that you want to know how big the gap is for each sequence, and the lag() function can give you this information:

select od.*,
       lag(unavailable) over (partition by apartmentid order by unavailable) as prev_una
from offdays od;

The actual number of days is the difference between the unavailable and the prev one minus 1. Now, assume the two calendar days a v_StartDate and v_EndDate. Now you can basically get what you want as:

select od.*,
       ((case when unavailable is NULL or unavailable > v_EndDate
              then v_EndDate + 1 else unavailable
         end) -
        (case when prev_una is null or prev_una < v_StartDate
              then v_StartDate - 1 else prev_una
         end) - 1
       ) as days_available
from (select od.*, lag(unavailable) over (partition by apartmentid order by unavailable) as prev_una
      from offdays od
     ) od
order by days_available desc;

The case logic is essentially putting in stop dates just before and just after the period.

This isn't quite complete because it has boundary issues: problems when the apartment is not in offdays and problems when the unavailable periods are outside the range. Let's fix this with a union all and some filtering:

select od.*,
       ((case when unavailable is NULL or unavailable > v_EndDate
              then v_EndDate + 1 else unavailable
         end) -
        (case when prev_una is null or prev_una < v_StartDate
              then v_StartDate - 1 else prev_una
         end) - 1
       ) as days_available
from (select od.apartmentId, unavailable,
             lag(unavailable) over (partition by apartmentid order by unavailable) as prev_una
      from offdays od
      where od.unavailable between v_StartDate and v_EndDate
      union all
      select apartmentid, NULL, NULL
      from apartments a
      where not exists (select 1
                        from offdays od
                        where od.apartmentid = a.apartmentid and
                              od.unavailable between v_StartDate and v_EndDate
                       )
     ) od
order by days_available desc;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
0

I'm going out on a limb by suggesting an approach even though I know nothing of Rails. If what I propose makes no sense, please tell me and I will remove my answer and slink away.

Suppose you interrogated the database merely to create an array that indicates whether the apartment is available on each date in a range of consecutive dates. Suppose, for example, it looked like this:

A = true
U = nil
avail = [A,A,A,U,U,A,A,U,U,A,A,A,A,A,U,A]

For a given number of consecutive days, n, the date offsets that begin runs of at least n available days would be given by the following method.

Code

def runs(avail, n)
  avail.each_with_index.each_cons(n).map do |run|
    av, off = run.transpose
    (av == av.compact) ? off.first : nil
  end.compact
end

Examples

runs(avail,1) #=> [0, 1, 2, 5, 6, 9, 10, 11, 12, 13, 15]
runs(avail,2) #=> [0, 1, 5, 9, 10, 11, 12]
runs(avail,3) #=> [0, 9, 10, 11]
runs(avail,4) #=> [9, 10]
runs(avail,5) #=> [9]
runs(avail,6) #=> []

Explanation

Consider the n=3 case above.

n = 3
enum0 = avail.each_with_index
  #=> #<Enumerator: [true, true, true, nil, nil, true, true, nil, nil,
  #                  true, true, true, true, true, nil, true]:each_with_index>
enum0.to_a
  #=> [[true, 0], [true, 1], [true, 2], [nil, 3], [nil, 4], [true, 5],
  #    [true, 6], [nil, 7], [nil, 8], [true, 9], [true, 10], [true, 11],
  #    [true, 12], [true, 13], [nil, 14], [true, 15]]
enum1 = enum0.each_cons(n)
  #=> #<Enumerator: #<Enumerator: [true, true, true, nil,...
  #                           ..., true]:each_with_index>:each_cons(40)>
enum1.to_a
  #=> [[[true, 0], [true, 1], [true, 2]],
  #    [[true, 1], [true, 2], [nil, 3]],
  #    ...
  #    [[true, 13], [nil, 14], [true, 15]]]
enum2 = enum1.map
  #=> #<Enumerator: #<Enumerator: #<Enumerator: [true, true, true, nil,...
  #                           ...true]:each_with_index>:each_cons(3)>:map>
enum2.to_a
  #=> [[[true, 0], [true, 1], [true, 2]],
  #    [[true, 1], [true, 2], [nil, 3]],
  #    ...
  #    [[true, 13], [nil, 14], [true, 15]]]    
a = enum2.each do |run|
  av, off = run.transpose
  (av == av.compact) ? off.first : nil
end
  #=> [0, nil, nil, nil, nil, nil, nil, nil, nil, 9, 10, 11, nil, nil]
a.compact
  #=> [0, 9, 10, 11]

Consider the calculation of the array a above. The first element of enum2 that each passes into the block, and assigns to the block variable run is:

run => [[true, 0], [true, 1], [true, 2]]

Then

av, off = run.transpose #=> [[true, true, true], [0, 1, 2]]
av                      #=> [true, true, true]
off                     #=> [0, 1, 2]
([true, true, true] == [true, true, true].compact) ? 0 : nil
  #=> ([true, true, true] == [true, true, true]) ? 0 : nil
  #=> 0

so the first value of enum2 is mapped into 0, meaning that there is a run of at least 3 days beginning on day offset 0.

Next, [[true, 1], [true, 2], [nil, 3]] is passed into the block and assigned to the variable run. Then:

av, off = run.transpose #=> [[true, true, nil], [1, 2, 3]]
av                      #=> [true, true, nil]
off                     #=> [1, 2, 3]
([true, true, nil] == [true, true, nil].compact) ? 1 : nil
  # ([true, true, nil] == [true, true]) ? 1 : nil
  #=> nil

so the second value of enum2 is mapped into nil, meaning that there is not a run of at least 3 days beginning on day offset 1. And so on...

Notes

  • The values in avail that indicate an apartment is available on a given date offest (the constant A above) can be any "truthy" value (i.e., anything other than false or nil); however, unavailable dates (represented by U above) must be represented by nil, as I use Array#compact to remove them from arrays.
  • It may be helpful to think of enum1 and enum2 as "compound" enumerators.
  • enum0.to_a, enum1.to_a and enum2.to_a are provided to show what each enumerator would pass into its block, if it had one. (enum2 does have a block, of course).
Cary Swoveland
  • 106,649
  • 6
  • 63
  • 100
0

Candidate apartments

Using generate_series() and two nested EXISTS expressions, you can translate this English sentence into SQL pretty much directly:

Find apartments  
  where at least one day exists in a given time range (September)
    where no "off_day" exists in a 5-day range starting that day.
SELECT *
FROM   apt a
WHERE  EXISTS (
   SELECT 1
   FROM   generate_series('2014-09-01'::date
                        , '2014-10-01'::date - 5
                        , interval '1 day') r(day)
   WHERE  NOT EXISTS (
      SELECT 1
      FROM   offday o
      WHERE  o.apt_id = a.apt_id
      AND    o.day BETWEEN r.day::date AND r.day::date + 4
      )
   )
ORDER BY a.apt_id;  -- optional

Free time slots

You can apply a similar query to get an actual list of free slots (the starting day):

SELECT *
FROM   apt a
-- FROM   (SELECT * FROM apt a WHERE apt_id = 1) a  -- for just agiven apt
CROSS  JOIN generate_series('2014-09-01'::date
                          , '2014-10-01'::date - 5
                          , interval '1 day') r(day)
WHERE  NOT EXISTS (
   SELECT 1
   FROM   offday o
   WHERE  o.apt_id = a.apt_id
   AND    o.day BETWEEN r.day::date AND r.day::date + 4
   )
ORDER BY a.apt_id, r.day;

For just the time slots in a given apt.:

SELECT *
FROM  (SELECT * FROM apt a WHERE apt_id = 1) a
...

SQL Fiddle.

Database design

If "off-days" typically consist of multiple days in a row, an alternative table layout based on date ranges instead of single days would be a (much) more efficient alternative.

Range operators can make use of a GiST index on the range column. My first draft of the answer was built on ad-hoc ranges (see answer history), but while working with a "one row per day" design the updated solution is simpler and faster. The alternative layout with adapted queries:

SQL Fiddle with range types.

Related answer with a similar implementation with essential information:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228