12

Background / Application

I have a MySQL database containing a table of rentable properties and a table of bookings for these properties. There is also a search feature for finding available properties between two provided dates. When searching, the user can enter the start date, the amount of days they wish to stay, and a date flexibility of up to +/- 7 days. A booking can start on the same day as another booking ends (party 1 leaves in the morning, party 2 arrives in the evening).

I am having difficulty implementing the flexibility feature efficiently.

Schema

CREATE TABLE IF NOT EXISTS `property` (
    `id` bigint(20) NOT NULL AUTO_INCREMENT,
    `name` varchar(60) COLLATE utf8_unicode_ci DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE IF NOT EXISTS `property_booking` (
    `id` bigint(20) NOT NULL AUTO_INCREMENT,
    `property_id` bigint(20) DEFAULT NULL,
    `name` varchar(60) COLLATE utf8_unicode_ci DEFAULT NULL,
    `date_start` date DEFAULT NULL,
    `date_end` date DEFAULT NULL,
    PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Sample Data

INSERT INTO `property` (`name`) 
VALUES ('Property 1'), ('Property 2'), ('Property 3');

INSERT INTO `property_booking` (`property_id`,`name`,`date_start`,`date_end`) 
VALUES (1, 'Steve', '2011-03-01', '2011-03-08'), 
(2, 'Bob', '2011-03-13', '2011-03-20'), 
(3, 'Jim', '2011-03-16', '2011-03-23');

Sample Scenario

The user selects that they want to start their stay on 2011-03-10, they want to stay for 7 days, and they have a flexibility of +/- 2 days. I have compiled an image that visualises the data and parameters below. (Red: Booking 1, Green: Booking 2, Stripes: Booking 3, Blue: Date range (2011-03-10, + 7 days and +/- 2 days flexibility))

Expected Result

Property 1 (Bookings available throughout date range)
Property 3 (Bookings available starting on 2011-03-08 or 2011-03-09)

Current Method

My current query checks for overlap for all 7 day date ranges within the total searchable date range, like this:

SELECT p.`id`, p.`name` 
FROM `property` p 
WHERE (NOT (EXISTS (SELECT p2.`name` FROM `property_booking` p2 WHERE (p2.`property_id` = p.`id` AND '2011-03-10' < DATE_SUB(p2.`date_end`, INTERVAL 1 DAY) AND '2011-03-17' > DATE_ADD(p2.`date_start`, INTERVAL 1 DAY))))) 
OR (NOT (EXISTS (SELECT p3.`name` FROM `property_booking` p3 WHERE (p3.`property_id` = p.`id` AND '2011-03-11' < DATE_SUB(p3.`date_end`, INTERVAL 1 DAY) AND '2011-03-18' > DATE_ADD(p3.`date_start`, INTERVAL 1 DAY))))) 
OR (NOT (EXISTS (SELECT p4.`name` FROM `property_booking` p4 WHERE (p4.`property_id` = p.`id` AND '2011-03-09' < DATE_SUB(p4.`date_end`, INTERVAL 1 DAY) AND '2011-03-16' > DATE_ADD(p4.`date_start`, INTERVAL 1 DAY))))) 
OR (NOT (EXISTS (SELECT p5.`name` FROM `property_booking` p5 WHERE (p5.`property_id` = p.`id` AND '2011-03-12' < DATE_SUB(p5.`date_end`, INTERVAL 1 DAY) AND '2011-03-19' > DATE_ADD(p5.`date_start`, INTERVAL 1 DAY)))))
OR (NOT (EXISTS (SELECT p6.`name` FROM `property_booking` p6 WHERE (p6.`property_id` = p.`id` AND '2011-03-08' < DATE_SUB(p6.`date_end`, INTERVAL 1 DAY) AND '2011-03-15' > DATE_ADD(p6.`date_start`, INTERVAL 1 DAY)))));

On the sample dataset, it's reasonably quick, but on much larger datasets it's going to get pretty sluggish, even more so when you build the full +/- 7 day flexibility.

Does anyone have any suggestions as to how this query could be better written?

Kris
  • 1,094
  • 3
  • 13
  • 23
  • If you are free to change the model of your db, this other SO question has a very good answer about this problem http://stackoverflow.com/questions/781221/finding-free-slots-in-a-booking-system – Damp Feb 18 '11 at 14:54
  • 1
    +1 for best question I've ever seen. – WNRosenberg Feb 18 '11 at 16:05

2 Answers2

2

Ok, here's a tricky answer for a tricky question...

SELECT * FROM property AS p
LEFT JOIN  
(
  SELECT property_id, DATEDIFF(MAX(date_end),20110308) AS startblock, 
      DATEDIFF(20110319,MIN(date_start))-1 AS endblock
  FROM property_booking AS pb
  WHERE date_start < 20110319 || date_end >= 20110308 
  GROUP BY property_id
  HAVING LEAST(startblock,endblock) > 4
) AS p2 ON p.id = p2.property_id 
WHERE p2.property_id IS NULL;

The subquery selects all the properties that are not eligible. The LEFT JOIN with IS NULL basically works out the exclusion (negation on the ineligible properties)

  • 20110308 is the desired start date -2 days ( because +/-2 day flexibility)
  • 20110319 is the desired end date +2 days
  • The number 4 in the HAVING LEAST(startblock,endblock) > 4 in twice your +/- number (2*2)

It took me a while to work it out (but your question was interesting and I had time on my hand)

I've tested it with edge cases and it worked for all the test cases I've thrown at it...). The logic behind it is a bit odd but a good old pen and paper helped me work it out!

Edit

Unfortunately I realized that this will work for most cases but not all... (2 single day bookings at the very beginning and end of the lookup period makes a property unavailable even though it should be available).

The problem here is that you have to look up information that's not 'present' in the DB and reconstruct it from what data you have. Check out my comment on your question to see a better way to deal with the problem

Damp
  • 3,328
  • 20
  • 19
  • Interesting solution, thanks! I'll have to mess around with it and test it further, but looks to do the job. The cases that you mention where this won't work isn't going to be an issue since there is a minimum booking period of 7 days. Once I have converted it into DQL (for use with Doctrine) I'll see how it holds up against larger data-sets and get back to you. – Kris Feb 22 '11 at 16:53
0

I think this is what you're looking for:

   SELECT MAX( IF( (    b.date_start < '2011-03-08' + INTERVAL 7 DAY
                    AND b.date_end > '2011-03-08'), 1, 0)) AS is_booked,
          p.id,
          p.name
     FROM property p
LEFT JOIN property_booking b ON p.id = b.property_id
 GROUP BY p.id
   HAVING is_booked < 1

If you want to include the leeway, expand the MAX() aggregate to include the options:

   SELECT MAX( IF(    (    b.date_start < '2011-03-08' + INTERVAL 7 DAY
                       AND b.date_end > '2011-03-08')
                  AND (    b.date_start < '2011-03-08' + INTERVAL 7 DAY + INTERVAL 1 DAY
                       AND b.date_end > '2011-03-08' + INTERVAL 1 DAY)
                  AND (    b.date_start < '2011-03-08' + INTERVAL 7 DAY + INTERVAL 2 DAY
                       AND b.date_end > '2011-03-08' + INTERVAL 2 DAY), 1, 0)
             ) AS is_booked,
          p.id,
          p.name
     FROM property p
LEFT JOIN property_booking b ON p.id = b.property_id
 GROUP BY p.id
   HAVING is_booked < 1

If I'm understanding your question right, this GROUP BY query should cover it more efficiently than multiple subqueries.

Dodger
  • 142
  • 2
  • 10
  • The logic is actually pretty simple. If a booking's date_start is earlier than the desired end date, and a booking's date_end is later than the desired start date, the date ranges overlap. Since the LEFT JOIN includes NULLs where there's no match, the is_booked alias contains whether that overlap occurs, anc because it's a MAX aggregate function, it automatically marks each as booked when some ranges don't overlap and others do. – Dodger Feb 18 '11 at 10:41
  • Thanks for the suggestion - Upon initial testing, it appears to do what I want, however I don't fully understand the usage of the MAX() function here, on my limited data-set, it appears to produce the same results without it. – Kris Feb 22 '11 at 17:04
  • Because of the outer left join, it's possible for left rows to join up with non-matching right rows in a more thorough dataset. Without the MAX(), the value will be the last one encountered, where b.date_start and b.date_end may be null. Also, while MySQL will allow you to use a GROUP BY without an aggregate, most SQLs won't and it's really kind of not proper because it doesn't make sense, and does the same job as a SELECT DISTINCT filtered only by the one column inthe group by clause. – Dodger Feb 23 '11 at 19:47