
Imagine we have a beautiful hotel. This hotel has a database with just one table:

room check_in    check_out   other columns...
1    2020-02-04  2020-02-05  ...
1    2020-02-06  2020-02-09  ...
1    2020-04-20  NULL        ...
2    2020-03-29  2020-04-01  ...
2    2020-04-17  2020-04-18  ...

What's the best and most efficient way to select the last check-in for every room, together with the other columns' values (otherwise I would just use room, max(check_in))?

The expected result is:

room check_in   check_out  other columns... 
1    2020-04-20 NULL       ...
2    2020-04-17 2020-04-18 ...

The first idea that came to my mind was to join this table with a copy of itself:

WITH last_checkins AS (
    SELECT room, max(check_in) AS last_c
    FROM rooms
    GROUP BY room
)
SELECT *
FROM rooms r
         INNER JOIN last_checkins c
                    ON r.room = c.room
                        AND r.check_in = c.last_c;

What I dislike about this idea:

  • It seems a bit inefficient. There are 30 million rooms in this hotel, so I would have to join two big tables.
  • I'm a bit afraid of joining on dates; it feels like something might go wrong. The check_in column also contains a time component, which makes it messier.

I would like to know: are my concerns relevant?


1 Answer


The most convenient approach is probably row_number():

select r.*
from (select r.*,
             row_number() over (partition by room order by check_in desc) as seqnum
      from rooms r
     ) r
where seqnum = 1;

With an index on (room, check_in), this should also have good performance.
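For reference, a minimal sketch of such an index (the index name here is just a placeholder; use whatever fits your naming conventions):

-- composite index matching the "partition by room order by check_in"
-- used by the row_number() call above; the name is illustrative only
create index rooms_room_check_in_ix on rooms (room, check_in);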

Sometimes a correlated subquery works better:

select r.*
from rooms r
where r.check_in = (select max(r2.check_in)
                    from rooms r2
                    where r2.room = r.room
                   );

Oracle has a good optimizer, so I am not sure which works better in your situation.
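If you want to see which plan the optimizer actually picks for your data, one way (sketched here for the row_number() variant; repeat it for the correlated subquery and compare) is to inspect the execution plan:

-- sketch: ask Oracle for the execution plan of one variant,
-- then do the same for the other and compare the two plans
explain plan for
select r.*
from (select r.*,
             row_number() over (partition by room order by check_in desc) as seqnum
      from rooms r
     ) r
where seqnum = 1;

select * from table(dbms_xplan.display);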

Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • The analytic function approach is almost certain to be faster, since avoiding correlated subqueries like the one you illustrated (specifically to improve performance) was one of the main reasons analytic functions were introduced in the first place. –  Apr 20 '20 at 15:26
  • @mathguy . . . It has to do the extra work of assigning a value to all the rows that are filtered out. I do know that in other databases, the correlated subquery is a little bit faster under some circumstances. I don't know about Oracle, though. – Gordon Linoff Apr 20 '20 at 15:46
  • Assigning the extra value is very little overhead. The really big bottleneck (with massive data per room) would be the work of ordering all the rows in each partition instead of just finding the max: O(n log n) instead of O(log n). But Oracle is smart: if it sees the filter on `seqnum` in the outer query, it will only compute a max per partition anyway (and similarly if you had a condition like `seqnum <= 10` for a top-10 query). In any case, you could use the max() analytic function for this problem and not rely on that optimization. –  Apr 20 '20 at 15:51