1
+-------+----------------------+----------+------------------+
| isbn  | book_container_id    | shelf_id |   update_time    |
+-------+----------------------+----------+------------------+
|   555 |                    6 | shelf100 | 11/15/2015 19:10 |
|   123 |                    1 | shelf1   | 11/28/2015 8:00  |
|   555 |                    4 | shelf5   | 11/28/2015 9:10  |
|   212 |                    2 | shelf2   | 11/29/2015 8:10  |
|   555 |                    6 | shelf9   | 11/30/2015 22:10 |
|   321 |                    8 | shelf7   | 11/30/2015 8:10  |
|   555 |                    4 | shelf33  | 12/1/2015 7:00   |
+-------+----------------------+----------+------------------+

Let's say I have a table (PostgreSQL) like the above called bookshelf_configuration. If I'm given an ISBN and a timestamp, I want to be able to find the closest (before only) records for each unique combination of isbn and book_container_id.

So if I'm looking at isbn '555', with a timestamp of '12/1/2015 7:00', I should get back:

+-------+----------------------+----------+------------------+
| isbn  | book_container_id    | shelf_id |   update_time    |
+-------+----------------------+----------+------------------+
|   555 |                    6 | shelf9   | 11/30/2015 22:10 |
|   555 |                    4 | shelf33  | 12/1/2015 7:00   |
+-------+----------------------+----------+------------------+

My knowledge of SQL is extremely basic. I've got a query that would work if I only had to factor in isbn, but I need some help understanding how to do this for the combination (isbn, book_container_id).

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
ryoaska
  • 307
  • 5
  • 17

2 Answers2

2

There is a thing called Row_Number that can help you here.

Select * 
From (
    Select *,
           row_number() OVER (partition by isbn, book_container_id order by update_time desc) rn
    From   bookshelf_configuration
    Where  isbn = 555 and update_time <= '12/1/2015 7:00'   
) q 
Where q.rn = 1
JamieD77
  • 13,796
  • 1
  • 17
  • 27
  • Thanks! This did work for me- but I feel like I should select the other answer because the Links include more info and compares with the other method etc. I did want to thank you thoguh- that got me unblocked and able to move forward yesterday :) – ryoaska Dec 04 '15 at 00:04
2

A typical use case for DISTINCT ON:

SELECT DISTINCT ON (book_container_id)
       isbn, book_container_id, shelf_id, update_time 
FROM   bookshelf_configuration
WHERE  isbn = 555
AND    update_time <= '2015-12-01 07:00'  -- ISO 8601 format
ORDER  BY book_container_id, update_time DESC;

Assuming update_time is defined NOT NULL, or you have to add NULLS LAST. Detailed explanation:

Depending on cardinalities and value frequencies there may be even faster query styles:

Either way, a multicolumn index on (isbn, book_container_id, update_time DESC) is the key to make this fast for tables of non-trivial size. Sort order should match the query (or be it's complete inversion). If you add NULLS LAST to the query, add it to the index as well.

Aside: It's better to use ISO 8601 format for all date / time constants, since that is unambiguous with any locale or datestyle setting. Related:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • I tried this out; unfortunately it looks like DISTINCT ON is not supported by Redshift (I should have mentioned that, but I didn't realize it would make a difference). I went with a version of JamieD77's solution, but I'm selecting this answer because of the full explanation and links with info about both solutions, etc. Thanks for the links they helped me understand things much better! Re: ISO 8601 suggestion, we do definitely use ISO 8601 format for date strings- I originally wrote out my example tables in excel, so I think they got messed up there (or just user error...) – ryoaska Dec 04 '15 at 00:39