I have a document management system that records all historical events in a history table. I've been asked to be able to provide the oldest doc_id that has a status of 5 for a given client on a given date. The table looks something like this (truncated for simplicity):
doc_history:
id integer
doc_id integer
event_date timestamp
client_id integer
status_id integer
The client_id and status_id columns are the value of the document after the event has occurred. This means that the maximum history event row for a document defined by doc_id will match the same columns in the document table. Limiting events by a specific event date, you can see what the values of the document were at that time. Because these values are not static, I can't just simply search for a specific client_id with a status_id of 5 because the found result might not match the max(id) of the document. Hopefully that makes some sense.
What I've found to work, but is slow, is the following:
select
t.*
from
(select
distinct on (doc_id),
*
from
doc_history
where
event_date <= '2013-02-17 23:59:59'
order by
doc_id, id desc) t
where
t.client_id = 9999 and
t.status_id = 5
limit 1;
Basically, I'm getting the maximum ID for a particular document ID before the given maximum event date, and then verifying that that maximum history item is assigned to the given client, with the status set to 5.
The drawback to doing it my way is that I'm scanning all history records for all clients to get their maximums, and then finding what I'm looking for for one client and status. As of right now, this scans roughly 15.06 million rows, and takes about 90 seconds on my dev server (which is not blazing fast).
To make matters more complicated I need to do this for each day of the previous week, or seven times total each run. Additionally, all documents in the system start with a status of 5, which represents new. This makes it so that this query would just return the first document entered for that client:
select * from doc_history where client_id = 9999 and
status_id = 5 and
event_date <= '2013-02-17 23:59:59'
order by id limit 1;
What I'm hoping to do is scan until I find a maximum history record for a specific document that matches the specific client and status values without having to first find the maximum ids for all document ids for all clients first. I don't know if this can be done with a windowing function (partition by) or some other logic that I'm currently not seeing.
An example of one of the events in the doc_history table:
# select id, doc_id, event, old_value, new_value, event_date, client_id, status_id from doc_history where doc_id = 9999999 order by id;
id | doc_id | event | old_value | new_value | event_date | client_id | status_id
----------+---------+-------+-----------+-----------+----------------------------+-----------+-----------
25362415 | 9999999 | 13 | | | 2013-02-14 11:49:50.032824 | 9999 | 5
25428192 | 9999999 | 15 | | | 2013-02-18 11:15:48.272542 | 9999 | 5
25428193 | 9999999 | 7 | 5 | 1 | 2013-02-18 11:15:48.301377 | 9999 | 1
Event 7 is status changed, and the old and new values show that it was changed from 5 to 1, which is reflected in the status_id column. For an event_date of less than or equal to 2013-02-17 23:59:59, the above record would have been the oldest "NEW" document with status_id of 5, but after 2/17/2013 it would not have.