5

I know that to get the closest record prior to a date I can use the query:

select * 
from results 
where resulttime = (select max(resulttime) 
                    from results 
                    where some_id = 15 
                      and resulttime < '2012-07-27');

But I need to do this for a series of days, so that I know the closest record for each day. Any ideas?

The series of days would be generated by generate_sequence().

The closest prior record may be in a prior day to what we want the value for, but still need to be returned.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
root
  • 452
  • 4
  • 8
  • What are your data types - `date` or `timestamp`? Define "closest" exactly. How do you provide the series of days? As dates in a tables, one date per row? – Erwin Brandstetter Jul 26 '12 at 13:41
  • They are of type timestamp. "Closest record prior" is defined as the first record prior to the date given. This does potentially include a record on a previous day. – root Jul 26 '12 at 17:36

2 Answers2

4

Should be simplest & fastest with a LEFT JOIN and DISTINCT ON:

WITH x(search_ts) AS (
    VALUES
     ('2012-07-26 20:31:29'::timestamp)              -- search timestamps
    ,('2012-05-14 19:38:21')
    ,('2012-05-13 22:24:10')
    )
SELECT DISTINCT ON (x.search_ts)
       x.search_ts, r.id, r.resulttime
FROM   x
LEFT   JOIN results r ON r.resulttime <= x.search_ts -- smaller or same
-- WHERE some_id = 15                                -- some condition?
ORDER  BY x.search_ts, r.resulttime DESC;

Result (dummy values):

search_ts           | id     | resulttime
--------------------+--------+----------------
2012-05-13 22:24:10 | 404643 | 2012-05-13 22:24:10
2012-05-14 19:38:21 | 404643 | 2012-05-13 22:24:10
2012-07-26 20:31:29 | 219822 | 2012-07-25 19:47:44

I use a CTE to provide the values, could be a table or function or unnested array or a set generated with generate_series() something else as well. (Did you mean generate_series() by "generate_sequence()"?)

First I JOIN the search timestamps to all rows in the table with earlier or equal resulttime. I use LEFT JOIN instead of JOIN so that search timestamps are not dropped when there is no prior resulttime in the table at all.

With DISTINCT ON (x.search_ts) in combination with ORDER BY x.search_ts, r.resulttime DESC we get the greatest (or one of the equally greatest) resulttime that is smaller or equal to each search timestamp.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
1

A simple subquery to identify the largest date within a day boundary from a time series similar to your example would be something like this

SELECT max(resulttime), date_trunc('days',resulttime) FROM results GROUP BY 2;

that is pinning the timestamp to just the day component using the date_trunc function, and then collecting the largest timestamp for each truncated date, using GROUP BY max()

cms
  • 5,864
  • 2
  • 28
  • 31
  • Thanks, but I believe limiting to just a day will miss the scenario when the closest prior result is on a different day. – root Jul 26 '12 at 17:38