PostgreSQL: How to return rows with respect to a found row (relative results)?

Question

Forgive my example if it does not make sense. I'm going to try with a simplified one to encourage more participation.

Consider a table like the following:

       dt     |    mnth    |  foo
--------------+------------+--------
  2012-12-01  |  December  |
    ...
  2012-08-01  |  August    |
  2012-07-01  |  July      |
  2012-06-01  |  June      |
  2012-05-01  |  May       |
  2012-04-01  |  April     |
  2012-03-01  |  March     |
    ...
  1997-01-01  |  January   |

If you look for the record with dt closest to today w/o going over, what would be the best way to also return the 3 records beforehand and 7 records after?

I decided to try windowing functions:

WITH dates AS (
   select  row_number() over (order by dt desc)
         , dt
         , dt - now()::date as dt_diff
   from    foo
)
, closest_date AS (
   select * from dates
   where dt_diff = ( select max(dt_diff) from dates where dt_diff <= 0 )
)

SELECT * 
FROM   dates
WHERE  row_number - (select row_number from closest_date) >= -3
   AND row_number - (select row_number from closest_date) <=  7 ;

I feel like there must be a better way to return relative records with a window function, but it's been some time since I've looked at them.

Why you don't use `BETWEEN`? So your request would be something like `select * from table where dt between 2012-08-01 and 2011-10-01`. You could also use some date functions in order to compute the start and end easier. — mistapink, Jun 05 '12 at 23:15
@mistapink: I tested with `BETWEEN`, but it didn't return the right result set, perhaps I had the ends reversed or an error elsewhere. I actually didn't expect to have a working example when I posted, so I guess I already exceeded my expectations :) The point of this was to have a query that could be routinely executed. The example may be bad because the actual records aren't just on the first of the month and there may be more or less records per month. — vol7ron, Jun 06 '12 at 00:47

score 3 · Accepted Answer · answered Jun 05 '12 at 23:36

create table foo (dt date);
insert into foo values
('2012-12-01'),
('2012-08-01'),
('2012-07-01'),
('2012-06-01'),
('2012-05-01'),
('2012-04-01'),
('2012-03-01'),
('2012-02-01'),
('2012-01-01'),
('1997-01-01'),
('2012-09-01'),
('2012-10-01'),
('2012-11-01'),
('2013-01-01')
;

select dt
from (
(
    select dt
    from foo
    where dt <= current_date
    order by dt desc
    limit 4
)
union all
(
    select dt
    from foo
    where dt > current_date
    order by dt
    limit 7
)) s
order by dt
;
     dt     
------------
 2012-03-01
 2012-04-01
 2012-05-01
 2012-06-01
 2012-07-01
 2012-08-01
 2012-09-01
 2012-10-01
 2012-11-01
 2012-12-01
 2013-01-01
(11 rows)

I was thinking about doing something similar. I don't know why, but for some reason I don't like using `limit`. Perhaps because it has different syntaxes between RDBMSes and I like to stay mostly platform independent — vol7ron, Jun 06 '12 at 00:50

Erwin Brandstetter · Answer 2 · 2023-03-20T13:17:53.040

You could use the window function lead():

SELECT dt_lead7 AS dt
FROM  (
    SELECT *, lead(dt, 7) OVER (ORDER BY dt) AS dt_lead7
    FROM   foo
    ) d
WHERE  dt <= CURRENT_DATE
ORDER  BY dt DESC
LIMIT  11;

Somewhat shorter, but the UNION ALL version will be faster with a suitable index.

That leaves a corner case where "date closest to today" is within the first 7 rows. You can pad the original data with 7 rows of -infinity to take care of this:

SELECT d.dt_lead7 AS dt
FROM  (
    SELECT *, lead(dt, 7) OVER (ORDER BY dt) AS dt_lead7
    FROM  (
        SELECT '-infinity'::date AS dt FROM generate_series(1,7)
        UNION ALL
        SELECT dt FROM foo
        ) x
    ) d
WHERE  d.dt &lt= now()::date -- same as: WHERE  dt &lt= now()::date¹
ORDER  BY d.dt_lead7 DESC  -- same as: ORDER BY dt DESC¹
LIMIT  11;

I table-qualified the columns in the second query to clarify what happens. See below.
The result will include NULL values if the "date closest to today" is within the last 7 rows of the base table. You can filter those with an additional sub-select if you need to.

¹To address your doubts about output names versus column names in the comments - consider the following quotes from the manual.

Where to use an output column's name:

An output column's name can be used to refer to the column's value in ORDER BY and GROUP BY clauses, but not in the WHERE or HAVING clauses; there you must write out the expression instead.

Bold emphasis mine. WHERE dt <= now()::date references the column d.dt, not the the output column of the same name - thereby working as intended.

Resolving conflicts:

If an ORDER BY expression is a simple name that matches both an output column name and an input column name, ORDER BY will interpret it as the output column name. This is the opposite of the choice that GROUP BY will make in the same situation. This inconsistency is made to be compatible with the SQL standard.

Bold emphasis mine again. ORDER BY dt DESC in the example references the output column's name - as intended. Anyway, either columns would sort the same. The only difference could be with the NULL values of the corner case. But that falls flat, too, because:

the default behavior is NULLS LAST when ASC is specified or implied, and NULLS FIRST when DESC is specified

As the NULL values come after the biggest values, the order is identical either way.

Without `LIMIT`

As per request in comment:

WITH x AS (
    SELECT *
         , row_number() OVER (ORDER BY dt)  AS rn
         , first_value(dt) OVER (ORDER BY (dt > '2011-11-02')
                                         , dt DESC) AS dt_nearest
    FROM   foo
    )
, y AS (
    SELECT rn AS rn_nearest
    FROM   x
    WHERE  dt = dt_nearest
    )
SELECT dt
FROM   x, y
WHERE  rn BETWEEN rn_nearest - 3 AND rn_nearest + 7
ORDER  BY dt;

If performance is important, I would still go with @Clodoaldo's UNION ALL variant. It will be fastest. Database agnostic SQL will only get you so far. Other RDBMS do not have window functions at all (yet), or different function names (like first_val instead of first_value). You might as well replace LIMIT with TOP n (MS SQL) or whatever the local dialect.

The likelihood I'd switch to MySQL is minimal, but I'm sure it'd probably have window functions in the future - at least, if Oracle didn't buy it, it might have. So, I'd be more likely to switch to Oracle, SQL Server, or DB2. SQL Server is sort of a nightmare in its SQL already, but DB2's limit is something like `FETCH FIRST n ROWS ONLY`, which is much different from `TOP n` or `LIMIT n` — vol7ron, Jun 06 '12 at 13:19
The `lead` doesn't work because it does not include the future dates (it needs both sides of the current date). Also, the WHERE statement probably needs to use `dt_lead7` - I don't think the alias can be used there. — vol7ron, Jun 07 '12 at 15:43
@vol7ron: I added a version to cover the corner case and a detailed explanation why it works correctly otherwise. — Erwin Brandstetter, Jun 07 '12 at 20:53

score 1 · Answer 3 · edited Jun 06 '12 at 02:54

1

You could use something like that:

select * from foo 
where dt between now()- interval '7 months' and now()+ interval '3 months'

This and this may help you.

edited Jun 06 '12 at 02:54

Jonathan Hall

75,165
16
143
189

answered Jun 05 '12 at 23:23

mistapink

1,926
1
26
37

Sorry I didn't respond to your comment on the question early enough. I wanted to avoid this because of the irregularity of the records. I knew my example would be bad, but the number and the relativity to a record in an ordered result set was what I was hoping to find. Specifically, I thought that [cume_dist(), lead(), or lag()](http://www.postgresql.org/docs/9.1/static/functions-window.html) would be what I needed. – vol7ron Jun 06 '12 at 00:57
@vol7ron: This answer is just wrong. But `lead()` could work. I posted an answer. – Erwin Brandstetter Jun 06 '12 at 08:50

PostgreSQL: How to return rows with respect to a found row (relative results)?

3 Answers3

Without `LIMIT`

Linked

PostgreSQL: How to return rows with respect to a found row (relative results)?

3 Answers3

Without LIMIT

Linked

Without `LIMIT`