16

I am trying to select dates that have an anniversary in the next 14 days. How can I select based on dates excluding the year? I have tried something like the following.

SELECT * FROM events
WHERE EXTRACT(month FROM "date") = 3
AND EXTRACT(day FROM "date") < EXTRACT(day FROM "date") + 14

The problem with this is that months wrap.
I would prefer to do something like this, but I don't know how to ignore the year.

SELECT * FROM events
WHERE (date > '2013-03-01' AND date < '2013-04-01')

How can I accomplish this kind of date math in Postgres?

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Andrew Hubbs
  • 9,338
  • 9
  • 48
  • 71

8 Answers8

47

TL/DR: use the "Black magic version" below.


All queries presented in other answers so far operate with conditions that are not sargable: they cannot use an index and have to compute an expression for every single row in the base table to find matching rows. Doesn't matter much with small tables. Matters a lot with big tables.

Given the following simple table:

CREATE TABLE event (
  event_id   serial PRIMARY KEY
, event_date date
);

Query

Version 1. and 2. below can use a simple index of the form:

CREATE INDEX event_event_date_idx ON event(event_date);

But all of the following solutions are even faster without index.

1. Simple version

SELECT *
FROM  (
   SELECT ((current_date + d) - interval '1 year' * y)::date AS event_date
   FROM       generate_series( 0,  14) d
   CROSS JOIN generate_series(13, 113) y
   ) x
JOIN  event USING (event_date);

Subquery x computes all possible dates over a given range of years from a CROSS JOIN of two generate_series() calls. The selection is done with the final simple join.

2. Advanced version

WITH val AS (
   SELECT extract(year FROM age(current_date + 14, min(event_date)))::int AS max_y
        , extract(year FROM age(current_date,      max(event_date)))::int AS min_y
   FROM   event
   )
SELECT e.*
FROM  (
   SELECT ((current_date + d.d) - interval '1 year' * y.y)::date AS event_date
   FROM   generate_series(0, 14) d
        ,(SELECT generate_series(min_y, max_y) AS y FROM val) y
   ) x
JOIN  event e USING (event_date);

Range of years is deduced from the table automatically - thereby minimizing generated years.
You could go one step further and distill a list of existing years if there are gaps.

Effectiveness co-depends on the distribution of dates. It's better for few years with many rows each.

Simple db<>fiddle to play with here
Old sqlfiddle

3. Black magic version

Create a simple SQL function to calculate an integer from the pattern 'MMDD':

CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
'SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int';

I had to_char(time, 'MMDD') at first, but switched to the above expression which proved fastest in new tests on Postgres 9.6 and 10:

db<>fiddle here

It allows function inlining because EXTRACT(xyz FROM date) is implemented with the IMMUTABLE function date_part(text, date) internally. And it has to be IMMUTABLE to allow its use in the following essential multicolumn expression index:

CREATE INDEX event_mmdd_event_date_idx ON event(f_mmdd(event_date), event_date);

Multicolumn for a number of reasons:
Can help with ORDER BY or with selecting from given years. Read here. At almost no additional cost for the index. A date fits into the 4 bytes that would otherwise be lost to padding due to data alignment. Read here.
Also, since both index columns reference the same table column, no drawback with regard to H.O.T. updates. Read here.

Basic query:

SELECT *
FROM   event e
WHERE  f_mmdd(e.event_date) BETWEEN f_mmdd(current_date)
                            AND     f_mmdd(current_date + 14);

One PL/pgSQL table function to rule them all

Fork to one of two queries to cover the turn of the year:

CREATE OR REPLACE FUNCTION f_anniversary(_the_date date = current_date, _days int = 14)
  RETURNS SETOF event
  LANGUAGE plpgsql AS
$func$
DECLARE
   d  int := f_mmdd($1);
   d1 int := f_mmdd($1 + $2 - 1);  -- fix off-by-1 from upper bound
BEGIN
   IF d1 > d THEN
      RETURN QUERY
      SELECT *
      FROM   event e
      WHERE  f_mmdd(e.event_date) BETWEEN d AND d1
      ORDER  BY f_mmdd(e.event_date), e.event_date;

   ELSE  -- wrap around end of year
      RETURN QUERY
      SELECT *
      FROM   event e
      WHERE  f_mmdd(e.event_date) >= d OR
             f_mmdd(e.event_date) <= d1
      ORDER  BY (f_mmdd(e.event_date) >= d) DESC, f_mmdd(e.event_date), event_date;
      -- chronological across turn of the year
   END IF;
END
$func$;

Call using defaults: 14 days beginning "today":

SELECT * FROM f_anniversary();

Call for 7 days beginning '2014-08-23':

SELECT * FROM f_anniversary(date '2014-08-23', 7);

db<>fiddle here - comparing EXPLAIN ANALYZE

"February 29"

When dealing with anniversaries or "birthdays", you need to define how to deal with the special case "February 29" in leap years.

When testing for ranges of dates, Feb 29 is usually included automatically, even if the current year is not a leap year. The range of days is extended by 1 retroactively when it covers this day.
On the other hand, if the current year is a leap year, and you want to look for 15 days, you may end up getting results for 14 days in leap years if your data is from non-leap years.

Say, Bob is born on the 29th of February:
My query 1. and 2. include February 29 only in leap years. Bob has birthday only every ~ 4 years.
My query 3. includes February 29 in the range. Bob has birthday every year.

There is no magical solution. You have to define what you want for every case.

Test

To substantiate my point I ran an extensive test with all the presented solutions. I adapted each of the queries to the given table and to yield identical results without ORDER BY.

The good news: all of them are correct and yield the same result - except for Gordon's query that had syntax errors, and @wildplasser's query that fails when the year wraps around (easy to fix).

Insert 108000 rows with random dates from the 20th century, which is similar to a table of living people (13 or older).

INSERT INTO  event (event_date)
SELECT '2000-1-1'::date - (random() * 36525)::int
FROM   generate_series (1, 108000);

Delete ~ 8 % to create some dead tuples and make the table more "real life".

DELETE FROM event WHERE random() < 0.08;
ANALYZE event;

My test case had 99289 rows, 4012 hits.

C - Catcall

WITH anniversaries as (
   SELECT event_id, event_date
         ,(event_date + (n || ' years')::interval)::date anniversary
   FROM   event, generate_series(13, 113) n
   )
SELECT event_id, event_date -- count(*)   --
FROM   anniversaries
WHERE  anniversary BETWEEN current_date AND current_date + interval '14' day;

C1 - Catcall's idea rewritten

Aside from minor optimizations, the major difference is to add only the exact amount of years date_trunc('year', age(current_date + 14, event_date)) to get this year's anniversary, which avoids the need for a CTE altogether:

SELECT event_id, event_date
FROM   event
WHERE (event_date + date_trunc('year', age(current_date + 14, event_date)))::date
       BETWEEN current_date AND current_date + 14;

D - Daniel

SELECT *   -- count(*)   -- 
FROM   event
WHERE  extract(month FROM age(current_date + 14, event_date))  = 0
AND    extract(day   FROM age(current_date + 14, event_date)) <= 14;

E1 - Erwin 1

See "1. Simple version" above.

E2 - Erwin 2

See "2. Advanced version" above.

E3 - Erwin 3

See "3. Black magic version" above.

G - Gordon

SELECT * -- count(*)   
FROM  (SELECT *, to_char(event_date, 'MM-DD') AS mmdd FROM event) e
WHERE  to_date(to_char(now(), 'YYYY') || '-'
                 || (CASE WHEN mmdd = '02-29' THEN '02-28' ELSE mmdd END)
              ,'YYYY-MM-DD') BETWEEN date(now()) and date(now()) + 14;

H - a_horse_with_no_name

WITH upcoming as (
   SELECT event_id, event_date
         ,CASE 
            WHEN date_trunc('year', age(event_date)) = age(event_date)
                 THEN current_date
            ELSE cast(event_date + ((extract(year FROM age(event_date)) + 1)
                      * interval '1' year) AS date) 
          END AS next_event
   FROM event
   )
SELECT event_id, event_date
FROM   upcoming
WHERE  next_event - current_date  <= 14;

W - wildplasser

CREATE OR REPLACE FUNCTION this_years_birthday(_dut date)
  RETURNS date
  LANGUAGE plpgsql AS
$func$
DECLARE
   ret date;
BEGIN
   ret := date_trunc('year' , current_timestamp)
       + (date_trunc('day'  , _dut)
        - date_trunc('year' , _dut));
   RETURN ret;
END
$func$;

Simplified to return the same as all the others:

SELECT *
FROM   event e
WHERE  this_years_birthday( e.event_date::date )
        BETWEEN current_date
        AND     current_date + '2weeks'::interval;

W1 - wildplasser's query rewritten

The above suffers from a number of inefficient details (beyond the scope of this already sizable post). The rewritten version is much faster:

CREATE OR REPLACE FUNCTION this_years_birthday(_dut INOUT date)
  LANGUAGE sql AS
$func$
SELECT (date_trunc('year', now()) + ($1 - date_trunc('year', $1)))::date
$func$;

SELECT *
FROM   event e
WHERE  this_years_birthday(e.event_date) BETWEEN current_date
                                             AND (current_date + 14);

Test results

I ran this test with a temporary table on PostgreSQL 9.1.7. Results were gathered with EXPLAIN ANALYZE, best of 5.

Results

Without index
C:  Total runtime: 76714.723 ms
C1: Total runtime:   307.987 ms  -- !
D:  Total runtime:   325.549 ms
E1: Total runtime:   253.671 ms  -- !
E2: Total runtime:   484.698 ms  -- min() & max() expensive without index
E3: Total runtime:   213.805 ms  -- !
G:  Total runtime:   984.788 ms
H:  Total runtime:   977.297 ms
W:  Total runtime:  2668.092 ms
W1: Total runtime:   596.849 ms  -- !

With index
E1: Total runtime:    37.939 ms  --!!
E2: Total runtime:    38.097 ms  --!!

With index on expression
E3: Total runtime:    11.837 ms  --!!

All other queries perform the same with or without index because they use non-sargable expressions.

Conclusion

  • So far, @Daniel's query was the fastest.

  • @wildplassers (rewritten) approach performs acceptably, too.

  • @Catcall's version is something like the reverse approach of mine. Performance gets out of hand quickly with bigger tables.
    The rewritten version performs pretty well, though. The expression I use is something like a simpler version of @wildplassser's this_years_birthday() function.

  • My "simple version" is faster even without index, because it needs fewer computations.

  • With index, the "advanced version" is about as fast as the "simple version", because min() and max() become very cheap with an index. Both are substantially faster than the rest which cannot use the index.

  • My "black magic version" is fastest with or without index. And it is very simple to call.
    The updated version (after the benchmark) is a bit faster, yet.

  • With a real life table an index will make even greater difference. More columns make the table bigger, and sequential scan more expensive, while the index size stays the same.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Pretty cool. What exactly is the `generate_series(13, 113)` doing in the "simple version"? Where do those numbers come from? –  Mar 02 '13 at 22:16
  • @a_horse_with_no_name: Thanks! Since my test table has dates from the 20th century, that's the approximated series for `current_date - interval '1y' * y.y)`. The "advanced" version calculates this automatically. – Erwin Brandstetter Mar 02 '13 at 22:32
  • Thank you very much for the very extensive answer Erwin. – Andrew Hubbs Mar 04 '13 at 19:07
  • 7
    @ErwinBrandstetter unbelievable extensive answer. +1 You deserve much more than the +1 though. :) – Taryn Mar 04 '13 at 19:16
7

I believe the following test works in all cases, assuming a column named anniv_date:

select * from events
where extract(month from age(current_date+interval '14 days', anniv_date))=0
  and extract(day from age(current_date+interval '14 days', anniv_date)) <= 14

As an example of how it works when crossing a year (and also a month), let's say an anniversary date is 2009-01-04 and the date at which the test is run is 2012-12-29.

We want to consider any date between 2012-12-29 and 2013-01-12 (14 days)

age('2013-01-12'::date, '2009-01-04'::date) is 4 years 8 days.

extract(month...) from this is 0 and extract(days...) is 8, which is lower than 14 so it matches.

Daniel Vérité
  • 58,074
  • 15
  • 129
  • 156
  • 1
    +1 That was the best answer so far. [SQLlfiddle demo](http://www.sqlfiddle.com/#!12/2847f/4). Also demonstrating that you can just add integer to date. I think I have come up with something even better. – Erwin Brandstetter Mar 02 '13 at 21:42
3

How about this?

select *
from events e
where to_char(e."date", 'MM-DD') between to_char(now(), 'MM-DD') and 
                                         to_char(date(now())+14, 'MM-DD')

You can do the comparison as strings.

To take year ends into account, we'll convert back to dates:

select *
from events e
where to_date(to_char(now(), 'YYYY')||'-'||to_char(e."date", 'MM-DD'), 'YYYY-MM-DD')
           between date(now()) and date(now())+14

You do need to make a slight adjustment for Feb 29. I might suggest:

select *
from (select e.*,
             to_char(e."date", 'MM-DD') as MMDD
      from events
     ) e
where to_date(to_char(now(), 'YYYY')||'-'||(case when MMDD = '02-29' then '02-28' else MMDD), 'YYYY-MM-DD')
           between date(now()) and date(now())+14
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786
  • Didn't realize you could do String comparisons like that. This works most of the time. It fails at the year wrap though. Where your dates are something like this: `SELECT to_char(date('1999-01-01'), 'MM-DD') between to_char(date('2012-12-22'), 'MM-DD') and to_char(date('2013-01-02'), 'MM-DD');` – Andrew Hubbs Mar 02 '13 at 01:23
  • Do I misunderstand or does the revised answer still fail for 1999-01-01? Simply adjusting the anniversary date to the current year would turn 1999-01-01 into 2013-01-01, which is already in the past and so cannot pass the second example's BETWEEN check. – pilcrow Mar 02 '13 at 05:11
  • @pilcrow . . . I think you misunderstand. Something whose date is on Jan 1st doesn't have an anniversary in the next two weeks (as I write this). You would have to wait until the last two weeks of the year. Note the *current* year is put on the event date. For spans that exceed a year, the "+14" will handle that. – Gordon Linoff Mar 02 '13 at 14:14
  • @GordonLinoff, you compute "this calendar year's anniversary date" instead of "the next future anniversary." This year, for example, your query will adjust $Whatever-01-01 to 2013-01-01. If you run your query *on the final day of this year*, your where clause will become: `WHERE '2013-01-01'::date BETWEEN '2013-12-31'::date AND '2014-01-14'::date`. That is, it will fail, even though the anniversary is the very next day. (By the way `to_date()` will adjust bogus leap days for you without need for CASE logic. 2013-02-29 => 2013-03-01, for example.) – pilcrow Mar 02 '13 at 19:45
2

For convenience, I created two functions that yield the (expected or past) birsthday in the current year, and the upcoming birthday.

CREATE OR REPLACE FUNCTION this_years_birthday( _dut DATE) RETURNS DATE AS
$func$

DECLARE
        ret DATE;
BEGIN
        ret =
        date_trunc( 'year' , current_timestamp)
        + (date_trunc( 'day' , _dut)
          - date_trunc( 'year' , _dut)
          )
        ;
        RETURN ret;
END;
$func$ LANGUAGE plpgsql;

CREATE OR REPLACE FUNCTION next_birthday( _dut DATE) RETURNS DATE AS
$func$

DECLARE
        ret DATE;
BEGIN
        ret =
        date_trunc( 'year' , current_timestamp)
        + (date_trunc( 'day' , _dut)
          - date_trunc( 'year' , _dut)
          )
        ;
        IF (ret < date_trunc( 'day' , current_timestamp))
           THEN ret = ret + '1year'::interval; END IF;
        RETURN ret;
END;
$func$ LANGUAGE plpgsql;

      --
      -- call the function
      --
SELECT date_trunc( 'day' , t.topic_date) AS the_date
        , this_years_birthday( t.topic_date::date ) AS the_day
        , next_birthday( t.topic_date::date ) AS next_day
FROM topic t
WHERE this_years_birthday( t.topic_date::date )
        BETWEEN  current_date
        AND  current_date + '2weeks':: interval
        ;

NOTE: the casts are needed because I only had timestamps available.

wildplasser
  • 43,142
  • 8
  • 66
  • 109
  • Sorry, I overlooked your query for my test case. At first glimpse I only saw the helper functions. Not more time right now. I'll add it later. – Erwin Brandstetter Mar 02 '13 at 22:59
  • My guess is, that once you "explode" it (write it out, like a macro) these functions will be comparable to any of the other solutions. If GROUP BY is needed, mine will win (given "macroised/exploded functions) BTW: I don't like speed. I like correctness. – wildplasser Mar 03 '13 at 01:22
  • Tested now. Your original performed poorly, the rewritten version is not so bad. Have a look. – Erwin Brandstetter Mar 03 '13 at 17:47
  • Ah, thanks. I also tried to put it into pure SQL, but failed and/or stopped. Btw: I realised that it had a year-wrap problem, so I added a next_birthday() function. My guess is that pure SQL will probably be merged into the plan, where possible. – wildplasser Mar 03 '13 at 18:56
1

You can generate a virtual table of anniversaries, and select from it.

with anniversaries as (
  select event_date, 
         (event_date + (n || ' years')::interval)::date anniversary
  from events, generate_series(1,10) n
)
select event_date, anniversary
from anniversaries
where anniversary between current_date and current_date + interval '14' day
order by event_date, anniversary

The call to generate_series(1,10) has the effect of generating 10 years of anniversaries for each event_date. I wouldn't use the literal value 10 in production. Instead, I'd either calculate the right number of years to use in a subquery, or I'd use a large literal like 100.

You'll want to adjust the WHERE clause to fit your application.

If you have a performance problem with the virtual table (when you have a lot of rows in "events"), replace the common table expression with a base table having the identical structure. Storing anniversaries in a base table makes their values obvious (especially for, say, Feb 29 anniversaries), and queries on such a table can use an index. Querying an anniversary table of half a million rows using just the SELECT statement above takes 25ms on my desktop.

Mike Sherrill 'Cat Recall'
  • 91,602
  • 17
  • 122
  • 185
  • You might be interested in the revised version of this approach I posted. – Erwin Brandstetter Mar 03 '13 at 20:05
  • Good write-up. You've earned a nap. – Mike Sherrill 'Cat Recall' Mar 04 '13 at 11:56
  • @ErwinBrandstetter: I hit you up on Linkedin yesterday. Also, the WHERE clause I wrote is sargable. – Mike Sherrill 'Cat Recall' Mar 05 '13 at 12:45
  • I am not using Linkedin, yet. So many social networks, ugh. I'll probably have a look soon, since people keep bugging me about it. As for the WHERE clause: `where anniversary between current_date and current_date + interval '14' day` *would* be sargable if `anniversary` was a table column, but here it's based on an expression in the CTE here which *isn't*. At least, I could not get Postgres 9.1.7 to use any indexes and that's what I expected. I will gladly learn from you if you can produce evidence to the contrary. – Erwin Brandstetter Mar 05 '13 at 21:51
  • @ErwinBrandstetter: *That other Erwin Brandstetter LinkedIn suggested I might know is probably thinking, "Who the heck is that?"* I understand *sargable* to mean "an expression that *can* use an index if one happens to be present", not "an expression that is actually using an index". The expression in my WHERE clause *can* use an index, but there isn't an index on a CTE. – Mike Sherrill 'Cat Recall' Mar 05 '13 at 23:53
  • Anybody named "Erwin Brandstetter" should be entitled to a share of fun. :) If I'll ever start LinkeIn, we can do this for real. So ... outsourcing the non-sargable expression into a CTE doesn't buy you a whole lot. – Erwin Brandstetter Mar 06 '13 at 00:14
1

This should handle wrap-arounds at the end of the year as well:

with upcoming as (
  select name, 
         event_date,
         case 
           when date_trunc('year', age(event_date)) = age(event_date) then current_date
           else cast(event_date + ((extract(year from age(event_date)) + 1) * interval '1' year) as date) 
         end as next_event
  from events
)
select name, 
       next_event, 
       next_event - current_date as days_until_next
from upcoming
order by next_event - current_date 

You can filter than on the expression next_event - current_date to apply the "next 14 days"

The case ... is only necessary if you consider events that would be "today" as "upcoming" as well. Otherwise, that can be reduced to the else part of the case statement.

Note that I "renamed" the column "date" to event_date. Mainly because reserved words shouldn't be used as an identifier but also because date is a terrible column name. It doesn't tell you anything about what it stores.

0

I found a way to do it.

SELECT EXTRACT(DAYS FROM age('1999-04-10', '2003-05-12')), 
       EXTRACT(MONTHS FROM age('1999-04-10', '2003-05-12'));
 date_part | date_part 
-----------+-----------
        -2 |        -1

I can then just check that the month is 0 and the days are less than 14.

If you have a more elegant solution, please do post it. I'll leave the question open for a bit.

Andrew Hubbs
  • 9,338
  • 9
  • 48
  • 71
0

I don't work with postgresql so I googled it's date functions and found this: http://www.postgresql.org/docs/current/static/functions-datetime.html

If I read it correctly, looking for events in the next 14 days is as simple as:

 where mydatefield >= current_date
 and mydatefield < current_date + integer '14'

Of course I might not be reading it correctly.

Dan Bracuk
  • 20,699
  • 4
  • 26
  • 43
  • Normally it is this straightforward. I want to do that math without taking into consideration the year portion though so it won't work. – Andrew Hubbs Mar 02 '13 at 01:32
  • If your requirement is as simple as, what's going to happen in the next 14 days, the solution is equally simple. If your requirement is something else, I didn't see it in your question. – Dan Bracuk Mar 02 '13 at 01:55
  • `mydatefield` is not the date of the event. It has the same day and same month as the event's date, but is `N` years in the past, where `N` can be anything. Or it's the first date of an event that occurs every year, if it's clearer. – Daniel Vérité Mar 02 '13 at 13:01