1

I'm looking to aggregate data by the end date of a dataset with some leading period rather than the start. For example, I want to query a table and return the count of matching results 30 days PRIOR to the end date of the date shown in the results. The original table would contain ONLY the date a sale was made (timestamp). Example:

sales_timestamp
------------------
2015-08-05 12:00:00
2015-08-06 13:00:00
2015-08-25 12:31:00
2015-08-26 01:02:00
2015-08-27 02:03:00
2015-08-29 04:23:00
2015-09-01 12:00:00
2015-09-02 12:00:00
2015-09-08 00:00:00

An example of the resulting query output would be:

date_period  |   count_of_sales
--------------------------------
2015-08-24   |        2
2015-08-31   |        6
2015-09-07   |        6

in which the date_period of 2015-09-07 would imply the company sold 6 items in the 30 days ENDING on 9/7/2015 (and starting ~8/7/2015 if a true 30 day period).

I've been toying with variations of the date_trunc() function but can't seem to get the truncation to apply on the end date rather than grouping by the start.

This data would be housed on PostgreSQL 9.1.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
JM4
  • 6,740
  • 18
  • 77
  • 125
  • Sample data -- and a SQL Fiddle -- would really help convey what you are trying to do. For instance, why are your periods one week apart but you want sales for 30 days? – Gordon Linoff Sep 09 '15 at 02:32
  • Your results aren't going to be accurate if you're storing weekly sales numbers. 7 doesn't go into 30 nicely. – domdomcodecode Sep 09 '15 at 02:40
  • @d_ominic the sales numbers aren't stored weekly. They are timestamps of the actual sell itself. – JM4 Sep 09 '15 at 02:50
  • @GordonLinoff - postgresql - I've just never used SQLfiddle before so was only storing some schema data. In the end, its somewhat meaningless as it could just be thought of as a giant DB of timestamps logged when a sale occurs. I simply want to count the number of sales group by and end date with some defined lead period. – JM4 Sep 09 '15 at 02:51
  • So you want check last date call it `D`.... then count in 7 days group until 30 days back from `D`? – Juan Carlos Oropeza Sep 09 '15 at 02:54
  • @JuanCarlosOropeza exactly (for several weeks only based on perhaps a start and end date range) – JM4 Sep 09 '15 at 02:55
  • And you want to do this for every Monday over the period for which you have sales? – Patrick Sep 09 '15 at 02:58
  • Postgres version and table definition *need* to be in the question. – Erwin Brandstetter Sep 09 '15 at 03:05
  • I have updated the q with some more basic sample data as the SQLfiddle was somewhat irrelevant and this may highlight the example dataset more. – JM4 Sep 09 '15 at 03:15

1 Answers1

1

This query does all you ask for:

SELECT day::date AS date_period, count_of_sales
FROM (
   SELECT *, sum(ct) OVER (ORDER BY day ROWS 30 PRECEDING) AS count_of_sales
   FROM   generate_series(date '2015-08-24' - 30  -- start 30 days earlier
                        , date '2015-09-07'
                        , interval '1 day') day
   LEFT JOIN (
      SELECT date_trunc('day', sales_timestamp) AS day, count(*)::int AS ct
      FROM   sales
      GROUP  BY 1
      ) s USING (day)
   ) sub
JOIN  generate_series(date '2015-08-24'
                    , date '2015-09-07 '
                    , interval '1 week') day USING (day);

SQL Fiddle.

Explanation

  1. Generate a full set of relevant days (1st generate_series())
  2. LEFTJOIN to the aggregated counts per day. The LEFT guarantees one row per day, which allows us to use window functions based on the row count.
  3. Use sum() as window aggregate function with a custom frame of 30 days preceding. (You may want to use 29 instead, it's unclear how you count.)

  4. Join the result to actual days you want in the result. (2nd generate_series() with one day per week).

Be aware that the definition of "day" is derived from the current time zone setting of your session if you work with timestamptz. Results can be different in different time zones. Does not apply for just timestamp, which does not depend on the current time zone. Basics:

Related answer with explanation for the window function with custom frame definition:

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • This is awesome. Thank you. I just learned the DB we will be using for this particular project is actually on Vertica but I'll look to see if the generate_series function has a near cousin. – JM4 Sep 09 '15 at 03:54
  • 1
    Not a Vertica user but the `TIMESERIES Clause` https://my.vertica.com/docs/7.1.x/HTML/index.htm#Authoring/SQLReferenceManual/Statements/SELECT/TIMESERIESClause.htm might be useful. You could always build your own calendar table instead of using features such as generate_series, might actually be easier for you in the long run. – Paul Maxwell Sep 09 '15 at 04:56