0

To start, I've browsed through all similar questions (like this one) but none seem to be quite the same.

I am trying to find the maximum occurrences of an action within a rolling 24 hour period of time. Example data in the table would look like the following:

id  |  datetime      
--------------------------
1   | 2015-03-01 12:01:00
2   | 2015-03-01 12:01:30
3   | 2015-03-01 12:02:42
4   | 2015-03-01 12:05:18
5   | 2015-03-01 12:07:22
6   | 2015-03-02 13:26:59

and so on ...

I want to find the maximum number of items sold within any 24 hour window over a certain interval. I know this SQL won't currently work but gets to the point of what I'm trying to do:

SELECT
    date_trunc('hour',i.dateSold AT TIME ZONE 'America/Chicago') AS endHour,
    SUM(CASE WHEN i.dateSold >= i.dateSold - INTERVAL '24 HOURS' AND i.dateSold < i.dateSold AT TIME ZONE 'America/Chicago' THEN 1 ELSE 0 END) as itemsSold
FROM items_sold i
WHERE
    i.dateSold >= '2015-03-13 00:00:00 America/Chicago'
    AND i.dateSold < '2015-03-16 00:00:00 America/Chicago'
GROUP BY
    date_trunc('hour',i.dateSold AT TIME ZONE 'America/Chicago')

My thought is, I may need to build a CTE with start and end dates but am having trouble coming up with a workable solution. Any ideas?

Community
  • 1
  • 1
JM4
  • 6,740
  • 18
  • 77
  • 125
  • Do I understand right that you like to let the interval always start at 0:00 and you are trying to find the date at which the maximum amount of item was sold? Are you trying to print out selling stats for each day? – Jakob Alexander Eichler Mar 15 '15 at 19:15
  • @tokam the 24 hour period should end with the 00:00:00 hour/min/sec stamp for each period and should begin with the same period 24 hours prior. – JM4 Mar 15 '15 at 19:17
  • What is the scheme of your database's relevant tables and what is the scheme of your output? – Jakob Alexander Eichler Mar 15 '15 at 19:19
  • maybe you can call a function in the selection part of your query which normalizes your time, and than use this output in the group-by clause? – Jakob Alexander Eichler Mar 15 '15 at 19:20
  • For this particular sample, you can assume id = int, dateSold = datetime. – JM4 Mar 15 '15 at 19:22
  • I am reading myself threw this page (while installing pgadmin) ... Does SELECT date_part('day', TIMESTAMP '2001-02-16 20:38:40') * 31 return a valid result? If yes we can somehow build an injective non isomorphic function from the date to a number. – Jakob Alexander Eichler Mar 15 '15 at 19:26
  • You would return the answer "496" – JM4 Mar 15 '15 at 19:32
  • So it computed 16*31. My goal is now to compute an unique integer from the date. Because a month can have 31 never more than 31 days we multiply the month with 31. The year never has more than 365 days and we multply it with 365. But that would lead too an issue. Because our function will not be injective. 31.12.00 were = 403 but 08.01.01 too. I am not sure if teh function is injective if we multiply the year with 13*32 = 416? But for month-factor = 100 and year-factor=10,000 it should work. So compute year-2000 * 10000 + month*100 + day and use this in your group by clause :) – Jakob Alexander Eichler Mar 15 '15 at 19:46
  • You will have a unique number for each day, computed from the datetime. So you can now easily count the sales for each day. Your where clause you can simply do with the date-time statement. We use this trick only to count the sales per day. If you are satisfied with that answer, and I assume there might be a better way, let me know it and I will post it as suggested answer. – Jakob Alexander Eichler Mar 15 '15 at 19:48
  • http://stackoverflow.com/questions/15691127/postgresql-query-to-count-group-by-day-and-display-days-with-no-data – Jakob Alexander Eichler Mar 15 '15 at 19:50
  • 1
    Column names and data types don't add up. You need to post your actual, exact table definition (what you get from `\d tbl` in psql). And, as *always*, your version of Postgres. Also, if you want periods to start and end at midnight, that's not a "rolling 24 hour period", that's just plain old days. Do you mean days as defined by a given time zone? Factor in DST in the usual way? This also depends on your actual table definition once more. What are the timestamps supposed to represent? Local time or absolute time? – Erwin Brandstetter Mar 16 '15 at 00:09

3 Answers3

1

My apologies in advance, as this is a little long shot. It's MySQL syntax (though I believe it's pretty standard SQL) and I havent tested it extensively so I rather think it works and I have no idea of performance, but it should give you a pointer in the right direction.

SELECT 
   sales1.date AS startdate, 
   max(sales2.date) AS enddate,
   count(sales2.id) AS count
FROM sales AS sales1 
   JOIN sales AS sales2 
       ON (sales2.date > sales1.date 
           AND sales2.date <= sales1.date + INTERVAL 1 DAY) 
GROUP BY startdate 
ORDER BY startdate
Fox
  • 2,348
  • 19
  • 19
1

There is no data type called datetime in Postgres. You may be confusing this with MySQL (like some other answers).

You seem to be mixing timestamp and timestamptz data types, which is not going to end well if you don't know what you are doing. Start by reading this:

That's one reason (among others), why you should post your actual and exact table definition with your question.

Judging from your comment (which should be in the question) you want 24h periods to start and end at midnight. That's not a "rolling 24 hour period", that's just plain old days. Only the time zone remains unclear.

Assuming timestamp [without time zone] like your test data suggest and ignoring your undeclared time zone issues. To (I quote):

find the maximum number of items sold within any 24 hour window over a certain interval.

SELECT date_sold::date AS day, count(*) AS item_count
FROM   items_sold
WHERE  date_sold >= '2015-03-13 0:0'  -- timestamp format ...
AND    date_sold <  '2015-03-16 0:0'  -- ... not timestamptz
GROUP  BY 1
ORDER  BY 2 DESC
LIMIT  1;

Depending on your actual table definition and your plans with the time zone, you need to adapt.

Community
  • 1
  • 1
Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
0

If I understand correctly, you can have a record for each item sold. Then, add a record 1 day later saying the record no longer counts. You can then do a cumulative sum of items within 24 hours and choose the maximum by sorting and using limit:

select dateSold, sum(item) over (order by i.dateSold) as numItems
from (select i.dateSold, 1 as item
      from items_sold
      union all
      select i.dateSold + interval '1 day', -1 as item
      from items_sold
     ) i
order by numItems desc
limit 1;
Gordon Linoff
  • 1,242,037
  • 58
  • 646
  • 786