
For an analysis I need to aggregate the rows of a single table depending on their creation time. Basically, I want to know the count of orders that have been created within a certain period of time before the current order. Can't seem to find the solution to this.

Table structure:

order_id | time_created
---------+-------------
       1 | 00:00
       2 | 00:01
       3 | 00:03
       4 | 00:05
       5 | 00:10

Expected result:

order_id | count_within_3_sec
---------+-------------------
       1 | 1
       2 | 2
       3 | 3
       4 | 2
       5 | 1
edited by Erwin Brandstetter
asked by MikeAndIke

1 Answer


Sounds like an application for window functions. Before PostgreSQL 11 that was not an option: window frames could only be based on row counts, not on actual column values. Since PostgreSQL 11, RANGE frames also accept an offset with PRECEDING / FOLLOWING.
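On PostgreSQL 11 or later, a RANGE frame with an interval offset can express the window directly. A sketch, assuming a timestamp column named as in the question:

```sql
SELECT order_id
     , count(*) OVER (ORDER BY time_created
                      RANGE BETWEEN interval '3 sec' PRECEDING
                                AND CURRENT ROW) AS count_within_3_sec
FROM   tbl
ORDER  BY order_id;
```

The frame covers every row whose time_created falls in [current - 3 sec, current], including the current row itself and any peers with the same timestamp, so it matches the expected output.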

A simple query with LEFT JOIN can do the job:

SELECT t0.order_id
     , count(t1.time_created) AS count_within_3_sec
FROM   tbl      t0
LEFT   JOIN tbl t1 ON t1.time_created BETWEEN t0.time_created - interval '3 sec'
                                          AND t0.time_created
GROUP  BY 1
ORDER  BY 1;
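To reproduce this, the demo data can be loaded as timestamps. A minimal setup, mapping the mm:ss values in the question to seconds past midnight on an arbitrary date:

```sql
CREATE TABLE tbl (
   order_id     int PRIMARY KEY
 , time_created timestamp NOT NULL
);

INSERT INTO tbl (order_id, time_created) VALUES
   (1, '2021-01-01 00:00:00')
 , (2, '2021-01-01 00:00:01')
 , (3, '2021-01-01 00:00:03')
 , (4, '2021-01-01 00:00:05')
 , (5, '2021-01-01 00:00:10');
```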

db<>fiddle here

This does not work with type time as in your minimal demo, because time wraps around at midnight: time '00:00' - interval '3 sec' yields 23:59:57, which breaks the BETWEEN condition. I suppose it's reasonable to assume timestamp or timestamptz instead.

Since you include each row itself in the count, an INNER JOIN would work, too. (LEFT JOIN is still more reliable in the face of possible NULL values.)

Or use a LATERAL subquery, so you don't need to aggregate at the outer query level:

SELECT t0.order_id
     , t1.count_within_3_sec
FROM   tbl t0
LEFT   JOIN LATERAL (
   SELECT count(*) AS count_within_3_sec
   FROM   tbl t1
   WHERE  t1.time_created BETWEEN t0.time_created - interval '3 sec'
                              AND t0.time_created
   ) t1 ON true
ORDER  BY 1;
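For either join variant, an index on the time column lets each per-row lookup be a small range scan instead of a scan of the whole table (assuming the table name used above):

```sql
CREATE INDEX tbl_time_created_idx ON tbl (time_created);
```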

Related:

For big tables and many rows in the time frame, a procedural solution that walks through the table once will perform better. Like:
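As a rough sketch of the idea (my own untested outline, not the linked solution): walk the table once in time order and keep a small array of the timestamps still inside the 3-second window.

```sql
CREATE OR REPLACE FUNCTION orders_within_3_sec()
  RETURNS TABLE (order_id int, count_within_3_sec int)
  LANGUAGE plpgsql AS
$func$
DECLARE
   r tbl%ROWTYPE;
   w timestamp[] := '{}';   -- timestamps inside the sliding 3-second window
BEGIN
   FOR r IN
      SELECT * FROM tbl t ORDER BY t.time_created, t.order_id
   LOOP
      -- drop entries older than 3 seconds, then add the current row
      SELECT coalesce(array_agg(t), '{}') INTO w
      FROM   unnest(w) t
      WHERE  t >= r.time_created - interval '3 sec';

      w := w || r.time_created;

      order_id           := r.order_id;
      count_within_3_sec := cardinality(w);
      RETURN NEXT;
   END LOOP;
END
$func$;

-- usage:
-- SELECT * FROM orders_within_3_sec();
```

Each row is read once; the per-row cost depends only on how many rows fall inside the window, not on the table size.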

answered by Erwin Brandstetter
  • Thanks for the edit and query solution. As you already pointed out, for a big table (which unfortunately I have) a procedural solution seems to be the better approach. I am now trying to implement it according to your procedural solution with PL/pgSQL. – MikeAndIke Apr 01 '21 at 13:00
  • Are you sure there is a procedural solution that significantly outperforms the join? I can, of course, iterate over all records and then query the count of records that is within three seconds, but if I am not mistaken, that is just as expensive as the join. I have looked into your procedural solution (from your link) and that is a very elegant solution, but it really only works if every record gets mapped to exactly one row in the result, which is not true in my scenario. Maybe I am overlooking something... – MikeAndIke Apr 06 '21 at 19:44
  • @MikeAndIke: Yes, I am certain. But it's a bit more complex than the example I provided. I investigated and found another related answer (added above). I am actually closing this question as duplicate as the other one answers your question exactly. – Erwin Brandstetter Apr 07 '21 at 02:12