
I have a query running on a Postgres 12 server with `WHERE datetime BETWEEN datetimeA AND datetimeB`. The difference between datetimeA and datetimeB is always 30 minutes, and the number of rows involved does not differ much, but the execution time varies wildly:

  1. WHERE servertime between '2022-12-01 04:00:00' and '2022-12-01 04:30:00' (5 seconds)
  2. WHERE servertime between '2022-12-29 04:00:00' and '2022-12-29 04:30:00' (4 minutes)

Execution time for dates at the beginning of the month is always fast, and it gets much slower as the date nears the end of the month.

Below is the full query:

select p2.id as positionid, p2.stationid, p.servertime, p.servertime_max, p.geofenceid
     , p2.latitude, p2.longitude, datanum
from (
    select min(p.servertime) as servertime, max(p.servertime) as servertime_max
         , p.deviceid, count(p.id) as datanum, p.geofenceid
    from tc_positions p
    join tc_devices d on d.id = p.deviceid
    join tc_stations st on st.id = p.stationid
    where p.speed <= 50
    and st.stationcategory = 200
    and servertime between '2022-12-28 04:00:00' and '2022-12-28 04:30:00'
    group by p.deviceid, p.geofenceid, p.stationid
    -- total data involved: 180 rows
) p
join (
    select * from tc_positions
    where servertime between '2022-12-28 04:00:00' and '2022-12-28 04:30:00'
    -- total data involved: 13,000 rows
) p2 on p2.servertime = p.servertime and p2.deviceid = p.deviceid and p2.geofenceid = p.geofenceid
where datanum > 0;

I stopped all other connections and processes on the DB while testing, and ran the query for the same date with different WHERE conditions. It doesn't matter when I run the query: if the date in the condition is early in the month, it is always fast, even while the DB server is handling heavy read/write traffic.

Why this behaviour? Is there some internal Postgres indexing or anything else that might make the query run like this? I want the execution time to be roughly the same for any date, as long as the time range is the same.
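One way to narrow this down (a diagnostic sketch, not from the original post) is to compare the plans and buffer counts for a fast and a slow range; table and column names are taken from the query above:

```sql
-- Compare plans for a fast (early-month) and a slow (late-month) range.
-- BUFFERS shows how many pages each plan actually touches.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   tc_positions
WHERE  servertime BETWEEN '2022-12-01 04:00:00' AND '2022-12-01 04:30:00';

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM   tc_positions
WHERE  servertime BETWEEN '2022-12-29 04:00:00' AND '2022-12-29 04:30:00';
```

If the two plans differ (say, index scan vs. sequential scan), or the buffer counts diverge by orders of magnitude, that points at the cause.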

subquery `p` is grouped by `stationid`, but that column is not in the `SELECT` list. Doesn't seem to make sense? Please show the output of `EXPLAIN (ANALYZE, BUFFERS)` for the slow and the fast query. – Erwin Brandstetter Jan 02 '23 at 07:59

1 Answer


This simpler query should be equivalent and faster:

SELECT DISTINCT ON (p.deviceid, p.geofenceid, p.stationid)
       p.id AS positionid
     , p.geofenceid, p.deviceid, p.stationid
     , p.latitude, p.longitude
     , p.servertime
     , max(p.servertime) OVER w AS servertime_max
     , count(*) OVER w AS datanum
FROM   tc_positions p
JOIN   tc_stations st ON st.id = p.stationid
WHERE  p.speed <= 50
AND    st.stationcategory = 200
AND    p.servertime BETWEEN '2022-12-28 04:00:00' AND '2022-12-28 04:30:00'
WINDOW w AS (PARTITION BY p.deviceid, p.geofenceid, p.stationid)
ORDER  BY p.deviceid, p.geofenceid, p.stationid, p.servertime;

This can be done in a single SELECT because DISTINCT ON is applied after window functions.

Related reading: the sequence of events in a SELECT query, and DISTINCT ON.

I dropped the join to tc_devices completely: it seemed to be dead freight (unless that join can eliminate rows).

And I included stationid in the DISTINCT step that replaces your GROUP BY (so you get separate rows for each stationid). Your original query is inconsistent in this regard.
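For either version of the query to stay fast, the time-range filter needs index support. This is a sketch under the assumption that tc_positions has no suitable index yet (the post doesn't show its indexes); the index name is made up:

```sql
-- Hypothetical index to support the servertime range filter.
-- With only ~13,000 rows per 30-minute window, an index scan on
-- servertime should keep both ranges in the same ballpark.
CREATE INDEX tc_positions_servertime_idx ON tc_positions (servertime);
```

If such an index already exists, the problem is more likely in the plan choice than in missing index support.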

As for why one time range is faster than another: probably different query plans. But that's impossible to tell without seeing the execution plans ...
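One common cause of "late dates are slow" (an assumption on my part, not confirmed by the post) is stale planner statistics: rows for the newest dates arrive after the last ANALYZE, so the planner misestimates them and may pick a bad plan. Refreshing statistics is cheap to try:

```sql
-- Refresh planner statistics; recent servertime values may lie
-- beyond the histogram built by the last automatic ANALYZE.
ANALYZE tc_positions;

-- Check when statistics were last gathered for the table:
SELECT relname, last_analyze, last_autoanalyze
FROM   pg_stat_user_tables
WHERE  relname = 'tc_positions';
```

If the slow range becomes fast right after ANALYZE, tuning autovacuum/autoanalyze thresholds for this table would be the durable fix.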

– Erwin Brandstetter