
I have a large table (117,899,162 rows). The query below is very slow, but if I remove `EXTRACT(ISODOW FROM l2.starttime) IN (1)` the performance is a lot better. I guess an index is skipped when the isodow condition is added. Is there any way to improve this query?

SELECT l2.* FROM ListenerActivity l1 JOIN ListenerActivity l2 ON l1.id != l2.id 
WHERE l2.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' 
AND l1.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' 
AND l1.memberid = l2.memberid 
AND l1.station != l2.station 
AND l1.station = 928 
AND (l2.starttime - l1.endtime) = INTERVAL '1 second'
AND EXTRACT(ISODOW FROM l2.starttime) IN (1)

Execution plan without isodow:

"Merge Join  (cost=36594.99..36611.22 rows=1 width=47) (actual time=518.095..650.036 rows=1172 loops=1)"
"  Merge Cond: (l1.memberid = l2.memberid)"
"  Join Filter: ((l1.id <> l2.id) AND (l1.station <> l2.station) AND ((l2.starttime - l1.endtime) = '00:00:01'::interval))"
"  Rows Removed by Join Filter: 191168"
"  Buffers: shared hit=74251, temp read=7202 written=4341"
"  ->  Sort  (cost=11946.67..11946.89 rows=433 width=24) (actual time=199.983..200.720 rows=8785 loops=1)"
"        Sort Key: l1.memberid"
"        Sort Method: quicksort  Memory: 1071kB"
"        Buffers: shared hit=3435"
"        ->  Bitmap Heap Scan on listeneractivity l1  (cost=11082.90..11942.88 rows=433 width=24) (actual time=195.014..197.932 rows=8785 loops=1)"
"              Recheck Cond: ((starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone) AND (station = 928))"
"              Heap Blocks: exact=464"
"              Buffers: shared hit=3435"
"              ->  BitmapAnd  (cost=11082.90..11082.90 rows=433 width=0) (actual time=194.946..194.947 rows=0 loops=1)"
"                    Buffers: shared hit=2971"
"                    ->  Bitmap Index Scan on listeneractivity_starttime_idx  (cost=0.00..56.66 rows=14273 width=0) (actual time=14.607..14.607 rows=253370 loops=1)"
"                          Index Cond: ((starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"                          Buffers: shared hit=294"
"                    ->  Bitmap Index Scan on listeneractivity_station_idx  (cost=0.00..11026.15 rows=3580024 width=0) (actual time=179.723..179.723 rows=3417368 loops=1)"
"                          Index Cond: (station = 928)"
"                          Buffers: shared hit=2677"
"  ->  Sort  (cost=24648.32..24655.45 rows=14273 width=47) (actual time=310.901..378.990 rows=414039 loops=1)"
"        Sort Key: l2.memberid"
"        Sort Method: external sort  Disk: 17360kB"
"        Buffers: shared hit=70816, temp read=5770 written=4341"
"        ->  Index Scan using listeneractivity_starttime_idx on listeneractivity l2  (cost=0.11..24451.34 rows=14273 width=47) (actual time=0.018..112.574 rows=253370 loops=1)"
"              Index Cond: ((starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"              Buffers: shared hit=70815"
"Planning:"
"  Buffers: shared hit=20"
"Planning Time: 0.230 ms"
"Execution Time: 652.318 ms"

Execution plan with isodow:

"QUERY PLAN"
"Nested Loop  (cost=11060.88..25732.92 rows=1 width=47) (actual time=1011.050..45688.906 rows=175 loops=1)"
"  Join Filter: ((l1.id <> l2.id) AND (l1.station <> l2.station) AND (l1.memberid = l2.memberid) AND ((l2.starttime - l1.endtime) = '00:00:01'::interval))"
"  Rows Removed by Join Filter: 304104051"
"  Buffers: shared hit=50748"
"  ->  Bitmap Heap Scan on listeneractivity l1  (cost=11060.77..11556.41 rows=249 width=24) (actual time=265.118..280.547 rows=6922 loops=1)"
"        Recheck Cond: ((starttime >= '2022-04-18 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone) AND (station = 928))"
"        Heap Blocks: exact=341"
"        Buffers: shared hit=3208"
"        ->  BitmapAnd  (cost=11060.77..11060.77 rows=249 width=0) (actual time=263.834..263.835 rows=0 loops=1)"
"              Buffers: shared hit=2867"
"              ->  Bitmap Index Scan on listeneractivity_starttime_idx  (cost=0.00..34.54 rows=8214 width=0) (actual time=8.036..8.036 rows=163144 loops=1)"
"                    Index Cond: ((starttime >= '2022-04-18 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"                    Buffers: shared hit=190"
"              ->  Bitmap Index Scan on listeneractivity_station_idx  (cost=0.00..11026.15 rows=3580024 width=0) (actual time=254.729..254.729 rows=3417368 loops=1)"
"                    Index Cond: (station = 928)"
"                    Buffers: shared hit=2677"
"  ->  Materialize  (cost=0.11..14115.28 rows=41 width=47) (actual time=0.000..1.998 rows=43933 loops=6922)"
"        Buffers: shared hit=47540"
"        ->  Index Scan using listeneractivity_starttime_idx on listeneractivity l2  (cost=0.11..14115.24 rows=41 width=47) (actual time=0.028..151.353 rows=43933 loops=1)"
"              Index Cond: ((starttime >= '2022-04-18 14:00:00'::timestamp without time zone) AND (starttime <= '2022-04-19 21:00:00'::timestamp without time zone))"
"              Filter: (date_part('isodow'::text, starttime) = '1'::double precision)"
"              Rows Removed by Filter: 119211"
"              Buffers: shared hit=47540"
"Planning:"
"  Buffers: shared hit=20"
"Planning Time: 0.241 ms"
"Execution Time: 45689.757 ms" 

Table definition (output of `\d ListenerActivity`, originally posted as an image):

id | integer | 
audiencetype | character varying(255) | 
endtime | timestamp without time zone | 
inoutofhome | character varying(255) | 
starttime | timestamp without time zone | 
memberid | bigint | 
station | integer |
Indexes: 
   "listeneractivity_pkey" PRIMARY KEY, btree (id) 
   "listeneractivity_endtime_idx" btree (endtime) 
   "listeneractivity_starttime_idx" btree (starttime) 
   "listeneractivity_station_idx" btree (station) 
Foreign-key constraints: 
   "fkewxhyebhex19kpytnanu9yq1s" FOREIGN KEY (memberid) REFERENCES member(id) 
   "listeneractivity_fk_1" FOREIGN KEY (memberid) REFERENCES member(id) 
   "listeneractivity_fk_2" FOREIGN KEY (station) REFERENCES local_station(id)

Final EXPLAIN ANALYZE after adding the new index:

"QUERY PLAN"
"Hash Join  (cost=372.98..1418.71 rows=1 width=47) (actual time=282.822..352.753 rows=363 loops=1)"
"  Output: l2.id, l2.audiencetype, l2.endtime, l2.inoutofhome, l2.starttime, l2.memberid, l2.station"
"  Hash Cond: (l1.memberid = l2.memberid)"
"  Join Filter: ((l1.id <> l2.id) AND (l1.station <> l2.station) AND ((l2.starttime - l1.endtime) = '00:00:01'::interval))"
"  Rows Removed by Join Filter: 48313"
"  Buffers: shared hit=32379"
"  ->  Index Scan using listeneractivity_starttime_station_memberid_id_date_part_idx on public.listeneractivity l1  (cost=0.11..1037.96 rows=432 width=24) (actual time=0.040..50.408 rows=8785 loops=1)"
"        Output: l1.id, l1.audiencetype, l1.endtime, l1.inoutofhome, l1.starttime, l1.memberid, l1.station"
"        Index Cond: ((l1.starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (l1.starttime <= '2022-04-19 21:00:00'::timestamp without time zone) AND (l1.station = 928))"
"        Buffers: shared hit=4452"
"  ->  Hash  (cost=372.61..372.61 rows=71 width=47) (actual time=265.327..265.329 rows=97013 loops=1)"
"        Output: l2.id, l2.audiencetype, l2.endtime, l2.inoutofhome, l2.starttime, l2.memberid, l2.station"
"        Buckets: 131072 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 9313kB"
"        Buffers: shared hit=27927"
"        ->  Index Scan using listeneractivity_starttime_station_memberid_id_date_part_idx on public.listeneractivity l2  (cost=0.11..372.61 rows=71 width=47) (actual time=2.344..170.917 rows=97013 loops=1)"
"              Output: l2.id, l2.audiencetype, l2.endtime, l2.inoutofhome, l2.starttime, l2.memberid, l2.station"
"              Index Cond: ((l2.starttime >= '2022-04-17 14:00:00'::timestamp without time zone) AND (l2.starttime <= '2022-04-19 21:00:00'::timestamp without time zone) AND (date_part('isodow'::text, l2.starttime) = '1'::double precision))"
"              Buffers: shared hit=27927"
"Planning:"
"  Buffers: shared hit=20"
"Planning Time: 0.250 ms"
"Execution Time: 352.848 ms"
codehippy
  • That is correct, I have edited the query. Still, the performance without isodow is 1312 ms and with isodow 76772 ms – codehippy Apr 21 '22 at 08:14
  • There is no `isodow` in your query.... – Luuk Apr 21 '22 at 08:15
  • It's in the beginning of my question but I can add it – codehippy Apr 21 '22 at 08:17
  • Ok, I had to decrease the time span on the last query since it timed out otherwise. – codehippy Apr 21 '22 at 08:40
  • I should mention that my Postgres is hosted on Heroku, could there be a performance bottleneck there? – codehippy Apr 21 '22 at 08:41
  • Can you add the output of `\d ListenerActivity` ? – Luuk Apr 21 '22 at 09:06
  • no, i do mean like in this answer: https://stackoverflow.com/a/109334/724039 – Luuk Apr 21 '22 at 09:18
  • Added as an image above, hope that's ok. – codehippy Apr 21 '22 at 09:38
  • Actually it' just some text, so adding it as text would have been better... – Luuk Apr 21 '22 at 09:55
  • added the info to your question .. – Luuk Apr 21 '22 at 10:09
  • To be honest, both queries are slow, just one is worse than the other. Both do a lot of work for nothing: Rows Removed by Join Filter. Get all JOIN and WHERE conditions in a single index and see how that improves performance – Frank Heikens Apr 21 '22 at 12:41
  • Your statistics seem to be grossly wrong. Is your table being ANALYZED enough? Your starttime BETWEEN is being underestimated by nearly 20 fold, and that is a simple estimate which should be easy to get right. The misestimation of EXTRACT ISODOW selectivity is what pushes the planner over the edge, but it shouldn't be standing so close to the edge to start with. – jjanes Apr 21 '22 at 19:21
  • I'm not quite sure I understand what you mean @jjanes? – codehippy Apr 22 '22 at 12:53

2 Answers


Both queries are slow, and both remove a lot of rows with the join filter. That means the indexes alone can't do the filtering.

You will have to experiment a bit; I would start with this one:

CREATE INDEX ON ListenerActivity (starttime, station, memberid, id, (EXTRACT(ISODOW FROM starttime)));

But a different order of the columns in the index might work better. Just give it a try.

After you have created an index, use EXPLAIN (ANALYZE, VERBOSE, BUFFERS) to see the query plan that is actually used. You might have to drop the current indexes to force the database to use the new one: the new index is larger, so the planner may decide it's not the cheapest option.
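One safe way to try that is to drop the competing index inside a transaction, since DDL in PostgreSQL is transactional and a ROLLBACK restores the index. A sketch (the index name is from the question; only do this on a test system, because DROP INDEX locks the table until the transaction ends):

```sql
BEGIN;

-- Temporarily hide the competing index; ROLLBACK will bring it back.
DROP INDEX listeneractivity_starttime_idx;

EXPLAIN (ANALYZE, VERBOSE, BUFFERS)
SELECT ... ;  -- the problem query from the question goes here

ROLLBACK;  -- undoes the DROP INDEX
```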

Creating additional statistics for the combination of columns might also help the query planner.
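A minimal sketch of what that could look like; the statistics name and the choice of columns are assumptions, and extended statistics mainly help equality clauses, so they may or may not improve the estimate on the `starttime` range:

```sql
-- Extended statistics over the columns the planner misestimates,
-- so it can see the correlation between station and starttime.
CREATE STATISTICS listeneractivity_stats (ndistinct, dependencies)
    ON station, starttime
    FROM ListenerActivity;

-- The statistics are only collected on the next ANALYZE.
ANALYZE ListenerActivity;
```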

Frank Heikens

Can you try this query?

I moved all the expressions that depend on l2 to the ON-clause.

The condition `l2.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00'` seems unnecessary, because `(l2.starttime - l1.endtime) = INTERVAL '1 second'` already constrains `l2.starttime` relative to `l1`.

SELECT 
   l2.* 
FROM ListenerActivity l1 
JOIN ListenerActivity l2 ON l1.id != l2.id 
                        and l2.starttime BETWEEN l1.endtime - INTERVAL '1 second' AND l1.endtime + INTERVAL '1 second'
                        and l1.memberid = l2.memberid
                        and l1.station != l2.station
                        and EXTRACT(ISODOW FROM l2.starttime) IN (1)
WHERE  
      l1.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00' 
  AND l1.station = 928 
;

Please share the execution time of this query and, if possible, edit this answer to add the execution plan.

EDIT:

When you do these 2 steps:

  1. l2.starttime BETWEEN '2022-04-17T14:00:00' AND '2022-04-19T21:00:00'
  2. EXTRACT(ISODOW FROM l2.starttime) IN (1)

Then you are basically saying the start time should be between April 17 and April 19, AND it should be a Monday....

I think it might be quicker to just test for the Monday directly (instead of the two steps mentioned above):

l2.starttime BETWEEN '2022-04-18T00:00:00' AND '2022-04-18T22:59:59'
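For a longer time span containing several Mondays, the same idea can be generalized by enumerating the days first. A sketch, not tested against the real table: `EXTRACT` then runs once per generated day instead of once per table row, and the join uses a plain range predicate that a btree index on `starttime` can serve.

```sql
-- Enumerate the days in the window, keep only the Mondays (ISODOW = 1),
-- then fetch rows per Monday via an index-friendly range condition.
SELECT l2.*
FROM generate_series('2022-04-17 00:00:00'::timestamp,
                     '2022-04-19 21:00:00'::timestamp,
                     interval '1 day') AS g(day)
JOIN ListenerActivity l2
  ON  l2.starttime >= g.day
  AND l2.starttime <  g.day + interval '1 day'
WHERE EXTRACT(ISODOW FROM g.day) = 1;
```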

EDIT2:

Adding this index might help:

CREATE INDEX ON ListenerActivity (station,starttime,memberid);
Luuk
  • It's just as slow as the one above :-( – codehippy Apr 21 '22 at 08:46
  • added an edit... – Luuk Apr 21 '22 at 09:16
  • You mean that I should filter the dates before creating the query? Still that query takes 59 000 ms. – codehippy Apr 21 '22 at 09:47
  • added edit2, about an index. – Luuk Apr 21 '22 at 10:12
  • It's making it faster but still 14 000 ms which is too slow. – codehippy Apr 21 '22 at 11:49
  • If I try the same query on a smaller dataset (1044764 rows) there isn't any problem. – codehippy Apr 21 '22 at 11:53
  • OK, then I am out of options, the only thing is that you can try if changing the order of the fields in the index has influence. – Luuk Apr 21 '22 at 11:55
  • It might be this problem [Postgres server performance falls off a cliff after a certain number of records](https://stackoverflow.com/questions/11702930/postgres-server-performance-falls-off-a-cliff-after-a-certain-number-of-records), which basically suggest to re-write/re-think your query – Luuk Apr 21 '22 at 12:00
  • @Luuk: That was 10 years ago; performance has improved a lot and a lot of things have changed. The current problem is most likely an index that doesn't match the query. And as you can see, most of the results found are filtered out. That's a waste of time and resources. – Frank Heikens Apr 21 '22 at 12:45
  • @FrankHeikens: So the conclusion is still valid: re-think the query. As in: share the table definition (it is in the question now) and a complete description of the problem to be solved, because the current query is not performing. – Luuk Apr 21 '22 at 12:46