I need to make a query where it returns all requests that are not finished or canceled from the beginning of recordings to a specific date. The way I'm doing right now, take too much time and returns an error: 'User query might have needed to see row versions that must be removed'(my guess it's due of lack of RAM).
Below is the query I'm using, and here are some information:
T1 where each new entry is saved, with an ID, creation date, status(open,closed) and other keys for several tables.
T2 where each change made in each request is saved(in progress, waiting, rejected and closed), date of change and other keys for other tables.
SELECT T1.id_request, T1.dt_created, T1.status FROM T1 LEFT JOIN T2 ON T1.id_request = T2.id_request WHERE (T1.dt_created >= '2012-01-01 00:00:00' AND T1.dt_created <= '2020-05-31 23:59:59') AND T1.id_request NOT IN (SELECT T2.di_request FROM T2 WHERE ((T2.dt_change >= '2012-01-01 00:00:00' AND T2.dt_change <= '2020-05-31 23:59:59') OR T2.dt_change IS NULL) AND T2.status IN ('Closed','Canceled','rejected'))
My thoughts were to get all that is received - T1(I can't just retrieve what is open, it will only work for today, not to a specific past date - what I want) between the beginning of the records and lets say end of May. Then use WHERE T1.ID NOT IN (T2.ID with STATUS 'closed', in the same period). But as I've said it takes forever and returns an error.
I use this same code to get what was open for a specific month(1st to 30rd) and works perfectly fine.
Maybe this approach is not the best approach, but I couldn't think of any other way(I'm not an expert with SQL). If there's not enough information to provide an answer fell free to ask.
As per request from @MikeOrganek here is the analyzer:
Nested Loop Left Join (cost=27985.55..949402.48 rows=227455 width=20) (actual time=2486.433..54832.280 rows=47726 loops=1)
Buffers: shared hit=293242 read=260670
Seq Scan on T1 (cost=27984.99..324236.82 rows=73753 width=20) (actual time=2467.499..6202.970 rows=16992 loops=1)
Filter: ((dt_created >= '2020-05-01 00:00:00-03'::timestamp with time zone) AND (dt_created <= '2020-05-31 23:59:59-03'::timestamp with time zone) AND (NOT (hashed SubPlan 1)))
Rows Removed by Filter: 6085779
Buffers: shared hit=188489 read=250098
SubPlan 1
Nested Loop (cost=7845.36..27983.13 rows=745 width=4) (actual time=129.379..1856.518 rows=168690 loops=1)
Buffers: shared hit=60760
Seq Scan on T3(cost=0.00..5.21 rows=3 width=8) (actual time=0.057..0.104 rows=3 loops=1)
Filter: ((status_request)::text = ANY ('{Closed,Canceled,rejected}'::text[]))
Rows Removed by Filter: 125
Buffers: shared hit=7
Bitmap Heap Scan on T2(cost=7845.36..9321.70 rows=427 width=8) (actual time=477.324..607.171 rows=56230 loops=3)
Recheck Cond: ((dt_change >= '2020-05-01 00:00:00-03'::timestamp with time zone) AND (dt_change <= '2020-05-31 23:59:59-03'::timestamp with time zone) AND (T2.ID_status= T3.ID_status))
Rows Removed by Index Recheck: 87203
Heap Blocks: exact=36359
Buffers: shared hit=60753
BitmapAnd (cost=7845.36..7845.36 rows=427 width=0) (actual time=473.864..473.864 rows=0 loops=3)
Buffers: shared hit=24394
Bitmap Index Scan on idx_ix_T2_dt_change (cost=0.00..941.81 rows=30775 width=0) (actual time=47.380..47.380 rows=306903 loops=3)
Index Cond: ((dt_change >= '2020-05-01 00:00:00-03'::timestamp with time zone) AND (dt_change<= '2020-05-31 23:59:59-03'::timestamp with time zone))
Buffers: shared hit=2523
Bitmap Index Scan on idx_T2_ID_status (cost=0.00..6895.49 rows=262724 width=0) (actual time=418.942..418.942 rows=2105165 loops=3)
Index Cond: (ID_status = T3.ID_status )
Buffers: shared hit=21871
Index Only Scan using idx_ix_T2_id_request on T2 (cost=0.56..8.30 rows=18 width=4) (actual time=0.369..2.859 rows=3 loops=16992)
Index Cond: (id_request = t17.id_request )
Heap Fetches: 44807
Buffers: shared hit=104753 read=10572
Planning time: 23.424 ms
Execution time: 54841.261 ms
And here is the main difference with dt_change IS NULL
:
Planning time: 34.320 ms
Execution time: 230683.865 ms
Thanks