Core requirement:
Find latest entries for a person_id
by submission_date
for specified filter criteria type
, plan
, status
. There could be more such filters, but the logic to return latest by submission date is the same regardless. Two major uses one for paginated viewing in UI and second for generating reports.
WITH cte AS (
SELECT * FROM (
SELECT my_table.*, rank() OVER (PARTITION BY person_id ORDER BY submission_date DESC, last_updated DESC, id DESC) FROM my_table
) rank_filter
WHERE RANK=1 AND status in ('ACCEPTED','CORRECTED') AND type != 'CR' AND h_plan_id IN (10000, 20000)
)
SELECT
SELECT count(id) FROM cte group by id,
SELECT * FROM cte limit 10 offset 0;
The group by
also does not work on a CTE. a union with all null
in the count query might work for combining probably, but not sure.
The main reason I want to combine these two into 1 query is because the table is big and the window function is expensive. Currently I use separate queries which both basically run the same query twice.
Postgres version 12.
\d my_table;
Table "public.my_table"
Column | Type | Collation | Nullable
--------------------------+-----------------------------+-----------+----------
id | bigint | | not null
h_plan_id | bigint | | not null
h_plan_submitter_id | bigint | |
last_updated | timestamp without time zone | |
date_created | timestamp without time zone | |
modified_by | character varying(255) | |
segment_number | integer | |
-- <bunch of other text columns>
submission_date | character varying(255) | |
person_id | character varying(255) | |
status | character varying(255) | |
file_id | bigint | | not null
Indexes:
"my_table_pkey" PRIMARY KEY, btree (id)
"my_table_file_idx" btree (file_id)
"my_table_hplansubmitter_idx" btree (h_plan_submitter_id)
"my_table_key_hash_idx" btree (key_hash)
"my_table_person_id_idx" btree (person_id)
"my_table_segment_number_idx" btree (segment_number)
Foreign-key constraints:
"fk38njesaryvhj7e3p4thqkq7pb" FOREIGN KEY (h_plan_id) REFERENCES health_plan(id) ON UPDATE CASCADE ON DELETE CASCADE
"fk6by9668sowmdob7433mi3rpsu" FOREIGN KEY (h_plan_submitter_id) REFERENCES h_plan_submitter(id) ON UPDATE CASCADE ON DELETE CASCADE
"fkb06gpo9ng6eujkhnes0eco7bj" FOREIGN KEY (file_id) REFERENCES x12file(id) ON UPDATE CASCADE ON DELETE CASCADE
Additional information
Possible values for type
are EN
and CR
with EN
being about 70% of the data.
Table column widths select avg_width from pg_stats where tablename='mytable';
is a total of 374 for 41 columns so about 9 per col.
The idea is to show some pages upfront to user, they can then filter by additional parameters like file_name
(each file usually has about 5k entries), type
(very low cardinality), member_add_id
(high cardinality), plan_id
(low cardinality, every 500k to a million entries will be associated to a plan id). The business requirement in all cases is to show just the latest record for a certain set of plan id's for a submission_date
(for reports it is done per year). The order by id was just defensive coding, the same day can have multiple entries and even if someone edited the second last entry hence touching the last_updated
timestamp, we want to show only the very last entry of the same data. This probably never happens and can be removed.
User can use this data to generate csv reports.
Result of explain for the query with right join below:
Nested Loop Left Join (cost=554076.32..554076.56 rows=10 width=17092) (actual time=4530.914..4530.922 rows=10 loops=1)
CTE cte
-> Unique (cost=519813.11..522319.10 rows=495358 width=1922) (actual time=2719.093..3523.029 rows=422638 loops=1)
-> Sort (cost=519813.11..521066.10 rows=501198 width=1922) (actual time=2719.091..3301.622 rows=423211 loops=1)
Sort Key: mytable.person_id, mytable.submission_date DESC NULLS LAST, mytable.last_updated DESC NULLS LAST, mytable.id DESC
Sort Method: external merge Disk: 152384kB
-> Seq Scan on mytable (cost=0.00..54367.63 rows=501198 width=1922) (actual time=293.953..468.554 rows=423211 loops=1)
Filter: (((status)::text = ANY ('{ACCEPTED,CORRECTED}'::text[])) AND (h_plan_id = ANY ('{1,2}'::bigint[])) AND ((type)::text <> 'CR'::text))
Rows Removed by Filter: 10158
-> Aggregate (cost=11145.56..11145.57 rows=1 width=8) (actual time=4142.116..4142.116 rows=1 loops=1)
-> CTE Scan on cte (cost=0.00..9907.16 rows=495358 width=0) (actual time=2719.095..4071.481 rows=422638 loops=1)
-> Limit (cost=20611.67..20611.69 rows=10 width=17084) (actual time=388.777..388.781 rows=10 loops=1)
-> Sort (cost=20611.67..21850.06 rows=495358 width=17084) (actual time=388.776..388.777 rows=10 loops=1)
Sort Key: cte_1.person_id
Sort Method: top-N heapsort Memory: 30kB
-> CTE Scan on cte cte_1 (cost=0.00..9907.16 rows=495358 width=17084) (actual time=0.013..128.314 rows=422638 loops=1)
Planning Time: 0.369 ms
JIT:
Functions: 9
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.947 ms, Inlining 4.983 ms, Optimization 178.469 ms, Emission 110.261 ms, Total 295.660 ms
Execution Time: 4587.711 ms